FrontierHelper
Helper class providing utility functions for the in-memory frontier implementation, Frontier.
Static Method Summary
Static Public Methods
public static | crawlModeToSymbol(mode: string): Symbol | Retrieve the crawl-mode symbol from a config string |
public static | normalizeSeeds(seeds: string, mode: string, depth: number): Seed | Seed[] | Ensure the starting seed list is one the frontier can understand |
public static | shouldAddToFrontier(url: Object, curURL: string, tracker: SeedTracker): boolean | Determine if a URL should be added to the frontier |
public static | shouldAddToFrontierDefault(url: Object, curURL: string, tracker: SeedTracker): boolean | Determine if a discovered URL should be added to the frontier using the default strategy, which applies to page-only and page-all-links |
public static | shouldAddToFrontierPSD(url: Object, curURL: string, tracker: SeedTracker): boolean | Determine if a discovered URL should be added to the frontier using the Page Same Domain strategy |
Static Public Methods
public static crawlModeToSymbol(mode: string): Symbol
Retrieve the crawl-mode symbol from a config string
Params:
Name | Type | Attribute | Description |
mode | string | | The crawl mode |
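As an illustration of the string-to-symbol lookup described above, here is a minimal sketch. The symbol constants, the `page-same-domain` mode string, and the fallback to page-only for unrecognized input are assumptions made for this example; this page does not show how the real method handles them.

```javascript
// Hypothetical mode symbols; the real module's symbols are not shown on this page.
const MODE_PAGE_ONLY = Symbol('page-only');
const MODE_PAGE_SAME_DOMAIN = Symbol('page-same-domain');
const MODE_PAGE_ALL_LINKS = Symbol('page-all-links');

// Lookup table from config strings to mode symbols.
const MODES = new Map([
  ['page-only', MODE_PAGE_ONLY],
  ['page-same-domain', MODE_PAGE_SAME_DOMAIN], // assumed spelling of the mode string
  ['page-all-links', MODE_PAGE_ALL_LINKS]
]);

// Sketch only: maps a config string to a symbol.
// Assumption: unknown or missing modes fall back to page-only.
function crawlModeToSymbolSketch(mode) {
  return MODES.get(mode) || MODE_PAGE_ONLY;
}

const selected = crawlModeToSymbolSketch('page-all-links');
console.log(selected === MODE_PAGE_ALL_LINKS); // prints true
```

Using symbols rather than raw strings lets later checks compare modes by identity instead of repeating string comparisons.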
public static normalizeSeeds(seeds: string, mode: string, depth: number): Seed | Seed[]
Ensure the starting seed list is one the frontier can understand
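The following sketch shows one way such a normalization could work. The `Seed` shape used here (`url`, `mode`, `depth`) is hypothetical, and accepting an array of URLs is an assumption inferred from the `Seed | Seed[]` return type; neither is confirmed by this page.

```javascript
// Hypothetical Seed shape: { url, mode, depth }.
// The real Seed type is not described on this page.
function normalizeSeedsSketch(seeds, mode, depth) {
  const toSeed = (url) => ({ url, mode, depth });
  // Assumption: a list of URLs is also accepted and yields a list of seeds,
  // while a single URL string yields a single seed.
  return Array.isArray(seeds) ? seeds.map(toSeed) : toSeed(seeds);
}

const single = normalizeSeedsSketch('https://example.com/', 'page-only', 1);
const many = normalizeSeedsSketch(
  ['https://example.com/a', 'https://example.com/b'],
  'page-all-links',
  2
);
console.log(single.url);  // prints https://example.com/
console.log(many.length); // prints 2
```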
public static shouldAddToFrontier(url: Object, curURL: string, tracker: SeedTracker): boolean
Determine if a URL should be added to the frontier
Params:
Name | Type | Attribute | Description |
url | Object | | A URL extracted from the currently visited page |
curURL | string | | The URL of the currently visited page |
tracker | SeedTracker | | The seed tracker associated with the first page in the chain of pages being visited |
public static shouldAddToFrontierDefault(url: Object, curURL: string, tracker: SeedTracker): boolean
Determine if a discovered URL should be added to the frontier using the default strategy, which applies to page-only and page-all-links
Params:
Name | Type | Attribute | Description |
url | Object | | A URL extracted from the currently visited page |
curURL | string | | The URL of the currently visited page |
tracker | SeedTracker | | The seed tracker associated with the first page in the chain of pages being visited |
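To make the three parameters concrete, here is a minimal sketch of a default-style admission check. It assumes `url` behaves like a WHATWG `URL` object and that the tracker exposes a `seen` set of already-queued URLs; the `seen` field and the specific rules (http(s) only, skip the current page, skip already-seen URLs) are illustrative assumptions, not the documented behavior of `SeedTracker` or of this method.

```javascript
// Sketch only: a default-style check under the assumptions stated above.
function shouldAddDefaultSketch(url, curURL, tracker) {
  // Assumption: only http(s) links are crawlable.
  if (url.protocol !== 'http:' && url.protocol !== 'https:') return false;
  // Assumption: a page linking to itself is not re-queued.
  if (url.href === curURL) return false;
  // Assumption: the tracker remembers URLs it has already accepted.
  return !tracker.seen.has(url.href);
}

// Hypothetical stand-in for a SeedTracker.
const defaultTracker = { seen: new Set(['https://example.com/a']) };
const currentPage = 'https://example.com/start';

console.log(shouldAddDefaultSketch(new URL('https://example.com/b'), currentPage, defaultTracker)); // prints true
console.log(shouldAddDefaultSketch(new URL('https://example.com/a'), currentPage, defaultTracker)); // prints false
```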
public static shouldAddToFrontierPSD(url: Object, curURL: string, tracker: SeedTracker): boolean
Determine if a discovered URL should be added to the frontier using the Page Same Domain strategy
Params:
Name | Type | Attribute | Description |
url | Object | | A URL extracted from the currently visited page |
curURL | string | | The URL of the currently visited page |
tracker | SeedTracker | | The seed tracker associated with the first page in the chain of pages being visited |
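A same-domain strategy can be sketched as a hostname comparison against the seed. In this example the tracker's `seedHostname` and `seen` fields are hypothetical, and the exact-hostname comparison (which treats subdomains as different domains) is an assumption; the real method may define "same domain" differently.

```javascript
// Sketch only: a same-domain check under the assumptions stated above.
function shouldAddSameDomainSketch(url, curURL, tracker) {
  // A page linking to itself is not re-queued (assumption).
  if (url.href === curURL) return false;
  // Assumption: "same domain" means the hostname equals the seed's hostname.
  if (url.hostname !== tracker.seedHostname) return false;
  // Assumption: the tracker remembers URLs it has already accepted.
  return !tracker.seen.has(url.href);
}

// Hypothetical stand-in for a SeedTracker.
const psdTracker = { seedHostname: 'example.com', seen: new Set() };
const visiting = 'https://example.com/start';

console.log(shouldAddSameDomainSketch(new URL('https://example.com/about'), visiting, psdTracker)); // prints true
console.log(shouldAddSameDomainSketch(new URL('https://other.org/about'), visiting, psdTracker));   // prints false
```

A stricter or looser notion of domain, such as matching on the registrable domain so that subdomains count as the same site, would change only the hostname comparison.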