import FrontierHelper from 'squidwarc/lib/frontier/helper.js'
public class

FrontierHelper

Helper class providing utility functions for the in-memory frontier implementation, Frontier.

Static Method Summary

Static Public Methods
public static

crawlModeToSymbol(mode: string): Symbol

Retrieve the crawl-mode symbol from a config string

public static

normalizeSeeds(seeds: string, mode: string, depth: number): Seed | Seed[]

Ensure the starting seed list is one the frontier can understand

public static

shouldAddToFrontier(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine if a URL should be added to the frontier

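public static

shouldAddToFrontierDefault(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine whether a discovered URL should be added to the frontier using the default strategy, which applies to the page-only and page-all-links modes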

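public static

shouldAddToFrontierPSD(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine whether a discovered URL should be added to the frontier using the Page Same Domain strategy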

Static Public Methods

public static crawlModeToSymbol(mode: string): Symbol

Retrieve the crawl-mode symbol from a config string

Params:

Name | Type | Attribute | Description
mode | string | | The crawl mode

Return:

Symbol

The crawl mode's internal symbol
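
A minimal sketch of how a config string could map to an internal symbol. The mode strings `page-only` and `page-all-links` appear in this reference; the `page-same-domain` spelling, the `MODES` table, the function name `crawlModeToSymbolSketch`, and the fallback to page-only are all assumptions made for illustration, not confirmed Squidwarc behavior.

```javascript
// Hypothetical symbol table; the real symbols live inside Squidwarc and may differ.
const MODES = new Map([
  ['page-only', Symbol('page-only')],
  ['page-same-domain', Symbol('page-same-domain')], // assumed spelling
  ['page-all-links', Symbol('page-all-links')]
])

// Map a config string to its symbol.
// Unknown or missing modes fall back to page-only (an assumed default).
function crawlModeToSymbolSketch (mode) {
  return MODES.get(mode) || MODES.get('page-only')
}

console.log(crawlModeToSymbolSketch('page-all-links').toString()) // Symbol(page-all-links)
```

Using symbols rather than raw strings lets later comparisons be identity checks, so a misspelled mode cannot silently match.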

public static normalizeSeeds(seeds: string, mode: string, depth: number): Seed | Seed[]

Ensure the starting seed list is one the frontier can understand

Params:

Name | Type | Attribute | Description
seeds | string | | The initial seeds for the crawl
mode | string | | The crawl mode for the crawl to be launched
depth | number | | The crawl's depth

Return:

Seed | Seed[]

The normalized Seed(s)
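
A rough sketch of what seed normalization might look like. The `Seed` shape used here (`{ url, mode, depth }`), the helper `toSeed`, the array input, and the default depth of 1 are assumptions; only the parameter names and the `Seed | Seed[]` return type come from this reference.

```javascript
// Hypothetical Seed shape. The real Seed type may carry other fields.
function toSeed (url, mode, depth) {
  return { url, mode, depth }
}

// Accepts a single URL string, or an array of them. The array case is an
// assumption suggested by the Seed | Seed[] return type.
function normalizeSeedsSketch (seeds, mode, depth = 1) {
  if (Array.isArray(seeds)) {
    return seeds.map(url => toSeed(url, mode, depth))
  }
  return toSeed(seeds, mode, depth)
}

console.log(normalizeSeedsSketch('https://example.com', 'page-only', 1))
```

Normalizing once up front means the rest of the frontier can rely on a single seed shape instead of re-checking input types.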

public static shouldAddToFrontier(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine if a URL should be added to the frontier

Params:

Name | Type | Attribute | Description
url | Object | | A URL extracted from the currently visited page
curURL | string | | The URL of the currently visited page
tracker | SeedTracker | | The seed tracker associated with the first page from which the chain of visited pages originated

Return:

boolean
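
The existence of two strategy-specific methods suggests that this method chooses between them, though that is an inference rather than something this page states. The sketch below shows one way such a dispatch could work. The `tracker.mode` field, the `strategies` table, and the stub predicates are hypothetical, and `url` is assumed to be a WHATWG `URL`-like object.

```javascript
// Hypothetical strategy table keyed by crawl-mode string.
// Each predicate has the shape (url, curURL, tracker) => boolean.
const strategies = new Map([
  ['page-same-domain', (url, curURL) => url.hostname === new URL(curURL).hostname]
])

// Stand-in for the default strategy; it accepts everything in this sketch.
const fallbackStrategy = () => true

// Pick the predicate for the tracker's mode (assumed field) and apply it.
function shouldAddToFrontierSketch (url, curURL, tracker) {
  const strategy = strategies.get(tracker.mode) || fallbackStrategy
  return strategy(url, curURL, tracker)
}

const link = new URL('https://other.org/page')
console.log(shouldAddToFrontierSketch(link, 'https://example.com/', { mode: 'page-same-domain' })) // false
```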

public static shouldAddToFrontierDefault(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine whether a discovered URL should be added to the frontier using the default strategy, which applies to the page-only and page-all-links modes

Params:

Name | Type | Attribute | Description
url | Object | | A URL extracted from the currently visited page
curURL | string | | The URL of the currently visited page
tracker | SeedTracker | | The seed tracker associated with the first page from which the chain of visited pages originated

Return:

boolean
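
This page does not say which checks the default strategy performs. As an illustration only, a plausible filter might keep HTTP(S) links that are not the current page and have not been seen before. The `tracker.seen` set and the function name are hypothetical, and `url` is assumed to be a WHATWG `URL`-like object with `protocol` and `href`.

```javascript
// Hypothetical tracker shape: { seen: Set<string> } holding already-queued URLs.
function shouldAddDefaultSketch (url, curURL, tracker) {
  // Skip non-web schemes such as mailto: or javascript:
  const isHttp = url.protocol === 'http:' || url.protocol === 'https:'
  if (!isHttp) return false
  // Skip links that point back at the page being visited
  if (url.href === curURL) return false
  // Skip anything this seed's crawl has already recorded
  return !tracker.seen.has(url.href)
}

const tracker = { seen: new Set(['https://example.com/old']) }
console.log(shouldAddDefaultSketch(new URL('https://example.com/new'), 'https://example.com/', tracker)) // true
```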

public static shouldAddToFrontierPSD(url: Object, curURL: string, tracker: SeedTracker): boolean

Determine whether a discovered URL should be added to the frontier using the Page Same Domain strategy

Params:

Name | Type | Attribute | Description
url | Object | | A URL extracted from the currently visited page
curURL | string | | The URL of the currently visited page
tracker | SeedTracker | | The seed tracker associated with the first page from which the chain of visited pages originated

Return:

boolean
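
A simplified sketch of a same-domain check, assuming `url` is a WHATWG `URL`-like object. It compares exact hostnames against `curURL`, so `www.example.com` and `example.com` count as different domains here; the actual strategy may compare registered domains or use the seed's domain instead. The `tracker.seen` set and the function name are hypothetical.

```javascript
// Hypothetical tracker shape: { seen: Set<string> } holding already-queued URLs.
function shouldAddPSDSketch (url, curURL, tracker) {
  const current = new URL(curURL)
  // Reject links that leave the current page's host (exact-hostname comparison)
  if (url.hostname !== current.hostname) return false
  // Reject links already recorded for this seed
  return !tracker.seen.has(url.href)
}

const tracker = { seen: new Set() }
console.log(shouldAddPSDSketch(new URL('https://example.com/about'), 'https://example.com/', tracker)) // true
console.log(shouldAddPSDSketch(new URL('https://other.org/'), 'https://example.com/', tracker)) // false
```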