import Frontier from 'squidwarc/lib/frontier/index.js'
public class

Frontier

In-memory implementation of a frontier

Constructor Summary

Public Constructor
public

constructor()

Create a new frontier object

Member Summary

Public Members
public

current: {url: string, mode: Symbol, cdepth: number, tracker: string}

Information pertaining to the current URL being crawled

public

queue: {url: string, mode: Symbol, cdepth: number, tracker: string}[]

URLs to be crawled

public

trackers: Map<string, SeedTracker>

Tracks the depth and crawl config per starting seed

Method Summary

Public Methods
public

exhausted(): boolean

Is the frontier exhausted

public

init(starting: Seed[] | Seed)

Initialize the frontier with the starting seed(s)

public

next(): string

Get the next URL to crawl from the frontier, reducing the queue length by 1

public

process(links: Array<{href: string, pathname: string, host: string}>)

Process discovered outlinks of a page based on the originating seed's configuration

public

size(): number

Returns the number of URLs left in the queue

Public Constructors

public constructor()

Create a new frontier object

Public Members

public current: {url: string, mode: Symbol, cdepth: number, tracker: string}

Information pertaining to the current URL being crawled

public queue: {url: string, mode: Symbol, cdepth: number, tracker: string}[]

URLs to be crawled

public trackers: Map<string, SeedTracker>

Tracks the depth and crawl config per starting seed

Public Methods

public exhausted(): boolean

Is the frontier exhausted

Return:

boolean

public init(starting: Seed[] | Seed)

Initialize the frontier with the starting seed(s)

Params:

Name      Type           Attribute  Description
starting  Seed[] | Seed
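
As a rough illustration of how `init` might accept either a single seed or an array of seeds, the sketch below normalizes the argument and fills a queue with entries shaped like the documented `queue` member. The `Seed` fields (`url`, `mode`, `depth`), the `sketchInit` helper, the `PAGE_ONLY` symbol, and the use of the seed URL as the `tracker` key are assumptions made for this example, not Squidwarc's confirmed implementation.

```javascript
// Hypothetical stand-in for the frontier state; not Squidwarc's actual class.
function sketchInit (starting) {
  const state = { queue: [], trackers: new Map(), current: null }
  // Accept both a single seed and an array of seeds.
  const seeds = Array.isArray(starting) ? starting : [starting]
  for (const seed of seeds) {
    // Assumption: the seed URL doubles as the tracker key.
    state.trackers.set(seed.url, { mode: seed.mode, depth: seed.depth })
    state.queue.push({
      url: seed.url,
      mode: seed.mode,
      cdepth: 0, // assumed: seeds start at crawl depth 0
      tracker: seed.url
    })
  }
  return state
}

const PAGE_ONLY = Symbol('page-only') // hypothetical mode symbol
const single = sketchInit({ url: 'https://example.com', mode: PAGE_ONLY, depth: 1 })
const many = sketchInit([
  { url: 'https://example.com/a', mode: PAGE_ONLY, depth: 1 },
  { url: 'https://example.com/b', mode: PAGE_ONLY, depth: 1 }
])
console.log(single.queue.length, many.queue.length)
```

Normalizing to an array up front keeps the rest of the function free of type checks, which is one common way to support a `Seed[] | Seed` parameter.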

public next(): string

Get the next URL to crawl from the frontier, reducing the queue length by 1

Return:

string (nullable: true)
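
The description and the nullable return type suggest a dequeue that hands back one URL per call. A minimal sketch under that assumption follows; `sketchNext` and the plain `nextState` object are hypothetical, and taking entries from the front of the queue (first in, first out) is assumed rather than stated on this page.

```javascript
// Hypothetical helper operating on a plain object instead of a Frontier instance.
function sketchNext (state) {
  // Take the oldest entry; shift() returns undefined on an empty array.
  const entry = state.queue.shift()
  if (entry === undefined) {
    state.current = null
    return null // mirrors the documented nullable return
  }
  state.current = entry // remember which entry is being crawled
  return entry.url
}

const nextState = {
  current: null,
  queue: [
    { url: 'https://example.com/', mode: Symbol('page-only'), cdepth: 0, tracker: 'seed-0' },
    { url: 'https://example.com/about', mode: Symbol('page-only'), cdepth: 1, tracker: 'seed-0' }
  ]
}
const firstUrl = sketchNext(nextState)
const remaining = nextState.queue.length
console.log(firstUrl, remaining)
```

Storing the dequeued entry in `current` is what would let a later `process` call know the depth and tracker of the page whose links it receives.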

public process(links: Array<{href: string, pathname: string, host: string}>)

Process discovered outlinks of a page based on the originating seed's configuration

Params:

Name   Type                                                   Attribute  Description
links  Array<{href: string, pathname: string, host: string}>             list of discovered outlinks to consider
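
To make the per-seed filtering concrete, here is a sketch of how outlinks might be admitted to the queue depending on a crawl mode and a depth limit. The three mode symbols, the `depth` and `host` fields on the tracker, the `seen` set, and `sketchProcess` itself are all assumptions for illustration; this page only states that processing depends on the originating seed's configuration.

```javascript
// Hypothetical crawl modes; the real Symbols are not listed on this page.
const MODE = {
  pageOnly: Symbol('page-only'),
  sameDomain: Symbol('page-same-domain'),
  allLinks: Symbol('page-all-links')
}

// Hypothetical helper: queue the links allowed by the current entry's seed.
function sketchProcess (state, links) {
  const { mode, cdepth, tracker } = state.current
  const config = state.trackers.get(tracker)
  const nextDepth = cdepth + 1
  // Page-only crawls and links beyond the depth limit add nothing.
  if (mode === MODE.pageOnly || nextDepth > config.depth) return 0
  let added = 0
  for (const link of links) {
    // In same-domain mode, drop links that leave the seed's host.
    if (mode === MODE.sameDomain && link.host !== config.host) continue
    if (state.seen.has(link.href)) continue // skip URLs already queued
    state.seen.add(link.href)
    state.queue.push({ url: link.href, mode, cdepth: nextDepth, tracker })
    added++
  }
  return added
}

const processState = {
  current: { url: 'https://example.com/', mode: MODE.sameDomain, cdepth: 0, tracker: 'seed-0' },
  queue: [],
  seen: new Set(['https://example.com/']),
  trackers: new Map([['seed-0', { depth: 1, host: 'example.com' }]])
}
const addedCount = sketchProcess(processState, [
  { href: 'https://example.com/docs', pathname: '/docs', host: 'example.com' },
  { href: 'https://other.org/', pathname: '/', host: 'other.org' },
  { href: 'https://example.com/docs', pathname: '/docs', host: 'example.com' }
])
console.log(addedCount, processState.queue.length)
```

Each queued entry inherits the `tracker` of the page it was found on, which is how a link several hops away could still be judged against its originating seed.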

public size(): number

Returns the number of URLs left in the queue

Return:

number
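
Taken together, `exhausted()`, `next()`, and `size()` suggest a simple crawl loop. The sketch below runs that loop against a small stand-in class rather than the real `Frontier`, so that it stays self-contained. `MiniFrontier`, its `add` method, and the assumption that `exhausted()` is equivalent to an empty queue are hypothetical.

```javascript
// Hypothetical stand-in exposing the three documented method names.
class MiniFrontier {
  constructor () {
    this.queue = []
    this.current = null
  }

  // Not part of the documented API; used here only to fill the queue.
  add (url) {
    this.queue.push({ url })
  }

  exhausted () {
    return this.queue.length === 0 // assumed meaning of "exhausted"
  }

  next () {
    const entry = this.queue.shift()
    this.current = entry === undefined ? null : entry
    return this.current === null ? null : this.current.url
  }

  size () {
    return this.queue.length
  }
}

const mini = new MiniFrontier()
mini.add('https://example.com/')
mini.add('https://example.com/about')

const visited = []
while (!mini.exhausted()) {
  const url = mini.next()
  visited.push(url)
  console.log(`crawling ${url}, ${mini.size()} left`)
}
```

Checking `exhausted()` before each `next()` call means the loop never has to handle the nullable return value itself.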