import SeedTracker from 'squidwarc/lib/frontier/seedTracker.js'
public class

SeedTracker

Tracks the progress of a crawl for each starting seed URL. Because a crawl can use multiple seeds, each potentially generating additional URLs to crawl, a SeedTracker consolidates this bookkeeping per starting seed. It tracks the URLs discovered for a starting seed and allows the crawl mode to be propagated throughout the entire crawl.
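
The bookkeeping described above can be sketched as a small stand-in class. This is not the Squidwarc source: `SeedTrackerSketch` is a hypothetical name that mirrors the members and methods documented on this page, and counting the seed itself as the first pending URL is an assumption.

```javascript
// Hypothetical stand-in for illustration; not the Squidwarc implementation.
class SeedTrackerSketch {
  constructor (url, mode, depth) {
    this.url = url
    this.mode = mode
    this.depth = depth
    // Assumption: the seed itself is the first URL awaiting a crawl.
    this.urlCount = 1
    this.seen = new Set([url])
  }

  // Record a newly discovered URL and count it as pending.
  addToSeen (url) {
    this.seen.add(url)
    this.urlCount += 1
  }

  // One URL belonging to this seed has been crawled.
  crawledURL () {
    this.urlCount -= 1
  }

  // True once nothing originating from this seed is pending.
  done () {
    return this.urlCount === 0
  }

  // Has this URL already been recorded for this seed?
  seenURL (url) {
    return this.seen.has(url)
  }
}
```

Keeping the counter and the seen set on the same object means a crawler can answer both "is this URL new?" and "is this seed finished?" without consulting any global state.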

Constructor Summary

Public Constructor
public constructor(url: string, mode: Symbol, depth: number)

Member Summary

Public Members
public depth: number

The depth of the crawl for this seed

public mode: Symbol

The crawl mode symbol the seed is operating under

public seen: Set<string>

A set of URLs used for de-duplication of URLs generated by this seed during the crawl

public url: string

The URL of the starting seed

public urlCount: number

How many URLs are left to crawl that originated from the starting seed

Method Summary

Public Methods
public addToSeen(url: string)

Adds a URL to the set of URLs seen and increments the seed's URL count

public crawledURL()

Decreases the number of URLs left to crawl for this seed

public done(): boolean

Returns true when no URLs associated with this seed remain to be crawled

public seenURL(url: string): boolean

Checks whether the supplied URL has already been seen

Public Constructors

public constructor(url: string, mode: Symbol, depth: number)

Params:

Name | Type | Attribute | Description
url | string | | A starting seed
mode | Symbol | | The mode for the seed
depth | number | | The crawl depth
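
As an illustration of the constructor parameters, the sketch below keeps one tracker per starting seed in a `Map`. The class is a reduced hypothetical stand-in that holds only the constructor fields, and the mode symbols `PAGE_ONLY` and `SAME_DOMAIN` are invented for this example; the library defines its own.

```javascript
// Minimal hypothetical stand-in exposing only the constructor fields.
class SeedTrackerSketch {
  constructor (url, mode, depth) {
    this.url = url
    this.mode = mode
    this.depth = depth
  }
}

// Hypothetical mode symbols, invented for this example.
const PAGE_ONLY = Symbol('page-only')
const SAME_DOMAIN = Symbol('same-domain')

// One tracker per starting seed, keyed by the seed URL.
const trackers = new Map()
for (const [url, mode, depth] of [
  ['https://example.com/', SAME_DOMAIN, 2],
  ['https://example.org/about', PAGE_ONLY, 1]
]) {
  trackers.set(url, new SeedTrackerSketch(url, mode, depth))
}

// Symbols compare by identity, so a mode check is a strict equality test.
const followsLinks = trackers.get('https://example.com/').mode === SAME_DOMAIN
```

Because each tracker carries its own mode and depth, URLs discovered from different seeds can be handled under different crawl settings within the same run.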

Public Members

public depth: number

The depth of the crawl for this seed

public mode: Symbol

The crawl mode symbol the seed is operating under

public seen: Set<string>

A set of URLs used for de-duplication of URLs generated by this seed during the crawl

public url: string

The URL of the starting seed

public urlCount: number

How many URLs are left to crawl that originated from the starting seed

Public Methods

public addToSeen(url: string)

Adds a URL to the set of URLs seen and increments the seed's URL count

Params:

Name | Type | Attribute | Description
url | string | | The URL to mark as seen

public crawledURL()

Decreases the number of URLs left to crawl for this seed

public done(): boolean

Returns true when no URLs associated with this seed remain to be crawled

Return:

boolean

public seenURL(url: string): boolean

Checks whether the supplied URL has already been seen

Params:

Name | Type | Attribute | Description
url | string | | The URL to check

Return:

boolean
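
The four methods are meant to be used together over the life of a crawl. The following is a minimal sketch of that flow, not the crawler's actual loop: `SeedTrackerSketch` is a hypothetical stand-in for the class documented here, the `links` graph is made up, and treating `depth` as a maximum link distance from the seed is an assumption.

```javascript
// Hypothetical stand-in; assumes the seed counts as the first pending URL.
class SeedTrackerSketch {
  constructor (url, mode, depth) {
    this.url = url
    this.mode = mode
    this.depth = depth
    this.urlCount = 1
    this.seen = new Set([url])
  }
  addToSeen (url) { this.seen.add(url); this.urlCount += 1 }
  crawledURL () { this.urlCount -= 1 }
  done () { return this.urlCount === 0 }
  seenURL (url) { return this.seen.has(url) }
}

// Made-up link graph standing in for pages fetched by a real crawler.
const links = {
  'https://example.com/': ['https://example.com/a', 'https://example.com/b'],
  'https://example.com/a': ['https://example.com/', 'https://example.com/b'],
  'https://example.com/b': []
}

const tracker = new SeedTrackerSketch('https://example.com/', Symbol('same-domain'), 1)
const queue = [{ url: tracker.url, depth: 0 }]
const crawled = []

while (queue.length > 0) {
  const { url, depth } = queue.shift()
  crawled.push(url)
  // Queue only unseen links, and only while within the seed's depth limit.
  if (depth < tracker.depth) {
    for (const next of links[url] || []) {
      if (!tracker.seenURL(next)) {
        tracker.addToSeen(next)
        queue.push({ url: next, depth: depth + 1 })
      }
    }
  }
  // The URL has been processed, so one fewer remains for this seed.
  tracker.crawledURL()
}
```

Each `addToSeen` call is eventually matched by one `crawledURL` call, so `done()` becomes true exactly when the queue of URLs that originated from this seed has drained.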