import NavigationMan from 'squidwarc/lib/crawler/navigationMan.js'
public class

NavigationMan

Extends:

EventEmitter → NavigationMan

Monitor navigation and request events for crawling a page.

Constructor Summary

Public Constructor
public

constructor(options: CrawlControl, parentEmitter: EventEmitter)

Member Summary

Private Members
private

_curl: string

The URL of the page a crawler is visiting

private

_doneTimers: boolean

Flag indicating if we are in a network tracking state or not

private

_globalWaitTimer: number

The id of the global crawler setTimeout timer

private

_idleInflight: number

The number of inflight requests (requests with no response) that should exist before starting the inflightIdle timer

private

_idleTime: number

Amount of time, in milliseconds, that must elapse with only _idleInflight requests in flight before the network is considered idle

private

_idleTimer: number

The id of the setTimeout for the network-idle timer

private

_navTimeout: number

The id of the navigation setTimeout timer

private

_navTimeoutTime: number

How long we should wait for navigation to occur before emitting the navigation-timedout event

private

_parentEmitter: EventEmitter

An optional EventEmitter that this emitter's events should be emitted to, rather than emitting them ourselves

private

_requestIds: Set<string>

Set of HTTP request ids, used for tracking network-idle

private

_timeout: number

Maximum amount of time, in milliseconds, before generating a WARC and moving to the next URL

Method Summary

Public Methods
public

didNavigate()

Indicate that the browser has navigated to the current URL

public

navigationError(err: Error | string)

Have the NavigationMan emit the 'navigation-error' event

public

reqFinished(info: Object)

Indicate that a request has finished

public

reqStarted(info: Object)

Indicate that a request was made

public

startedNav(curl: string)

Start timers for navigation monitoring

Private Methods
private

_clearTimers()

Clear all timers

private

_emitEvent(event: string, arg: *)

Emit an event

private

_globalNetworkTimeout()

Called when the global time limit was hit

private

_navTimedOut()

Called when the navigation time limit was hit

private

_networkIdled()

Called when the network idle has been determined

Public Constructors

public constructor(options: CrawlControl, parentEmitter: EventEmitter)

Params:

Name           Type          Attribute              Description
options        CrawlControl  optional, default: {}
parentEmitter  EventEmitter  optional

Private Members

private _curl: string

The URL of the page a crawler is visiting

private _doneTimers: boolean

Flag indicating if we are in a network tracking state or not

private _globalWaitTimer: number

The id of the global crawler setTimeout timer

private _idleInflight: number

The number of inflight requests (requests with no response) that should exist before starting the inflightIdle timer

private _idleTime: number

Amount of time, in milliseconds, that must elapse with only _idleInflight requests in flight before the network is considered idle

private _idleTimer: number

The id of the setTimeout for the network-idle timer

private _navTimeout: number

The id of the navigation setTimeout timer

private _navTimeoutTime: number

How long we should wait for navigation to occur before emitting the navigation-timedout event

private _parentEmitter: EventEmitter

An optional EventEmitter that this emitter's events should be emitted to, rather than emitting them ourselves

private _requestIds: Set<string>

Set of HTTP request ids, used for tracking network-idle

private _timeout: number

Maximum amount of time, in milliseconds, before generating a WARC and moving to the next URL

Public Methods

public didNavigate()

Indicate that the browser has navigated to the current URL

public navigationError(err: Error | string)

Have the NavigationMan emit the 'navigation-error' event

Params:

Name  Type            Attribute  Description
err   Error | string

public reqFinished(info: Object)

Indicate that a request has finished

Params:

Name  Type    Attribute  Description
info  Object             CDP Response object received by Network.responseReceived or Network.loadingFailed

public reqStarted(info: Object)

Indicate that a request was made

Params:

Name  Type    Attribute  Description
info  Object             CDP object received from Network.requestWillBeSent
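
The interplay of reqStarted, reqFinished, _requestIds, _idleInflight and _idleTime can be illustrated with a small sketch. This is not Squidwarc's implementation: the IdleTracker class, its onIdle callback, its clearTimers method and the default values are hypothetical, and the exact conditions for starting and cancelling the timer are an assumption based on the member descriptions. The only external fact relied on is that CDP Network events carry a requestId field.

```javascript
// Hypothetical, simplified tracker; not Squidwarc's actual implementation.
class IdleTracker {
  constructor ({ idleInflight = 2, idleTime = 1000, onIdle = () => {} } = {}) {
    this._idleInflight = idleInflight // assumed default
    this._idleTime = idleTime // assumed default, in milliseconds
    this._onIdle = onIdle // hypothetical callback fired on network idle
    this._requestIds = new Set()
    this._idleTimer = null
  }

  // Record a request announced by Network.requestWillBeSent.
  reqStarted (info) {
    this._requestIds.add(info.requestId)
    // Too many requests in flight again: cancel a pending idle timer.
    if (this._requestIds.size > this._idleInflight && this._idleTimer) {
      clearTimeout(this._idleTimer)
      this._idleTimer = null
    }
  }

  // Record a request completed by Network.responseReceived or Network.loadingFailed.
  reqFinished (info) {
    this._requestIds.delete(info.requestId)
    // Few enough requests in flight: arm the idle timer if it is not armed yet.
    if (this._requestIds.size <= this._idleInflight && !this._idleTimer) {
      this._idleTimer = setTimeout(this._onIdle, this._idleTime)
    }
  }

  clearTimers () {
    clearTimeout(this._idleTimer)
    this._idleTimer = null
  }
}

const tracker = new IdleTracker({ idleInflight: 0, idleTime: 50 })
tracker.reqStarted({ requestId: 'a' })
tracker.reqStarted({ requestId: 'b' })
tracker.reqFinished({ requestId: 'a' })
const armedAfterFirst = tracker._idleTimer !== null // 'b' is still in flight
tracker.reqFinished({ requestId: 'b' })
const armedAfterSecond = tracker._idleTimer !== null // nothing left in flight
tracker.clearTimers()
```

Keeping request ids in a Set makes a duplicate start or finish notification for the same request harmless, which is one plausible reason for the _requestIds member being a Set rather than a counter.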

public startedNav(curl: string)

Start timers for navigation monitoring

Params:

Name  Type    Attribute  Description
curl  string             The URL the browser is navigating to
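
As a rough illustration of how startedNav, didNavigate and the navigation timeout might fit together, consider the sketch below. It is an assumption-laden stand-in, not the library's code: the NavTimers class, the default timeout value and the payload passed with the event (the current URL) are all hypothetical. Only the 'navigation-timedout' event name and the member names come from this reference.

```javascript
const EventEmitter = require('events')

// Hypothetical, simplified stand-in; not Squidwarc's actual implementation.
class NavTimers extends EventEmitter {
  constructor ({ navTimeoutTime = 8000 } = {}) {
    super()
    this._navTimeoutTime = navTimeoutTime // assumed default, in milliseconds
    this._navTimeout = null
    this._curl = null
  }

  // Remember the URL and start the navigation timer.
  startedNav (curl) {
    this._curl = curl
    this._navTimeout = setTimeout(() => this._navTimedOut(), this._navTimeoutTime)
  }

  // The browser navigated in time: cancel the navigation timer.
  didNavigate () {
    clearTimeout(this._navTimeout)
    this._navTimeout = null
  }

  // The navigation time limit was hit (the URL payload is an assumption).
  _navTimedOut () {
    this._navTimeout = null
    this.emit('navigation-timedout', this._curl)
  }
}

const nav = new NavTimers({ navTimeoutTime: 5000 })
const timedOut = []
nav.on('navigation-timedout', url => timedOut.push(url))
nav.startedNav('https://example.com/')
const pendingBefore = nav._navTimeout !== null // timer armed by startedNav
nav.didNavigate()
const pendingAfter = nav._navTimeout !== null // timer cancelled by didNavigate
```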

Private Methods

private _clearTimers()

Clear all timers

private _emitEvent(event: string, arg: *)

Emit an event

Params:

Name   Type    Attribute  Description
event  string             The event name to be emitted
arg    *       optional   The value to be emitted for the event
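
The description of _parentEmitter suggests that _emitEvent forwards events to the parent emitter when one was supplied and otherwise emits on the instance itself. A minimal sketch of that pattern follows; the ForwardingEmitter class is hypothetical and the branching logic is an assumption drawn from the member description, not from the library source.

```javascript
const EventEmitter = require('events')

// Hypothetical stand-in for NavigationMan, showing only event forwarding.
class ForwardingEmitter extends EventEmitter {
  constructor (parentEmitter) {
    super()
    this._parentEmitter = parentEmitter
  }

  _emitEvent (event, arg) {
    if (this._parentEmitter) {
      // A parent was supplied: its listeners receive the event.
      this._parentEmitter.emit(event, arg)
    } else {
      // No parent: emit on this instance.
      this.emit(event, arg)
    }
  }
}

const parent = new EventEmitter()
const received = []
parent.on('navigation-error', err => received.push(err))
new ForwardingEmitter(parent)._emitEvent('navigation-error', 'boom')
```

With this arrangement a crawler that owns the manager can listen for events such as 'navigation-error' on itself, without subscribing to the manager directly.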

private _globalNetworkTimeout()

Called when the global time limit was hit

private _navTimedOut()

Called when the navigation time limit was hit

private _networkIdled()

Called when the network idle has been determined