References

summary
public

F runPromise(runnable: function(): Promise<any>|Promise<any>, thener: function(...args: any), catcher: function(...args: any)): void

Runs a promise using the supplied thener and catcher functions
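
A minimal sketch of how such a helper could be written, assuming `runnable` is either a promise or a function returning one, as the signature suggests. Only the signature comes from this reference; the body is an illustration, not the project's source.

```javascript
// Hypothetical sketch of runPromise; the body is an assumption.
function runPromise(runnable, thener, catcher) {
  // Wrapping in a new Promise also turns a synchronous throw from
  // runnable() into a rejection that reaches the catcher.
  new Promise(resolve => {
    resolve(typeof runnable === 'function' ? runnable() : runnable);
  })
    .then(thener)
    .catch(catcher);
}

// Usage: the handlers receive the resolved value or the rejection reason.
runPromise(
  async () => 'done',
  value => console.log('resolved with', value),
  error => console.error('rejected with', error)
);
```

Chaining `.catch` after `.then` means an error thrown inside the thener is also routed to the catcher.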

config

summary
public

C Config

Crawl config loader

public

T UserScript: function(page: Page): Promise<void>
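
A user script matching this type might look like the sketch below. It assumes `Page` behaves like puppeteer's Page and uses only `page.evaluate`; the function name `userScript` is hypothetical, and how a script is registered in the crawl config is not shown in this listing.

```javascript
// Hypothetical user script matching
// UserScript: function(page: Page): Promise<void>.
// Assumes `page` exposes puppeteer's Page.evaluate.
async function userScript(page) {
  // Run code inside the page context; here we jump to the bottom of the page.
  await page.evaluate(() => {
    window.scrollTo(0, document.body.scrollHeight);
  });
}
```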

crawler

summary
public

Crawler based on cyrus-and/chrome-remote-interface

public

C Helper

A helper class providing utility methods

public

Monitor navigation and request events for crawling a page.

public

Monitors the HTTP requests made by a page and emits the 'network-idle' event once it determines that the network is idle. Used by PuppeteerCrawler.
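
One common way to detect network idleness is to track in-flight requests and start a timer whenever their count drops to a threshold. The sketch below illustrates that idea only; the class name, method names, and the `idleTime` and `maxInflight` options are all assumptions and do not describe the actual monitor's API.

```javascript
const EventEmitter = require('events');

// Hypothetical sketch: emits 'network-idle' once no more than `maxInflight`
// requests have been in flight for `idleTime` milliseconds.
class NetIdleSketch extends EventEmitter {
  constructor({ idleTime = 500, maxInflight = 0 } = {}) {
    super();
    this.idleTime = idleTime;
    this.maxInflight = maxInflight;
    this.inflight = new Set();
    this.timer = null;
  }

  // Call when a request starts; cancels a pending idle timer if the
  // number of in-flight requests exceeds the threshold.
  requestStarted(id) {
    this.inflight.add(id);
    if (this.inflight.size > this.maxInflight) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }

  // Call when a request finishes or fails; arms the idle timer when the
  // number of in-flight requests is at or below the threshold.
  requestFinished(id) {
    this.inflight.delete(id);
    if (this.inflight.size <= this.maxInflight && this.timer === null) {
      this.timer = setTimeout(() => {
        this.timer = null;
        this.emit('network-idle');
      }, this.idleTime);
    }
  }
}
```

A crawler built this way would listen for 'network-idle' and treat it as the signal that the page has finished loading its resources.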

public

Crawler based on puppeteer

public

E Page

frontier

summary
public

Helper class providing utility functions for the in-memory frontier implementation, Frontier

public

In-memory implementation of a frontier

public

Tracks the progress of the crawl for each starting seed URL.

public

injectManager

summary
public

Manages the JavaScript that is injected into the page

public

T OnLoadInject: {scriptSource: string}

public

injectManager/pageInjects

summary
public

F collect(): Promise<{outlinks: string, links: Array<string>, location: string}>

Starts the collection of the outlinks.

public

F initCollectLinks(): void

Function that is injected into every frame of the page currently being crawled and sets up the outlink collection, depending on whether the frame it is injected into is the top frame or a sub frame.

public

F async outLinks(): Promise<{outlinks: string, links: Array<string>}>

Builds the WARC outlink metadata information and finds potential links to visit next from a page

public

Function that disables the setting of window event handlers onbeforeunload and onunload and disables the usage of window.alert, window.confirm, and window.prompt.
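
The behavior described above can be sketched as follows. The function name, the `win` parameter (a stand-in for the frame's `window`), and the values returned by the replacement dialog functions are assumptions, since the listing does not show the signature.

```javascript
// Hypothetical sketch; `win` stands in for the frame's window object.
function disableDialogsAndUnloadHandlers(win) {
  // Replace the dialog functions with no-ops so they cannot block the crawl.
  win.alert = () => {};
  win.confirm = () => true; // assumption: always "accept"
  win.prompt = () => null;  // assumption: behave as if the prompt was dismissed

  // Make assignments to the unload handlers silently do nothing.
  for (const name of ['onbeforeunload', 'onunload']) {
    Object.defineProperty(win, name, {
      configurable: false,
      get: () => null,
      set: () => {},
    });
  }
}
```

Defining the handlers as accessor properties with an empty setter means a page script that assigns `window.onbeforeunload = ...` does not throw but also has no effect.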

public

Function that is injected into every frame of the page being crawled and, once the load event has fired, starts scrolling the page, up to a maximum of 20 times or until no further scrolling is possible

public

F async scrollPage(): Promise<void>

Function that scrolls the page or frame it is injected into, up to a maximum of 20 times or until no further scrolling is possible
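
A bounded scroll loop of this kind could look like the sketch below. The documented function takes no arguments; the `win` parameter (a stand-in for `window`) and the options object are added here only so the sketch can run outside a browser, and the pause between scrolls is an assumption.

```javascript
// Hypothetical sketch of a bounded scroll loop.
// `win` stands in for window; `pauseMs` is an assumed delay between scrolls.
async function scrollPage(win, { maxScrolls = 20, pauseMs = 100 } = {}) {
  for (let i = 0; i < maxScrolls; i += 1) {
    const before = win.scrollY;
    // Scroll down by one viewport height.
    win.scrollBy(0, win.innerHeight);
    // Give lazily loaded content a moment to arrive.
    await new Promise(resolve => setTimeout(resolve, pauseMs));
    // Position unchanged: no further scrolling is possible.
    if (win.scrollY === before) break;
  }
}
```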

launcher

summary
public

Utility class for launching or connecting to a Chrome/Chromium instance

public

Utility class that provides functionality for finding a suitable Chrome executable

public

F async launch(options: ChromeOptions): Promise<!Puppeteer.Browser>

Launches Chrome/Chromium and connects to it, or connects to an existing instance

public

E CRI

runners

summary
public

F async chromeRunner(conf: CrawlConfig): Promise<void, Error>

Launches a crawl using the supplied configuration file path

public

F async puppeteerRunner(conf: CrawlConfig): Promise<void, Error>

Launches a crawl using the supplied configuration file path

utils

summary
public

Utility class for displaying colored text in the console

public

Class that initializes the WARC naming function used when generating WARC files

public

Test to see if a plain object is empty
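
The signature for this entry is missing from the listing, so the sketch below uses a hypothetical name and assumes "empty" means the object has no own enumerable keys.

```javascript
// Hypothetical sketch; the name and the exact notion of "empty" are assumptions.
function isEmptyPlainObject(obj) {
  // Object.keys only reports own enumerable string keys,
  // so inherited properties do not count.
  return obj != null && typeof obj === 'object' && Object.keys(obj).length === 0;
}
```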

public

F delay(amount: number): Promise<void>

Promise wrapper around setTimeout
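
A promise wrapper around setTimeout is typically a one-liner. The sketch below assumes `amount` is in milliseconds, as setTimeout expects; it is an illustration rather than the project's source.

```javascript
// Sketch of a promise wrapper around setTimeout.
// Assumes `amount` is a duration in milliseconds.
function delay(amount) {
  return new Promise(resolve => setTimeout(resolve, amount));
}

// Usage: pause inside an async function without blocking the event loop.
async function example() {
  await delay(50);
  console.log('50 ms later');
}
```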

public

F makeRunnable(runnable: function(...args: any): Promise): function(...args: any): void

Composes the supplied function with runPromise.
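
The composition can be sketched as below. The documented signature takes only `runnable`; the optional `thener` and `catcher` parameters and their defaults are assumptions added so the sketch is self-contained, and `runPromise` is redefined here as a hypothetical stand-in.

```javascript
// Hypothetical stand-in for runPromise (see its signature at the top of this page).
function runPromise(runnable, thener, catcher) {
  new Promise(resolve => {
    resolve(typeof runnable === 'function' ? runnable() : runnable);
  })
    .then(thener)
    .catch(catcher);
}

// Hypothetical sketch: returns a function that forwards its arguments to
// `runnable` and hands the resulting promise to runPromise.
// The `thener` and `catcher` parameters are assumptions, not documented.
function makeRunnable(runnable, thener = () => {}, catcher = console.error) {
  return (...args) => runPromise(() => runnable(...args), thener, catcher);
}

// Usage: the returned function returns void, so callers do not await it.
const main = makeRunnable(async name => console.log(`crawling ${name}`));
main('example-seed');
```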