References
| summary | ||
| public |
F runPromise(runnable: function(): Promise<any>|Promise<any>, thener: function(...args: any), catcher: function(...args: any)): void Runs a promise using the supplied thener and catcher functions |
|
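The `runPromise` signature above accepts either a promise or a function that returns one, plus success and failure callbacks. A minimal sketch of a helper with that shape is shown below; the body is an assumption for illustration, not the project's actual source.

```javascript
// Hypothetical sketch of a runPromise-style helper (illustrative only).
function runPromise(runnable, thener, catcher) {
  // Wrapping in a new Promise means a synchronous throw from runnable()
  // is routed to catcher instead of escaping to the caller.
  new Promise((resolve) => {
    // Accept either a promise or a zero-argument function returning one.
    resolve(typeof runnable === 'function' ? runnable() : runnable);
  })
    .then(thener)
    // Placed after then(), so it also receives errors thrown by thener.
    .catch(catcher);
}

// Usage: the callbacks run asynchronously and runPromise itself returns nothing.
runPromise(
  () => Promise.resolve('crawl finished'),
  (msg) => console.log(msg),
  (err) => console.error('crawl failed:', err)
);
```

Chaining `catch` after `then` is a design choice in this sketch: one handler covers both a rejected promise and a failing `thener`.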
config
| summary | ||
| public |
C Config Crawl config loader |
|
| public |
|
|
| public |
|
|
| public |
|
|
| public |
T UserScript: function(page: Page): Promise<void> |
|
| public |
|
|
| public |
|
|
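The `UserScript` type above describes a function that receives a page and returns a promise resolving to nothing. A function of that shape might look like the sketch below. It assumes `Page` is Puppeteer's `Page` class, and it does not show how the config loader locates or loads the script, since the index does not say.

```javascript
// A function matching the UserScript shape: (page) => Promise<void>.
// Assumption: `page` is a Puppeteer Page, so page.evaluate() is available.
async function userScript(page) {
  // Runs inside the browser context and scrolls to the bottom of the document.
  await page.evaluate(() => {
    window.scrollTo(0, document.body.scrollHeight);
  });
}
```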
crawler
| summary | ||
| public |
Crawler based on cyrus-and/chrome-remote-interface |
|
| public |
C Helper A helper class providing utility methods |
|
| public |
Monitor navigation and request events for crawling a page. |
|
| public |
Monitors the HTTP requests made by a page and emits the 'network-idle' event once it determines that the network is idle. Used by PuppeteerCrawler. |
|
| public |
Crawler based on puppeteer |
|
| public |
|
|
| public |
E Page |
|
frontier
injectManager
| summary | ||
| public |
Manages the JavaScript that is injected into the page |
|
| public |
T OnLoadInject: {scriptSource: string} |
|
| public |
T OnNewDocumentInject: {source: string} |
|
injectManager/pageInjects
| summary | ||
| public |
Starts the collection of the outlinks. |
|
| public |
F initCollectLinks(): void Function injected into every frame of the page currently being crawled. It sets up outlink collection differently depending on whether the frame it is injected into is the top frame or a sub frame. |
|
| public |
Builds the WARC outlink metadata information and finds potential links to goto next from a page and build |
|
| public |
F noNaughtyJS() Function that disables the setting of window event handlers onbeforeunload and onunload and disables the usage of window.alert, window.confirm, and window.prompt. |
|
| public |
F scrollOnLoad() Function that is injected into every frame of the page being crawled that starts scrolling the page
once the |
|
| public |
F async scrollPage(): Promise<void> Function that scrolls the page or frame it is injected into, up to a maximum of 20 times or until no further scrolling is possible |
|
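The `noNaughtyJS` entry above describes neutralizing the unload handlers and the blocking dialog functions. One way to get that effect is sketched below. The function name, the `win` parameter, and the stand-in return values are assumptions made so the example can run outside a browser; the documented function takes no arguments.

```javascript
// Hypothetical sketch; not the project's actual noNaughtyJS implementation.
function noNaughtyJSSketch(win) {
  const noop = () => {};
  // Swallow assignments to the unload handlers so a page cannot block navigation.
  for (const name of ['onbeforeunload', 'onunload']) {
    Object.defineProperty(win, name, {
      configurable: false,
      get: () => null,
      set: noop
    });
  }
  // Replace the blocking dialogs with stand-ins that return immediately.
  win.alert = noop;
  win.confirm = () => true; // assumed answer: always "OK"
  win.prompt = () => null;  // assumed answer: as if the dialog were cancelled
}

// Usage with a plain object standing in for window.
const fakeWindow = {};
noNaughtyJSSketch(fakeWindow);
fakeWindow.onbeforeunload = () => 'Are you sure?';
console.log(fakeWindow.onbeforeunload); // null
```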
launcher
| summary | ||
| public |
Utility class for launching or connecting to a Chrome/Chromium instance |
|
| public |
Utility class that provides functionality for finding a suitable Chrome executable |
|
| public |
F async launch(options: ChromeOptions): Promise<!Puppeteer.Browser> Launches Chrome/Chromium and connects to it, or connects to an already running instance |
|
| public |
E CRI |
|
runners
| summary | ||
| public |
F async chromeRunner(conf: CrawlConfig): Promise<void, Error> Launches a crawl using the supplied configuration file path |
|
| public |
F async puppeteerRunner(conf: CrawlConfig): Promise<void, Error> Launches a crawl using the supplied configuration file path |
|
utils
| summary | ||
| public |
Utility class for displaying colored text in the console |
|
| public |
Class that initializes the WARC naming function used when generating WARCs |
|
| public |
F isEmptyPlainObject(object: Object): boolean Test to see if a |
|
| public |
Promise wrapper around setTimeout |
|
| public |
F makeRunnable(runnable: function(...args: any): Promise): function(...args: any): void Composes the supplied function with runPromise. |
|
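Two of the utilities above, the promise wrapper around `setTimeout` and `makeRunnable`, can be sketched together as follows. The name `delay`, the body of `runPromise`, and the default handlers inside `makeRunnable` are assumptions for illustration; the index gives only signatures and one-line descriptions.

```javascript
// Hypothetical name: the index does not show what the setTimeout wrapper is called.
function delay(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

// Illustrative stand-in for the documented runPromise helper.
function runPromise(runnable, thener, catcher) {
  new Promise((resolve) => {
    resolve(typeof runnable === 'function' ? runnable() : runnable);
  })
    .then(thener)
    .catch(catcher);
}

// Composes an async function with runPromise, yielding a function that
// forwards its arguments and returns void.
function makeRunnable(runnable) {
  const thener = () => {};                              // assumed: nothing to do on success
  const catcher = (err) => console.error('failed:', err); // assumed: log the failure
  return (...args) => runPromise(() => runnable(...args), thener, catcher);
}

// Usage: the wrapped function can be called without awaiting it.
const run = makeRunnable(async (label) => {
  await delay(10);
  console.log(`${label} done`);
});
run('demo crawl');
```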
