AutoWARCParser
Extends:
Parses a WARC file, automatically detecting whether it is gzipped.
Example:
// Example 1: supply the path via the constructor, then call start
const parser = new AutoWARCParser('<path-to-warcfile>')
parser.on('record', record => { console.log(record) })
parser.on('error', error => { console.error(error) })
parser.start()

// Example 2 (separate snippet): supply the path via parseWARC instead
const parser = new AutoWARCParser()
parser.on('record', record => { console.log(record) })
parser.on('error', error => { console.error(error) })
parser.parseWARC('<path-to-warcfile>')

// Example 3: async iteration, requires node >= 10
// (run inside an async function)
for await (const record of new AutoWARCParser('<path-to-warcfile>')) {
  console.log(record)
}
Constructor Summary
Public Constructor | ||
public | constructor(wp: string) | Create a new AutoWARCParser |
Method Summary
Public Methods | ||
public | parseWARC(wp: string): boolean | Alias for start, except that you can supply the path to the WARC file to be parsed if one was not supplied via the constructor, or to parse another WARC file. |
public | start(): boolean | Begin parsing the WARC file. |
Private Methods | ||
private | _getStream(): ReadStream | Gunzip | Returns a ReadStream for the WARC to be parsed. |
private | _onEnd() | Listener for the parser's done event |
private | _onError(error: Error) | Listener for the parser's error event |
private | _onRecord(record: WARCRecord) | Listener for the parser's record event |
Public Constructors
public constructor(wp: string)
Create a new AutoWARCParser
Public Methods
public parseWARC(wp: string): boolean
Alias for start, except that you can supply the path to the WARC file to be parsed if one was not supplied via the constructor, or to parse another WARC file. If a path was supplied via the constructor and you supply a different path to this method, the path supplied here overrides the one supplied via the constructor.
Params:
Name | Type | Attribute | Description |
wp | string | | path to the WARC file to be parsed |
Throw:
if the path to the WARC file is null or undefined, or another error occurred |
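The path rule described for parseWARC can be sketched as a small standalone function. This is only an illustration of the documented behavior, not the library's code: `resolveWarcPath` and its parameter names are hypothetical, and the error message is an assumption.

```javascript
// Hypothetical helper, not part of AutoWARCParser: mirrors the documented rule
// that a path passed to parseWARC overrides the one given to the constructor.
function resolveWarcPath (constructorPath, suppliedPath) {
  // `!= null` is false for both null and undefined
  const chosen = suppliedPath != null ? suppliedPath : constructorPath
  if (chosen == null) {
    // The docs state that a null or undefined path results in a throw
    throw new Error('The path to the WARC file is null or undefined')
  }
  return chosen
}
```

Under these assumptions, `resolveWarcPath('a.warc', 'b.warc.gz')` yields the method argument, while `resolveWarcPath('a.warc', undefined)` falls back to the constructor path.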
public start(): boolean
Begin parsing the WARC file. Once the start method has been called, the parser will begin emitting the record, done, and error events.
Return:
boolean | if the parser has begun or is currently parsing a WARC file |
Emit:
record | emitted when the parser has parsed a full record; the argument supplied to the listener will be the parsed record |
done | emitted when the WARC file has been completely parsed |
error | emitted if an exception occurs; the argument supplied to the listener will be the error that occurred |
Throw:
if the path to the WARC file is null or undefined, or another error occurred |
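The record, done, and error events can be collected into a Promise when a caller wants all records at once. The following is a minimal sketch that relies only on the event names documented above; `collectRecords` is a hypothetical helper, and a plain EventEmitter stands in for a real parser so the snippet runs on its own.

```javascript
const { EventEmitter } = require('events')

// Hypothetical helper: resolves with every record once 'done' fires,
// or rejects with the error passed to the first 'error' event.
function collectRecords (parser) {
  return new Promise((resolve, reject) => {
    const records = []
    parser.on('record', record => { records.push(record) })
    parser.once('done', () => resolve(records))
    parser.once('error', reject)
  })
}

// Stand-in for a parser (assumption: a real parser emits the same three events)
const fakeParser = new EventEmitter()
const pending = collectRecords(fakeParser)
fakeParser.emit('record', { id: 1 })
fakeParser.emit('record', { id: 2 })
fakeParser.emit('done')
pending.then(records => console.log(records.length)) // prints 2
```

With a real parser, the listeners would need to be attached before start is called so that no early events are missed. Buffering every record in memory is only reasonable for small WARC files.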
Private Methods
private _getStream(): ReadStream | Gunzip
Returns a ReadStream for the WARC to be parsed.
If the WARC file is gzipped, the returned value will be the
result of ReadStream.pipe(zlib.createGunzip())
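One way such a stream could be built is sketched below. The documentation does not say how gzip detection is performed, so checking the two-byte gzip magic number (0x1f 0x8b) is an assumption, and `isGzipped` and `openWarcStream` are hypothetical names rather than the library's own.

```javascript
const fs = require('fs')
const zlib = require('zlib')

// Hypothetical detection step: gzip data starts with the bytes 0x1f 0x8b.
function isGzipped (warcPath) {
  const header = Buffer.alloc(2)
  const fd = fs.openSync(warcPath, 'r')
  try {
    const bytesRead = fs.readSync(fd, header, 0, 2, 0)
    return bytesRead === 2 && header[0] === 0x1f && header[1] === 0x8b
  } finally {
    fs.closeSync(fd)
  }
}

// Returns a plain ReadStream, or the Gunzip stream produced by
// ReadStream.pipe(zlib.createGunzip()) when the file is gzipped.
function openWarcStream (warcPath) {
  const gzipped = isGzipped(warcPath)
  const stream = fs.createReadStream(warcPath)
  return gzipped ? stream.pipe(zlib.createGunzip()) : stream
}
```

Checking the file content rather than the `.gz` extension keeps the sketch working for misnamed files, at the cost of one extra synchronous read.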
private _onError(error: Error)
Listener for the parser's error event
Params:
Name | Type | Attribute | Description |
error | Error |
private _onRecord(record: WARCRecord)
Listener for the parser's record event
Params:
Name | Type | Attribute | Description |
record | WARCRecord |