import WARCGzParser from 'node-warc/lib/parsers/warcGzParser.js'
public class | source

WARCGzParser

Extends:

EventEmitter → WARCGzParser

Parse a WARC.gz file

Example:

 // Option 1: supply the path via the constructor, then call start()
 const parser = new WARCGzParser('<path-to-warcfile>')
 parser.on('record', record => { console.log(record); })
 parser.on('done', () => { console.log('finished'); })
 parser.on('error', error => { console.error(error); })
 parser.start()

 // Option 2 (a separate snippet): create the parser without a path
 // and supply the path to parseWARC()
 const parser = new WARCGzParser()
 parser.on('record', record => { console.log(record); })
 parser.on('done', () => { console.log('finished'); })
 parser.on('error', error => { console.error(error); })
 parser.parseWARC('<path-to-warcfile>')

 // Option 3 (a separate snippet): async iteration
 // requires node >= 10 and must run where `await` is allowed
 for await (const record of new WARCGzParser('<path-to-warcfile>')) {
   console.log(record)
 }

Constructor Summary

Public Constructor
public

constructor(wp: string)

Create a new WARCGzParser

Member Summary

Public Members
public

[Symbol.asyncIterator]: AsyncIterator<WARCRecord>

Private Members
private

_parsing: boolean

private

_wp: string

Method Summary

Public Methods
public

parseWARC(wp: string): boolean

Alias for start, except that you can supply the path to the WARC.gz file to be parsed if one was not supplied via the constructor, or to parse another WARC.gz file.

public

start(): boolean

Begin parsing the WARC.gz file.

Private Methods
private

_onEnd()

Callback for the read stream end event

private

_onError(error: Error)

Callback for the read stream error event

private

_onRecord(record: WARCRecord)

Callback for the read stream data event

Public Constructors

public constructor(wp: string) source

Create a new WARCGzParser

Params:

Name | Type | Attribute | Description
wp | string | optional, nullable: true | path to the WARC.gz file to be parsed

Public Members

public [Symbol.asyncIterator]: AsyncIterator<WARCRecord>: * source

Private Members

private _parsing: boolean source

private _wp: string source

Public Methods

public parseWARC(wp: string): boolean source

Alias for start, except that you can supply the path to the WARC.gz file to be parsed if one was not supplied via the constructor, or to parse another WARC.gz file. If a path was supplied via the constructor and you supply a different path to this method, the path supplied here overrides the one supplied via the constructor.

Params:

Name | Type | Attribute | Description
wp | string | optional, nullable: true | path to the WARC.gz file to be parsed

Return:

boolean

indication if the parser has begun or is currently parsing a WARC.gz file

Throw:

Error

if the path to the WARC.gz file is null or undefined or another error occurred
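
As a rough illustration of reusing one parser for several files, the sketch below waits for `done` before handing the next path to `parseWARC`. The helper name `parseSequentially` and its callback parameters are hypothetical and not part of node-warc. It assumes the parser is ready to accept a new file by the time `done` fires, and it throws if `parseWARC` returns false.

```javascript
// Hypothetical helper (not part of node-warc): parse several WARC.gz files
// one after another with a single parser instance.
// Assumes `parser` exposes parseWARC(path) and emits 'record' and 'done'
// as documented above.
function parseSequentially (parser, paths, onRecord, onFinished) {
  const queue = paths.slice()

  const next = () => {
    if (queue.length === 0) {
      // Nothing left to parse: detach our listeners and report completion
      parser.removeListener('record', onRecord)
      parser.removeListener('done', next)
      onFinished()
      return
    }
    const path = queue.shift()
    // parseWARC returns false when the parser is still busy with another file
    if (!parser.parseWARC(path)) {
      throw new Error('parser is still busy, cannot parse ' + path)
    }
  }

  parser.on('record', onRecord)
  parser.on('done', next)
  next()
}

// Usage, with placeholder paths:
// parseSequentially(
//   new WARCGzParser(),
//   ['<path-to-first-warcfile>', '<path-to-second-warcfile>'],
//   record => { console.log(record) },
//   () => { console.log('finished all files') }
// )
```

The helper does not listen for `error`; attach your own `error` listener to the parser before calling it.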

public start(): boolean source

Begin parsing the WARC.gz file. Once the start method has been called, the parser will begin emitting the record, done, and error events listed under Emit.

Return:

boolean

indication if the parser has begun or is currently parsing a WARC.gz file

  • true: indicates the parser has begun parsing the WARC.gz file
  • false: indicates the parser is already parsing a WARC.gz file

Emit:

record

emitted when the parser has parsed a full record; the argument supplied to the listener will be the parsed record

done

emitted when the WARC.gz file has been completely parsed; the argument supplied to the listener will be the last record

error

emitted if an exception occurs; the argument supplied to the listener will be the error that occurred

Throw:

Error

if the path to the WARC.gz file is null or undefined or another error occurred
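
The return value and events described above can be combined into a Promise-based wrapper. The following is a minimal sketch, not part of node-warc: the helper name `collectRecords` is hypothetical, and it assumes only the documented `start()` return value and the `record`, `done`, and `error` events.

```javascript
// Hypothetical helper (not part of node-warc): wrap the documented
// 'record', 'done' and 'error' events in a Promise.
function collectRecords (parser) {
  return new Promise((resolve, reject) => {
    const records = []
    parser.on('record', record => { records.push(record) })
    parser.on('done', () => { resolve(records) })
    parser.on('error', reject)
    // start() returns false when the parser is already parsing a file;
    // an Error thrown by start() (e.g. no path supplied) also rejects,
    // because exceptions inside a Promise executor become rejections
    if (!parser.start()) {
      reject(new Error('parser is already parsing a WARC.gz file'))
    }
  })
}

// Usage, with a placeholder path:
// collectRecords(new WARCGzParser('<path-to-warcfile>'))
//   .then(records => { console.log(records.length) })
//   .catch(error => { console.error(error) })
```

Because this keeps every record in memory, it is best suited to small files; for large archives the event listeners or async iteration shown in the examples avoid buffering.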

Private Methods

private _onEnd() source

Callback for the read stream end event

private _onError(error: Error) source

Callback for the read stream error event

Params:

Name | Type | Attribute | Description
error | Error | |

private _onRecord(record: WARCRecord) source

Callback for the read stream data event

Params:

Name | Type | Attribute | Description
record | WARCRecord | |