Reference Source

Installation

Requirements: Node.js

git clone https://github.com/N0taN3rd/Squidwarc.git

cd Squidwarc

./boostrap.sh

If Docker is more your thing, then Squidwarc provides docker and compose file for your use.

Usage

To launch a crawl you can execute either

Docker usage:

Modify volumes field of the included docker-compose file to a local directory and place the crawl configuration file in it.

Please set the warc sub-field output to ./warcs and then execute docker-compose up to launch the crawl!

For more information about the configuration file, please see the Crawl Configuration section of this manual.

Out Of The Box Crawls

Page Only

Preserve the only the page, no links are followed

Page Only option plus preserve all links found on the page that are on the same domain as the page

Page + Same Domain Link option plus all links from other domains

For more information about the configuring a crawl using one of these modes, please see the description for the mode field of a crawls config file.