Module Web.Crawler
- Description
This module implements a generic web crawler.
Features:
  Fully asynchronous operation (several hundred simultaneous requests)
  Supports the /robots.txt exclusion standard
  Extensible:
    URI queues
    Allow/deny rules
  Configurable (see the sketch after this list):
    Number of concurrent fetchers
    Bits per second (bandwidth throttling)
    Number of concurrent fetchers per host
    Delay between fetches from the same host
  Supports HTTP and HTTPS
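
A minimal sketch of how these features might be wired together follows. The class and member names it uses (Web.Crawler.Policy, Stats, MemoryQueue, RuleSet, GlobRule, Crawler, and policy fields such as max_concurrent_fetchers and min_delay_per_host), the callback signatures, and the constructor argument order are assumptions inferred from the feature list above, not quotations from the module reference; check the actual Web.Crawler documentation before relying on them.

  // A minimal sketch, not taken from the module reference: class names,
  // policy fields, callback signatures and constructor argument order are
  // assumptions and should be verified against the Web.Crawler docs.

  void page_cb(Standards.URI uri, mixed data, mapping headers, mixed ... args)
  {
    // Called for every page fetched successfully.
    write("fetched %s\n", (string)uri);
  }

  void error_cb(Standards.URI uri, int error_code, mapping headers,
                mixed ... args)
  {
    // Called when a fetch fails.
    werror("failed %s: %d\n", (string)uri, error_code);
  }

  void done_cb()
  {
    // Called once the queue is empty and all fetchers are idle.
    exit(0);
  }

  int main()
  {
    // The configurable limits from the feature list, expressed as policy
    // fields (assumed names).
    object policy = Web.Crawler.Policy();
    policy->max_concurrent_fetchers = 100;         // total simultaneous requests
    policy->max_concurrent_fetchers_per_host = 2;  // per-host parallelism
    policy->min_delay_per_host = 1;                // seconds between fetches from one host
    policy->max_bits_per_second_total = 512*1024;  // overall bandwidth cap

    // Allow/deny rule sets and a memory-backed URI queue (assumed API).
    object allow = Web.Crawler.RuleSet();
    object deny  = Web.Crawler.RuleSet();
    allow->add_rule(Web.Crawler.GlobRule("http://example.com/*"));

    object stats = Web.Crawler.Stats(300, 10);
    object queue = Web.Crawler.MemoryQueue(stats, policy, allow, deny);

    // Seed the crawl; the 0 stands in for an optional prepare callback.
    object crawler = Web.Crawler.Crawler(queue, page_cb, error_cb, done_cb, 0,
                                         "http://example.com/");

    return -1;  // keep the backend running so asynchronous fetches proceed
  }

Returning -1 from main() keeps the Pike backend loop alive; since all fetching is asynchronous, the program would otherwise exit before any request completes.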