CrowlerGo is a high-performance, concurrent web crawler written in Go. It is designed to efficiently map website structures, discover subpages, and export results in various formats.
- Concurrent Crawling: Uses goroutines for fast, efficient crawling (see the worker-pool sketch after this list).
- Configurable Concurrency: Fine-grained control over the number of concurrent workers.
- Depth Control: Limit how many link levels deep the crawl goes from the starting URL.
- Result Limiting: Stop after a specified number of results.
- Output Formats: Supports CSV (with discovery paths) and TXT formats.
- Resumable/Incremental: Can load previously visited URLs to avoid re-crawling them, or save only newly discovered URLs.
- Path Tracking: Optionally tracks and visually represents the discovery path for each URL.
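
The concurrency model itself is not spelled out in this README. As a rough illustration only (not CrowlerGo's actual code), a goroutine worker pool draining a shared URL channel looks roughly like this:

```go
// Minimal sketch of a goroutine worker pool for crawling. Fetching and link
// extraction are intentionally simplified; this is not CrowlerGo's implementation.
package main

import (
	"fmt"
	"net/http"
	"sync"
)

func main() {
	// Buffered channel acts as the work queue of URLs to visit.
	urls := make(chan string, 100)
	var wg sync.WaitGroup

	const workers = 4 // stands in for the -concurrency setting
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for u := range urls {
				resp, err := http.Get(u)
				if err != nil {
					fmt.Println("error fetching", u, ":", err)
					continue
				}
				resp.Body.Close()
				fmt.Println("visited", u, "->", resp.Status)
				// A real crawler would parse the body here and enqueue newly
				// discovered links, respecting depth and result limits.
			}
		}()
	}

	urls <- "https://example.com"
	close(urls)
	wg.Wait()
}
```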
Run the crawler by providing a starting URL:
```bash
./crowlergo <url> [options]

# Basic crawl
./crowlergo https://example.com

# Crawl with specific limits and output
./crowlergo https://example.com -depth 3 -limit 500 -output my_results.csv

# High concurrency crawl
./crowlergo https://example.com -concurrency 200
```

| Flag | Type | Default | Description |
|---|---|---|---|
| -output | string | results.csv | Path to the output file. |
| -format | string | csv | Output format: txt or csv. |
| -depth | int | 5 | Maximum crawling depth. |
| -limit | int | 1000 | Maximum number of results to collect. |
| -concurrency | int | 100 | Number of concurrent workers. |
| -subpages | bool | false | Include all subpages (full URLs) in results, not just unique hosts. |
| -input | string | "" | File containing already visited URLs to seed the crawler. |
| -new-only | bool | false | If true, only saves newly discovered URLs (does not merge with the input file). |
| -no-path | bool | false | Disable discovery path tracking (saves memory and improves performance). |
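
The flag names, types, and defaults above map naturally onto Go's standard flag package. The following is only a hypothetical sketch of that mapping, not CrowlerGo's actual option handling; it assumes the starting URL is taken as the first positional argument before the flags.

```go
// Hypothetical sketch of declaring the documented options with the standard
// flag package; the real CLI may parse its arguments differently.
package main

import (
	"flag"
	"fmt"
	"os"
)

func main() {
	if len(os.Args) < 2 {
		fmt.Println("usage: crowlergo <url> [options]")
		os.Exit(1)
	}
	startURL := os.Args[1] // the positional starting URL

	fs := flag.NewFlagSet("crowlergo", flag.ExitOnError)
	output := fs.String("output", "results.csv", "Path to the output file")
	format := fs.String("format", "csv", "Output format: txt or csv")
	depth := fs.Int("depth", 5, "Maximum crawling depth")
	limit := fs.Int("limit", 1000, "Maximum number of results to collect")
	concurrency := fs.Int("concurrency", 100, "Number of concurrent workers")
	subpages := fs.Bool("subpages", false, "Include all subpages, not just unique hosts")
	input := fs.String("input", "", "File containing already visited URLs")
	newOnly := fs.Bool("new-only", false, "Only save newly discovered URLs")
	noPath := fs.Bool("no-path", false, "Disable discovery path tracking")
	_ = fs.Parse(os.Args[2:])

	fmt.Println(startURL, *output, *format, *depth, *limit, *concurrency,
		*subpages, *input, *newOnly, *noPath)
}
```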
To build the project for your current platform:
```bash
./build.sh
```
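
If you prefer to call the Go toolchain directly, a plain go build from the repository root should produce an equivalent binary, assuming the repository root is the module root (the output name below is an assumption):

```bash
# Assumes the repository root is the Go module root; the binary name is an assumption.
go build -o crowlergo .
```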