
🚀 Rocket Crawl

Maximum Coverage URL Crawler for Bug Bounty & Security Research

A powerful bash script that aggregates URLs from 20+ sources including web archives, API endpoints, crawlers, and intelligence platforms. Perfect for reconnaissance, bug bounty hunting, and security assessments.

✨ Features

  • 20+ Data Sources: Wayback Machine, Common Crawl, URLScan.io, AlienVault OTX, VirusTotal, and many more
  • Multi-API Integration: Leverages paid and free APIs for comprehensive coverage
  • Intelligent Crawling: Combines passive and active reconnaissance techniques
  • Historical Data: Fetches complete archive history from Wayback Machine (all years)
  • Rate Limit Handling: Built-in retry logic and rate limit management
  • Progress Tracking: Real-time status updates with color-coded output
  • Bulk Processing: Process multiple domains from a file
  • Smart Filtering: Removes duplicates, static files, and social media noise

📋 Requirements

Essential Tools

sudo apt install jq curl

Optional Crawlers (Recommended)

# Waybackurls
go install github.com/tomnomnom/waybackurls@latest

# GAU (GetAllUrls)
go install github.com/lc/gau/v2/cmd/gau@latest

# Hakrawler
go install github.com/hakluke/hakrawler@latest

# Katana
go install github.com/projectdiscovery/katana/cmd/katana@latest

# GoSpider
go install github.com/jaeles-project/gospider@latest

# ParamSpider
git clone https://github.com/devanshbatham/ParamSpider
cd ParamSpider
pip install -r requirements.txt
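
Before a run, you can quickly check which of the optional crawlers are already on your PATH. This is a convenience snippet, not part of the script:

# Check which required and optional tools are installed
for tool in jq curl waybackurls gau hakrawler katana gospider; do
  command -v "$tool" >/dev/null 2>&1 && echo "[+] $tool found" || echo "[-] $tool missing"
done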

🔑 API Keys Setup

The script supports multiple API providers. Edit the script header and replace the placeholder text with your actual API keys:

export VIRUSTOTAL_API_KEY="add your virustotal key"
export SECURITYTRAILS_API_KEY="add your security trails key"
export GITHUB_TOKEN="add your github token"
export CHAOS_API_KEY="add your chaos key"
export ALIENVAULT_API_KEY="add your alienvault key"
export URLSCAN_API_KEY="add your urlscan key"
export SHODAN_API_KEY="add your shodan key"
export CENSYS_API_ID="add your censys api id"
export CENSYS_API_SECRET="add your censys api secret"
export GOOGLE_API_KEY="add your google api key"
export GOOGLE_CSE_ID="add your google cse id"
export TRELLO_API_KEY="add your trello api key"
export TRELLO_TOKEN="add your trello token"
export INTELX_API_KEY="add your intelx key"
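
If you keep keys in your shell environment instead of the script header, a small pre-flight check like this one (not part of the script) will flag anything still unset:

# Warn about API key variables that are still unset or empty
for var in VIRUSTOTAL_API_KEY SECURITYTRAILS_API_KEY URLSCAN_API_KEY SHODAN_API_KEY ALIENVAULT_API_KEY; do
  [ -z "${!var}" ] && echo "[!] $var is not set"
done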

Where to Get API Keys

Each key is generated from the corresponding provider's account or API settings page, for example virustotal.com, securitytrails.com, urlscan.io, shodan.io, search.censys.io, otx.alienvault.com, chaos.projectdiscovery.io, and intelx.io, plus GitHub, Trello, and Google Cloud developer settings for the remaining tokens. Most of these offer a free tier.

🚀 Usage

Single Domain

./rocket-crawl.sh geturls example.com

Multiple Domains

Create a file with one domain per line:

# domains.txt
example.com
subdomain1.example.com
subdomain2.example.com

Then run:

./rocket-crawl.sh getsuburls domains.txt

📊 Output

The script generates:

  • urls_<domain>.txt - URLs for individual domains
  • all_urls.txt - Consolidated results for bulk processing

Sample Output

==========================================
[✓] CRAWL COMPLETE!
==========================================
Domain: example.com
Total URLs: 45,892
Time taken: 287s
Output file: urls_example.com.txt
==========================================
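
The output files are plain lists of URLs (one per line), so ordinary shell tools are enough for a first pass over the results. The urls_with_params.txt name below is just an example:

# Count results and extract parameterized URLs for further testing
wc -l urls_example.com.txt
grep '=' urls_example.com.txt | sort -u > urls_with_params.txt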

πŸ” Data Sources

Web Archives

  • Wayback Machine (complete history)
  • Archive.today
  • Common Crawl (5 latest indexes)
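
For reference, the Wayback Machine data comes from its public CDX API; a manual query looks roughly like this (the script's exact parameters may differ):

# Pull every archived URL for a domain from the Wayback Machine CDX API
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&output=text&fl=original&collapse=urlkey" \
  | sort -u > wayback_urls.txt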

API-Based Sources

  • AlienVault OTX (paginated)
  • URLScan.io (full pagination up to 100k results)
  • VirusTotal (URLs, subdomains, communicating files)
  • SecurityTrails (DNS + history)
  • Shodan (DNS + search)
  • Intelligence X
  • Certificate Transparency (crt.sh, Censys)
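
As an illustration of how these API-backed sources are typically consumed (the script's own pagination and error handling may differ), one page of URLs can be fetched from AlienVault OTX with curl and jq:

# One page of observed URLs for a domain from AlienVault OTX
curl -s -H "X-OTX-API-KEY: $ALIENVAULT_API_KEY" \
  "https://otx.alienvault.com/api/v1/indicators/domain/example.com/url_list?limit=100&page=1" \
  | jq -r '.url_list[].url'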

Active Crawlers

  • Waybackurls
  • GAU (GetAllUrls)
  • Hakrawler
  • Katana
  • GoSpider
  • ParamSpider
  • Waymore
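
Each of these crawlers can also be run standalone if you want to compare coverage or debug a single source; typical minimal invocations (flags trimmed down) look like this:

# Standalone examples of the same crawlers the script orchestrates
echo example.com | waybackurls > waybackurls.txt
echo example.com | gau > gau.txt
katana -u https://example.com -silent > katana.txt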

Additional Sources

  • GitHub (code + gists)
  • DNS Dumpster
  • Robots.txt & Sitemaps
  • Trello public boards
  • Paste sites (Pastebin, etc.)

βš™οΈ Configuration

Timeout Settings

All HTTP requests have built-in timeouts:

  • Standard requests: 180 seconds
  • Archive requests: 300 seconds (due to large datasets)
  • Connection timeout: 30 seconds
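
In curl terms these limits correspond to the standard timeout and retry flags; the shape of a request is roughly the following (the script's exact flags and retry counts may differ):

# Approximate shape of the script's HTTP calls
curl -s --connect-timeout 30 --max-time 180 --retry 3 --retry-delay 5 "https://example.com/endpoint"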

Filtering

The script automatically filters out:

  • Static files (images, fonts, media)
  • Common social media platforms
  • Duplicate URLs
  • Non-target domain URLs
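
The same kind of filtering can be reproduced on any URL list with grep; the extension and domain patterns below are illustrative, not the script's exact ones:

# Drop static assets and social-media noise, then dedupe
grep -viE '\.(png|jpe?g|gif|svg|ico|css|woff2?|ttf|eot|mp4|mp3)(\?|$)' urls_example.com.txt \
  | grep -viE '(facebook|twitter|instagram|linkedin)\.com' \
  | sort -u > urls_filtered.txt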

πŸ›‘οΈ Security Notes

⚠️ IMPORTANT: This script contains embedded API keys that should be removed before sharing publicly!

Before contributing or sharing:

  1. Remove all API keys from the script header
  2. Never commit API keys to version control
  3. Use environment variables for sensitive data
  4. Consider using a .env file for local development

Recommended Setup

# Create a .env file (add to .gitignore)
cat > .env << EOF
export ALIENVAULT_API_KEY="your_key"
export URLSCAN_API_KEY="your_key"
# ... other keys
EOF

# Load before running
source .env
./rocket-crawl.sh geturls example.com
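
As a small hardening step on top of that, keep the .env file out of git and readable only by you:

# Ignore the .env file in git and restrict its permissions
echo ".env" >> .gitignore
chmod 600 .env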

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional data sources
  • Performance optimizations
  • Better error handling
  • Output format options (JSON, CSV)
  • Proxy support

πŸ› Troubleshooting

"Missing required tools" error

Install jq and curl: sudo apt install jq curl

"Rate limited" messages

Wait for the cooldown period or reduce concurrent requests

No results from API sources

Verify that your API keys are set correctly and that you still have remaining quota

Slow performance

  • Reduce crawling depth in active crawlers
  • Disable non-essential data sources
  • Use multiple domains in parallel (separate terminal windows)
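
If you'd rather not juggle terminal windows, the parallel approach can be scripted with background jobs; a simple sketch, assuming each run writes its own urls_<domain>.txt:

# Run several single-domain crawls in parallel as background jobs
while read -r domain; do
  ./rocket-crawl.sh geturls "$domain" &
done < domains.txt
wait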

📬 Support

For issues, questions, or feature requests, please open an issue on GitHub.


Happy Hunting! 🎯
