
🚀 Rocket Crawl

Maximum Coverage URL Crawler for Bug Bounty & Security Research

A powerful bash script that aggregates URLs from 20+ sources including web archives, API endpoints, crawlers, and intelligence platforms. Perfect for reconnaissance, bug bounty hunting, and security assessments.

✨ Features

  • 20+ Data Sources: Wayback Machine, Common Crawl, URLScan.io, AlienVault OTX, VirusTotal, and many more
  • Multi-API Integration: Leverages paid and free APIs for comprehensive coverage
  • Intelligent Crawling: Combines passive and active reconnaissance techniques
  • Historical Data: Fetches complete archive history from Wayback Machine (all years)
  • Rate Limit Handling: Built-in retry logic and rate limit management
  • Progress Tracking: Real-time status updates with color-coded output
  • Bulk Processing: Process multiple domains from a file
  • Smart Filtering: Removes duplicates, static files, and social media noise

📋 Requirements

Essential Tools

sudo apt install jq curl

Optional Crawlers (Recommended)

# Waybackurls
go install github.com/tomnomnom/waybackurls@latest

# GAU (GetAllUrls)
go install github.com/lc/gau/v2/cmd/gau@latest

# Hakrawler
go install github.com/hakluke/hakrawler@latest

# Katana
go install github.com/projectdiscovery/katana/cmd/katana@latest

# GoSpider
go install github.com/jaeles-project/gospider@latest

# ParamSpider
git clone https://github.com/devanshbatham/ParamSpider
cd ParamSpider
pip install -r requirements.txt
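
Before a run, you can quickly check which of the optional crawlers are already on your PATH. This is a convenience snippet, not part of the script:

# Check which required and optional tools are installed
for tool in jq curl waybackurls gau hakrawler katana gospider; do
  command -v "$tool" >/dev/null 2>&1 && echo "[+] $tool found" || echo "[-] $tool missing"
done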

🔑 API Keys Setup

The script supports multiple API providers. Edit the script header and replace the placeholder text with your actual API keys:

export VIRUSTOTAL_API_KEY="add your virustotal key"
export SECURITYTRAILS_API_KEY="add your security trails key"
export GITHUB_TOKEN="add your github token"
export CHAOS_API_KEY="add your chaos key"
export ALIENVAULT_API_KEY="add your alienvault key"
export URLSCAN_API_KEY="add your urlscan key"
export SHODAN_API_KEY="add your shodan key"
export CENSYS_API_ID="add your censys api id"
export CENSYS_API_SECRET="add your censys api secret"
export GOOGLE_API_KEY="add your google api key"
export GOOGLE_CSE_ID="add your google cse id"
export TRELLO_API_KEY="add your trello api key"
export TRELLO_TOKEN="add your trello token"
export INTELX_API_KEY="add your intelx key"
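
If you keep keys in your shell environment instead of the script header, a small pre-flight check like this one (not part of the script) will flag anything still unset:

# Warn about API key variables that are still unset or empty
for var in VIRUSTOTAL_API_KEY SECURITYTRAILS_API_KEY URLSCAN_API_KEY SHODAN_API_KEY ALIENVAULT_API_KEY; do
  [ -z "${!var}" ] && echo "[!] $var is not set"
done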

Where to Get API Keys

Each key is generated from the corresponding provider's account or API settings page, for example virustotal.com, securitytrails.com, urlscan.io, shodan.io, search.censys.io, otx.alienvault.com, chaos.projectdiscovery.io, and intelx.io, plus GitHub, Trello, and Google Cloud developer settings for the remaining tokens. Most of these offer a free tier.

🚀 Usage

Single Domain

./rocket-crawl.sh geturls example.com

Multiple Domains

Create a file with one domain per line:

# domains.txt
example.com
subdomain1.example.com
subdomain2.example.com

Then run:

./rocket-crawl.sh getsuburls domains.txt

📊 Output

The script generates:

  • urls_<domain>.txt - URLs for individual domains
  • all_urls.txt - Consolidated results for bulk processing

Sample Output

==========================================
[✓] CRAWL COMPLETE!
==========================================
Domain: example.com
Total URLs: 45,892
Time taken: 287s
Output file: urls_example.com.txt
==========================================
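
The output files are plain lists of URLs (one per line), so ordinary shell tools are enough for a first pass over the results. The urls_with_params.txt name below is just an example:

# Count results and extract parameterized URLs for further testing
wc -l urls_example.com.txt
grep '=' urls_example.com.txt | sort -u > urls_with_params.txt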

πŸ” Data Sources

Web Archives

  • Wayback Machine (complete history)
  • Archive.today
  • Common Crawl (5 latest indexes)
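
For reference, the Wayback Machine data comes from its public CDX API; a manual query looks roughly like this (the script's exact parameters may differ):

# Pull every archived URL for a domain from the Wayback Machine CDX API
curl -s "https://web.archive.org/cdx/search/cdx?url=example.com/*&output=text&fl=original&collapse=urlkey" \
  | sort -u > wayback_urls.txt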

API-Based Sources

  • AlienVault OTX (paginated)
  • URLScan.io (full pagination up to 100k results)
  • VirusTotal (URLs, subdomains, communicating files)
  • SecurityTrails (DNS + history)
  • Shodan (DNS + search)
  • Intelligence X
  • Certificate Transparency (crt.sh, Censys)
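
As an illustration of how these API-backed sources are typically consumed (the script's own pagination and error handling may differ), one page of URLs can be fetched from AlienVault OTX with curl and jq:

# One page of observed URLs for a domain from AlienVault OTX
curl -s -H "X-OTX-API-KEY: $ALIENVAULT_API_KEY" \
  "https://otx.alienvault.com/api/v1/indicators/domain/example.com/url_list?limit=100&page=1" \
  | jq -r '.url_list[].url'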

Active Crawlers

  • Waybackurls
  • GAU (GetAllUrls)
  • Hakrawler
  • Katana
  • GoSpider
  • ParamSpider
  • Waymore
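
Each of these crawlers can also be run standalone if you want to compare coverage or debug a single source; typical minimal invocations (flags trimmed down) look like this:

# Standalone examples of the same crawlers the script orchestrates
echo example.com | waybackurls > waybackurls.txt
echo example.com | gau > gau.txt
katana -u https://example.com -silent > katana.txt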

Additional Sources

  • GitHub (code + gists)
  • DNS Dumpster
  • Robots.txt & Sitemaps
  • Trello public boards
  • Paste sites (Pastebin, etc.)

βš™οΈ Configuration

Timeout Settings

All HTTP requests have built-in timeouts:

  • Standard requests: 180 seconds
  • Archive requests: 300 seconds (due to large datasets)
  • Connection timeout: 30 seconds
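
In curl terms these limits correspond to the standard timeout and retry flags; the shape of a request is roughly the following (the script's exact flags and retry counts may differ):

# Approximate shape of the script's HTTP calls
curl -s --connect-timeout 30 --max-time 180 --retry 3 --retry-delay 5 "https://example.com/endpoint"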

Filtering

The script automatically filters out:

  • Static files (images, fonts, media)
  • Common social media platforms
  • Duplicate URLs
  • Non-target domain URLs
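
The same kind of filtering can be reproduced on any URL list with grep; the extension and domain patterns below are illustrative, not the script's exact ones:

# Drop static assets and social-media noise, then dedupe
grep -viE '\.(png|jpe?g|gif|svg|ico|css|woff2?|ttf|eot|mp4|mp3)(\?|$)' urls_example.com.txt \
  | grep -viE '(facebook|twitter|instagram|linkedin)\.com' \
  | sort -u > urls_filtered.txt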

πŸ›‘οΈ Security Notes

⚠️ IMPORTANT: This script contains embedded API keys that should be removed before sharing publicly!

Before contributing or sharing:

  1. Remove all API keys from the script header
  2. Never commit API keys to version control
  3. Use environment variables for sensitive data
  4. Consider using a .env file for local development

Recommended Setup

# Create a .env file (add to .gitignore)
cat > .env << EOF
export ALIENVAULT_API_KEY="your_key"
export URLSCAN_API_KEY="your_key"
# ... other keys
EOF

# Load before running
source .env
./rocket-crawl.sh geturls example.com
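
As a small hardening step on top of that, keep the .env file out of git and readable only by you:

# Ignore the .env file in git and restrict its permissions
echo ".env" >> .gitignore
chmod 600 .env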

🤝 Contributing

Contributions are welcome! Areas for improvement:

  • Additional data sources
  • Performance optimizations
  • Better error handling
  • Output format options (JSON, CSV)
  • Proxy support

πŸ› Troubleshooting

"Missing required tools" error

Install jq and curl: sudo apt install jq curl

"Rate limited" messages

Wait for the cooldown period or reduce concurrent requests

No results from API sources

Verify that your API keys are set correctly and that you still have remaining quota

Slow performance

  • Reduce crawling depth in active crawlers
  • Disable non-essential data sources
  • Use multiple domains in parallel (separate terminal windows)
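
If you'd rather not juggle terminal windows, the parallel approach can be scripted with background jobs; a simple sketch, assuming each run writes its own urls_<domain>.txt:

# Run several single-domain crawls in parallel as background jobs
while read -r domain; do
  ./rocket-crawl.sh geturls "$domain" &
done < domains.txt
wait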

📬 Support

For issues, questions, or feature requests, please open an issue on GitHub.


Happy Hunting! 🎯
