A modular dark web crawler and search engine that collects, indexes, and searches hidden service pages using Boolean, TF-IDF, and BM25 models. This project is designed for research and demonstration purposes.
This project is intended for educational and research purposes only. Accessing the dark web may involve legal, ethical, and security risks. Use responsibly, stay anonymous, and always comply with your local laws.
- Asynchronous Crawling - crawls hidden services concurrently over Tor
- Automated Seed Discovery - seed URLs are fetched automatically for a search query
- Multi-Model Indexing - Boolean, TF-IDF, and BM25 indexes
- CLI Application - crawl, index, and search commands
- Python 3.11+
- uv
- Tor, running locally on port 9050 (required for crawling)
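Tor must expose its SOCKS5 proxy on 127.0.0.1:9050 so the crawler can reach .onion services. The snippet below is a minimal sketch of an asynchronous fetch routed through that proxy; the aiohttp and aiohttp-socks libraries are assumptions made for illustration and may not match the project's actual HTTP stack.

```python
# Hypothetical sketch: fetch a hidden-service page through Tor's SOCKS5
# proxy on 127.0.0.1:9050. The library choice (aiohttp + aiohttp-socks)
# is an assumption, not taken from this project.
import asyncio

import aiohttp
from aiohttp_socks import ProxyConnector


async def fetch_via_tor(url: str) -> str:
    # rdns=True makes the proxy resolve hostnames, which .onion addresses require.
    connector = ProxyConnector.from_url("socks5://127.0.0.1:9050", rdns=True)
    async with aiohttp.ClientSession(connector=connector) as session:
        async with session.get(url, timeout=aiohttp.ClientTimeout(total=60)) as resp:
            return await resp.text()


if __name__ == "__main__":
    # Replace with a real .onion address; this placeholder will not resolve.
    html = asyncio.run(fetch_via_tor("http://exampleonionaddress.onion/"))
    print(html[:200])
```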
- Clone the repository:
git clone https://github.com/vshulcz/darkweb_search_engine.git
cd darkweb_search_engine
- Install dependencies:
uv sync
The project provides a CLI application with three main commands:
- Crawl - fetch seed URLs for a query and start crawling the dark web from them (a crawling sketch follows this command list):
python main.py crawl --query "onion forum" --max-depth 2 --concurrency 5
- Index - re-index the crawled pages from the database with all three models (Boolean, TF-IDF, and BM25); the resulting indexes are saved as pickle files (see the indexing sketch below):
python main.py index
- Search - query the indexed database with the chosen model (tfidf, bm25, or boolean); only the top 5 results are displayed (see the BM25 sketch below):
python main.py search --query "buy onion domain" --model tfidf
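The --max-depth and --concurrency options suggest a bounded, concurrency-limited crawl. The sketch below shows one common way to enforce such limits with an asyncio.Semaphore; the fetch parameter, the regex-based link extraction, and the data structures are illustrative assumptions, not the project's actual crawler.

```python
# Hypothetical sketch of a depth- and concurrency-limited asynchronous crawl.
# It only illustrates what --max-depth and --concurrency typically control.
import asyncio
import re
from typing import Awaitable, Callable
from urllib.parse import urljoin

HREF_RE = re.compile(r'href=["\'](.*?)["\']', re.IGNORECASE)


def extract_links(base_url: str, html: str) -> list[str]:
    # Naive link extraction; a real crawler would use a proper HTML parser.
    return [urljoin(base_url, href) for href in HREF_RE.findall(html)]


async def crawl(
    seeds: list[str],
    fetch: Callable[[str], Awaitable[str]],  # e.g. fetch_via_tor from the earlier sketch
    max_depth: int = 2,
    concurrency: int = 5,
) -> dict[str, str]:
    sem = asyncio.Semaphore(concurrency)  # caps simultaneous requests
    pages: dict[str, str] = {}            # url -> raw HTML
    seen: set[str] = set(seeds)
    frontier = [(url, 0) for url in seeds]

    async def visit(url: str, depth: int) -> list[tuple[str, int]]:
        async with sem:
            try:
                html = await fetch(url)
            except Exception:
                return []  # skip unreachable pages
        pages[url] = html
        if depth >= max_depth:
            return []
        new_items = []
        for link in extract_links(url, html):
            if link not in seen:
                seen.add(link)
                new_items.append((link, depth + 1))
        return new_items

    while frontier:
        results = await asyncio.gather(*(visit(u, d) for u, d in frontier))
        frontier = [item for sublist in results for item in sublist]
    return pages
```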
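For the index command, the README only states that all three indexes are persisted as pickle files. The following sketch shows a plausible shape for that step, assuming a simple inverted index with term frequencies and document lengths; the file name bm25_index.pkl and the index layout are made up for illustration and are not the project's on-disk format.

```python
# Hypothetical sketch of the indexing step: build a simple inverted index
# from crawled pages and persist it with pickle.
import pickle
import re
from collections import Counter, defaultdict

TOKEN_RE = re.compile(r"[a-z0-9]+")


def tokenize(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def build_index(pages: dict[str, str]) -> dict:
    postings = defaultdict(dict)          # term -> {url: term frequency}
    doc_lengths: dict[str, int] = {}      # url -> number of tokens
    for url, text in pages.items():
        tokens = tokenize(text)
        doc_lengths[url] = len(tokens)
        for term, tf in Counter(tokens).items():
            postings[term][url] = tf
    return {"postings": dict(postings), "doc_lengths": doc_lengths}


def save_index(index: dict, path: str = "bm25_index.pkl") -> None:
    with open(path, "wb") as fh:
        pickle.dump(index, fh)


def load_index(path: str = "bm25_index.pkl") -> dict:
    with open(path, "rb") as fh:
        return pickle.load(fh)
```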
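For the search command, the sketch below scores the pickled index with the standard Okapi BM25 formula (k1 = 1.5, b = 0.75) and returns the top 5 results, mirroring the CLI's output limit. It reuses tokenize() and the index layout from the indexing sketch above; the project's actual scoring parameters and index format may differ.

```python
# Hypothetical BM25 scoring over the index structure from the previous sketch.
import math


def bm25_search(index: dict, query: str, top_k: int = 5,
                k1: float = 1.5, b: float = 0.75) -> list[tuple[str, float]]:
    postings = index["postings"]
    doc_lengths = index["doc_lengths"]
    n_docs = len(doc_lengths)
    avgdl = sum(doc_lengths.values()) / n_docs
    scores: dict[str, float] = {}
    for term in tokenize(query):          # tokenize() from the indexing sketch
        docs = postings.get(term, {})
        if not docs:
            continue
        idf = math.log((n_docs - len(docs) + 0.5) / (len(docs) + 0.5) + 1)
        for url, tf in docs.items():
            norm = k1 * (1 - b + b * doc_lengths[url] / avgdl)
            scores[url] = scores.get(url, 0.0) + idf * tf * (k1 + 1) / (tf + norm)
    # The CLI shows only the top 5 results; top_k mirrors that behaviour.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]
```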