Passive OSINT reconnaissance framework designed for the initial recon phase of web penetration tests following eWPTX methodology.
This tool collects open-source intelligence exclusively from publicly available sources — it makes no direct contact with the target system.
This tool performs passive reconnaissance only.
It queries public data sources (Wayback Machine, certificate transparency logs, DuckDuckGo, GitHub public repositories) using information freely and legally accessible to anyone on the internet.
Always obtain written authorisation before conducting reconnaissance on any target. The authors accept no responsibility for misuse of this tool. Use responsibly and within the bounds of applicable law.
Recon-OSINT maps the target's public footprint before any active testing begins.
| What | How | Passive? |
|---|---|---|
| Subdomain enumeration | Sublist3r, crt.sh, Wayback CDX | ✅ Yes |
| Historical URL & file exposure | Wayback Machine CDX API | ✅ Yes |
| Indexed sensitive documents | DuckDuckGo dorks | ✅ Yes |
| Exposed emails | DuckDuckGo "@domain" dork | ✅ Yes |
| Secrets in public repos | GitHub search API | ✅ Yes |
# 1. Install
sudo python3 install.py
# 2. Domain reconnaissance (subdomains)
recon-osint run example.com --module domain
# 3. Sensitive file exposure
recon-osint run example.com --module sensitive
# 4. Full recon + AI reports
recon-osint run example.com --module all

Passive subdomain enumeration from three independent sources.
| Source | What it finds | Output file |
|---|---|---|
| Sublist3r | Subdomains via search engines & DNS | sublist3r.md |
| crt.sh | Subdomains from TLS certificate logs | crtsh.md |
| Wayback Machine CDX | Subdomains seen in archived URLs | wayback_subdomains.md |
| AI (OpenAI) | Pentest-oriented interpretation | domain_report_{domain}.pdf |
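
As an illustration of the certificate-transparency source, here is a minimal sketch of how subdomains could be pulled from crt.sh's JSON output. It is not the code in mod_crtsh.py; the function names are hypothetical, and it assumes the public `https://crt.sh/?q=...&output=json` interface, whose entries carry a `name_value` field that may hold several newline-separated names.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def crtsh_url(domain: str) -> str:
    """Build the crt.sh JSON query for certificates matching %.domain."""
    return f"https://crt.sh/?q={quote('%.' + domain)}&output=json"


def subdomains_from_crtsh(entries: list, domain: str) -> list:
    """Extract unique hostnames under `domain` from crt.sh JSON entries."""
    found = set()
    for entry in entries:
        # name_value may list several names per certificate, one per line.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)


def fetch_crtsh(domain: str, timeout: float = 30.0) -> list:
    """Query crt.sh (network call) and return the parsed hostnames."""
    with urlopen(crtsh_url(domain), timeout=timeout) as response:
        return subdomains_from_crtsh(json.load(response), domain)
```

Only `fetch_crtsh` touches the network, and it contacts crt.sh rather than the target, which keeps the lookup passive.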
Passive discovery of exposed files, credentials and sensitive data.
| Source | What it finds | Output file |
|---|---|---|
| Wayback Machine CDX | Files by extension + sensitive paths in 10 000 archived URLs | wayback_sensitive.md |
| DuckDuckGo dorks | Indexed files by filetype, sensitive keywords, exposed emails | dorks.md |
| GitHub | Public repos + secrets/sensitive files in code | github.md |
| AI (OpenAI) | Pentest-oriented interpretation | sensitive_report_{domain}.pdf |
The all module runs the domain and sensitive modules sequentially. The Wayback Machine is queried only once and the results are reused. An additional consolidated AI executive report is generated.
| Output |
|---|
| All individual .md files |
| domain_report_{domain}.pdf |
| sensitive_report_{domain}.pdf |
| full_report_{domain}.pdf (AI pentest roadmap) |
Results saved to: results/recon/<timestamp>/
At the end of each module the AI reads all generated .md files and produces a pentest-oriented interpretation:
- Which subdomains to prioritise in the active phase
- What the sensitive findings reveal about the infrastructure
- How exposed data could be leveraged in a web pentest
- A concrete eWPTX-aligned pentest roadmap
Requires OPENAI_API_KEY in .env. The tool runs fully without it — AI section is simply skipped.
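
The "skip when no key is set" behaviour can be sketched as follows. This is an assumption about how such a check might look, not the tool's actual code: `parse_env` and `ai_enabled` are hypothetical names, and the parser handles only simple KEY=VALUE lines.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, ignoring blank lines and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        env[key.strip()] = value.strip().strip('"').strip("'")
    return env


def ai_enabled(env: dict) -> bool:
    """AI analysis runs only when a non-empty OPENAI_API_KEY is present."""
    return bool(env.get("OPENAI_API_KEY"))
```

A caller would read the .env file, pass its text to `parse_env`, and generate the AI report only when `ai_enabled` returns True.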
Create a .env file in the project root:
# Required for AI analysis
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
AI_TEMPERATURE=0.3
# Required for GitHub code search
GITHUB_TOKEN=ghp_...

Generate at: GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)
Minimum scope: public_repo. GitHub defines this classic scope as read/write access to public repositories, not read-only; the tool itself only reads public data.
Without a token the GitHub module runs repository search only — no authentication required for that.
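
The token rule above could look roughly like this in code. The sketch assumes GitHub's public REST endpoints `/search/repositories` and `/search/code`; `github_search_request` is a hypothetical helper, not a function from mod_github.py, and it only builds the request without sending it.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import Request

API = "https://api.github.com"


def github_search_request(kind: str, query: str, token: Optional[str] = None) -> Request:
    """Build a GitHub search request; kind is 'repositories' or 'code'."""
    if kind == "code" and not token:
        # Code search is only attempted when GITHUB_TOKEN is configured.
        raise ValueError("code search needs GITHUB_TOKEN")
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    url = f"{API}/search/{kind}?{urlencode({'q': query})}"
    return Request(url, headers=headers)
```

Passing the returned object to `urllib.request.urlopen` would perform the search; repository searches work with `token=None`.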
Edit config/osint_criteria.md to customise what the tool looks for. No code changes needed.
[wayback]
extensions = .sql, .env, .bak, .log, .xml, .json, .pem...
sensitive_paths = /admin, /backup, /.git, /.env, /api...
[dorks]
filetypes = pdf, doc, xlsx, xml, sql, env, pem, key...
sensitive = password, credentials, apikey, token...
[github]
secrets = api_key, password, token, aws_key, ssh_key...
filetypes = .env, .pem, docker-compose.yml, id_rsa...

The Wayback Machine module uses the CDX API with matchType=prefix, the same query strategy used by waybackpy --known-urls. A matchType=domain wildcard query returns 0 results for most domains; prefix is the correct approach. Up to 10 000 URLs are fetched per scan and filtered for sensitive extensions and paths.
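
A minimal sketch of that query and filter step is shown below. The `matchType=prefix` setting and the 10 000 limit come from the description above; the other CDX parameters (`output`, `fl`, `collapse`) and both function names are assumptions made for illustration and may differ from what mod_wayback.py actually sends.

```python
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str, limit: int = 10_000) -> str:
    """Build a prefix-match CDX query returning original URLs only."""
    params = {
        "url": domain,
        "matchType": "prefix",
        "output": "json",
        "fl": "original",       # assumed: request only the original URL field
        "collapse": "urlkey",   # assumed: one row per unique URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_sensitive(urls, extensions, sensitive_paths):
    """Keep URLs whose path ends in a listed extension or starts with a listed path."""
    hits = []
    for url in urls:
        path = urlsplit(url).path.lower()  # query strings are ignored
        by_extension = any(path.endswith(ext) for ext in extensions)
        by_path = any(path.startswith(prefix) for prefix in sensitive_paths)
        if by_extension or by_path:
            hits.append(url)
    return hits
```

The extension and path lists would come from the [wayback] section of config/osint_criteria.md. The simple prefix test also matches paths such as /administrator for /admin, which may or may not be desirable.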
The DuckDuckGo module uses the ddgs package (the renamed successor to duckduckgo_search). The requirements.txt and install.py reference ddgs directly to avoid the deprecation warning. DuckDuckGo does not support filetype: as precisely as Google; results vary by domain and may be sparse for smaller targets.
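
To make the dork step concrete, the sketch below builds query strings from the [dorks] criteria. The exact query shapes are assumptions (only the "@domain" email dork is stated in this README), and `build_dorks` is a hypothetical name; each resulting string would then be handed to the ddgs search client.

```python
def build_dorks(domain, filetypes, keywords):
    """Return search queries for file types, sensitive keywords and emails."""
    # One query per file type, restricted to the target's domain.
    dorks = [f"site:{domain} filetype:{filetype}" for filetype in filetypes]
    # One query per sensitive keyword, quoted for an exact match.
    dorks += [f'site:{domain} "{keyword}"' for keyword in keywords]
    # Email exposure dork, as listed in the overview table.
    dorks.append(f'"@{domain}"')
    return dorks
```

Because DuckDuckGo treats filetype: loosely, results from the first group are best regarded as candidates to review rather than confirmed file matches.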
On first run the tool asks which language to use:
[1] English
[2] Español
[3] Deutsch
[4] Français
[5] Nederlands
Preference is saved in .lang. Change at any time with recon-osint lang.
Reports and AI analysis are generated in the selected language.
recon-osint/
├── recon.py Entry point
├── install.py Bootstrap installer
├── uninstall.py Clean uninstaller
├── pyproject.toml
├── requirements.txt
├── .gitignore
├── lang/
│ ├── en.yml
│ ├── es.yml
│ ├── de.yml
│ ├── fr.yml
│ └── nl.yml
├── config/
│ └── osint_criteria.md Edit to customise what to look for
├── core/
│ ├── banner.py
│ ├── lang.py
│ └── utils.py
└── modules/
    ├── mod_sublist3r.py
    ├── mod_crtsh.py
    ├── mod_wayback.py Subdomains + sensitive (single CDX query, reused)
    ├── mod_dorks.py DuckDuckGo via ddgs
    ├── mod_github.py
    ├── ai_analyzer.py
    ├── cli.py
    ├── config.py
    ├── criteria.py
    ├── reporter.py
    └── scanner.py
- Copy lang/en.yml → lang/sv.yml
- Set language_name and language_code
- Translate all values, never the keys
- Preserve all {variable} placeholders exactly
- The language appears in the menu automatically; no code changes needed
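
The translation rules above (same keys, same {variable} placeholders) lend themselves to an automated check. The following is a hypothetical helper, not part of the repository; it assumes both language files have already been loaded into flat dictionaries, and the keys in the usage example are invented.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def translation_problems(base: dict, translated: dict) -> list:
    """Compare a translation against the base language and list problems."""
    problems = []
    for key in sorted(base.keys() - translated.keys()):
        problems.append(f"missing key: {key}")
    for key in sorted(translated.keys() - base.keys()):
        problems.append(f"unknown key: {key}")
    for key in sorted(base.keys() & translated.keys()):
        # Placeholders must match exactly; their order may differ.
        expected = sorted(PLACEHOLDER.findall(str(base[key])))
        actual = sorted(PLACEHOLDER.findall(str(translated[key])))
        if expected != actual:
            problems.append(f"placeholder mismatch in {key}")
    return problems
```

For example, translating the placeholder name itself is reported as a mismatch:

```python
en = {"scan_start": "Scanning {domain}"}
sv = {"scan_start": "Skannar {domän}"}
print(translation_problems(en, sv))
```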
- Add your function in the relevant mod_*.py
- Call it from run() and merge results into the dict
- Add your source to to_markdown()
- Create modules/mod_newmodule.py with run(domain) and to_markdown(data, lang_t)
- Register it in modules/scanner.py
- Add the new CLI keys to all lang/*.yml files
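
A skeleton for such a module might look like the sketch below. Only the two function signatures come from the steps above; the shape of the returned dict, the assumption that lang_t is a dictionary of translated strings, and the keys newmodule_title and no_results are all hypothetical.

```python
def run(domain: str) -> dict:
    """Collect findings for `domain` and return them as a plain dict."""
    # Placeholder: a real module would query its passive source here.
    return {"source": "newmodule", "domain": domain, "findings": []}


def to_markdown(data: dict, lang_t: dict) -> str:
    """Render the findings as markdown using translated labels."""
    title = lang_t.get("newmodule_title", "New module")
    lines = [f"## {title}: {data['domain']}", ""]
    if not data["findings"]:
        lines.append(lang_t.get("no_results", "No results."))
    for finding in data["findings"]:
        lines.append(f"- {finding}")
    return "\n".join(lines) + "\n"
```

Falling back to English defaults keeps the module usable before the new keys have been added to every lang/*.yml file.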
Pull requests are welcome.
- New language: translate lang/en.yml, open a PR
- New source: extend an existing module with test results
- Bug report: open an issue with steps to reproduce
- Criteria update: update config/osint_criteria.md
MIT — see LICENSE
Built for security professionals. Passive reconnaissance, professional results. Designed around eWPTX web penetration testing methodology.