heanczko311299/recon-osint

Recon-OSINT


Passive OSINT reconnaissance framework designed for the initial recon phase of web penetration tests following the eWPTX methodology.

This tool collects open-source intelligence exclusively from publicly available sources — it makes no direct contact with the target system.


Legal Notice

This tool performs passive reconnaissance only.

It queries public data sources (Wayback Machine, certificate transparency logs, DuckDuckGo, GitHub public repositories) using information freely and legally accessible to anyone on the internet.

Always obtain written authorisation before conducting reconnaissance on any target. The authors accept no responsibility for misuse of this tool. Use responsibly and within the bounds of applicable law.


What it does

Recon-OSINT maps the target's public footprint before any active testing begins.

| What | How | Passive? |
|---|---|---|
| Subdomain enumeration | Sublist3r, crt.sh, Wayback CDX | ✅ Yes |
| Historical URL & file exposure | Wayback Machine CDX API | ✅ Yes |
| Indexed sensitive documents | DuckDuckGo dorks | ✅ Yes |
| Exposed emails | DuckDuckGo "@domain" dork | ✅ Yes |
| Secrets in public repos | GitHub search API | ✅ Yes |

Quick Start

# 1. Install
sudo python3 install.py

# 2. Domain reconnaissance (subdomains)
recon-osint run example.com --module domain

# 3. Sensitive file exposure
recon-osint run example.com --module sensitive

# 4. Full recon + AI reports
recon-osint run example.com --module all

Modules

--module domain

Passive subdomain enumeration from three independent sources.

| Source | What it finds | Output file |
|---|---|---|
| Sublist3r | Subdomains via search engines & DNS | sublist3r.md |
| crt.sh | Subdomains from TLS certificate logs | crtsh.md |
| Wayback Machine CDX | Subdomains seen in archived URLs | wayback_subdomains.md |
| AI (OpenAI) | Pentest-oriented interpretation | domain_report_{domain}.pdf |

--module sensitive

Passive discovery of exposed files, credentials and sensitive data.

| Source | What it finds | Output file |
|---|---|---|
| Wayback Machine CDX | Files by extension and sensitive paths in up to 10 000 archived URLs | wayback_sensitive.md |
| DuckDuckGo dorks | Indexed files by filetype, sensitive keywords, exposed emails | dorks.md |
| GitHub | Public repos + secrets/sensitive files in code | github.md |
| AI (OpenAI) | Pentest-oriented interpretation | sensitive_report_{domain}.pdf |

--module all

Runs both modules sequentially. Wayback Machine is queried only once (results reused). Generates an additional consolidated AI executive report.

Output:

  • All individual .md files
  • domain_report_{domain}.pdf
  • sensitive_report_{domain}.pdf
  • full_report_{domain}.pdf (AI pentest roadmap)

Results are saved to results/recon/<timestamp>/.
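The per-run output layout can be sketched as follows. This is a minimal illustration, not the tool's own code: the timestamp format (YYYYmmdd_HHMMSS) and the helper names make_run_dir and save_markdown are assumptions.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_run_dir(base: Path = Path("results/recon"), now: Optional[datetime] = None) -> Path:
    """Create and return a per-run directory named after the scan start time.

    The YYYYmmdd_HHMMSS format is an assumption for illustration.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    run_dir = base / stamp
    run_dir.mkdir(parents=True, exist_ok=True)  # create results/recon/ too if missing
    return run_dir


def save_markdown(run_dir: Path, name: str, content: str) -> Path:
    """Write one source's Markdown output (e.g. crtsh.md) into the run directory."""
    path = run_dir / name
    path.write_text(content, encoding="utf-8")
    return path
```

Keeping every source's .md file in one timestamped folder means repeated scans of the same target never overwrite each other.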


AI Analysis

At the end of each module, the AI reads all generated .md files and produces a pentest-oriented interpretation:

  • Which subdomains to prioritise in the active phase
  • What the sensitive findings reveal about the infrastructure
  • How exposed data could be leveraged in a web pentest
  • A concrete eWPTX-aligned pentest roadmap

Requires OPENAI_API_KEY in .env. The tool runs fully without it; the AI section is simply skipped.
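A minimal sketch of how the AI step's input could be assembled, assuming the analyzer concatenates every .md file in the run directory before sending it to the model. build_ai_input and the character cap are hypothetical, and the OpenAI call itself is left out.

```python
import os
from pathlib import Path
from typing import Optional


def build_ai_input(run_dir: Path, max_chars: int = 50_000) -> Optional[str]:
    """Concatenate every .md file in run_dir into one text block for the AI step.

    Returns None when OPENAI_API_KEY is not set, mirroring the documented
    behaviour that the AI section is skipped without a key.
    """
    if not os.environ.get("OPENAI_API_KEY"):
        return None  # no key: skip AI analysis, the raw .md results remain
    parts = []
    for md_file in sorted(run_dir.glob("*.md")):
        # Label each section with its source file so the model can refer to it
        parts.append(f"## {md_file.name}\n\n{md_file.read_text(encoding='utf-8')}")
    combined = "\n\n".join(parts)
    return combined[:max_chars]  # crude cap to stay within the model's context window
```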


Configuration

Create a .env file in the project root:

# Required for AI analysis
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
AI_TEMPERATURE=0.3

# Required for GitHub code search
GITHUB_TOKEN=ghp_...
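The tool may load this file with a library such as python-dotenv; the stdlib-only sketch below only illustrates the documented behaviour, with parse_env and enabled_features as hypothetical helpers: each key unlocks a feature, and an empty or missing key disables it.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")  # drop optional quotes
    return values


def enabled_features(env: Dict[str, str]) -> Dict[str, bool]:
    """An empty or missing key disables the feature it unlocks."""
    return {
        "ai_analysis": bool(env.get("OPENAI_API_KEY")),
        "github_code_search": bool(env.get("GITHUB_TOKEN")),
    }
```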

GitHub token

Generate at: GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)

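Suggested scope: public_repo. Note that public_repo grants read/write (not read-only) access to public repositories; according to GitHub's documentation, a classic token with no scopes selected already has read-only access to public information.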

Without a token, the GitHub module runs repository search only: GitHub's repository search endpoint works anonymously, while code search requires authentication.
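A sketch of the two GitHub REST search calls described above, using only the standard library. The /search/code and /search/repositories endpoints, the application/vnd.github+json Accept header and Bearer authentication belong to GitHub's REST API; the query shape (quoted domain plus a secret keyword) and the helper names are assumptions.

```python
import json
import urllib.parse
import urllib.request
from typing import List, Optional

GITHUB_API = "https://api.github.com"


def build_search_request(domain: str, term: Optional[str] = None,
                         token: Optional[str] = None) -> urllib.request.Request:
    """Code search when a token and a term are given, repository search otherwise."""
    if token and term:
        endpoint, query = "/search/code", f'"{domain}" {term}'  # requires authentication
    else:
        endpoint, query = "/search/repositories", domain  # works anonymously
    url = f"{GITHUB_API}{endpoint}?{urllib.parse.urlencode({'q': query, 'per_page': 30})}"
    headers = {"Accept": "application/vnd.github+json"}
    if token:
        headers["Authorization"] = f"Bearer {token}"
    return urllib.request.Request(url, headers=headers)


def run_search(req: urllib.request.Request) -> List[dict]:
    """Send the request and return the list of matching items."""
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp).get("items", [])
```

For example, run_search(build_search_request("example.com", "api_key", token=os.environ["GITHUB_TOKEN"])) would look for code mentioning the domain next to api_key; dropping the token falls back to repository search.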


OSINT Criteria

Edit config/osint_criteria.md to customise what the tool looks for; no code changes are needed.

[wayback]
extensions = .sql, .env, .bak, .log, .xml, .json, .pem...
sensitive_paths = /admin, /backup, /.git, /.env, /api...

[dorks]
filetypes = pdf, doc, xlsx, xml, sql, env, pem, key...
sensitive = password, credentials, apikey, token...

[github]
secrets = api_key, password, token, aws_key, ssh_key...
filetypes = .env, .pem, docker-compose.yml, id_rsa...
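A sketch of a parser for the block format shown above, assuming the criteria file holds [section] headers followed by key = comma-separated values lines, possibly mixed with ordinary Markdown. parse_criteria is a hypothetical name; modules/criteria.py may work differently.

```python
import re
from typing import Dict, List, Optional

SECTION = re.compile(r"^\[(\w+)\]\s*$")
ENTRY = re.compile(r"^(\w+)\s*=\s*(.*)$")


def parse_criteria(text: str) -> Dict[str, Dict[str, List[str]]]:
    """Turn [section] / key = a, b, c blocks into nested dicts of lists.

    Lines outside a section, or not shaped like key = values, are ignored,
    so Markdown prose around the blocks does not break parsing.
    """
    criteria: Dict[str, Dict[str, List[str]]] = {}
    current: Optional[Dict[str, List[str]]] = None
    for raw in text.splitlines():
        line = raw.strip()
        section = SECTION.match(line)
        if section:
            current = criteria.setdefault(section.group(1), {})
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            items = [v.strip() for v in entry.group(2).split(",")]
            current[entry.group(1)] = [v for v in items if v]  # drop empty items
    return criteria
```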

Technical Notes

Wayback Machine

Uses the CDX API with matchType=prefix, the same query strategy used by waybackpy --known-urls. A matchType=domain wildcard query returns 0 results for most domains, so prefix matching is used instead. Up to 10 000 URLs are fetched per scan and then filtered for sensitive extensions and paths.
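A minimal sketch of the query and filtering steps described above. The CDX parameters used (url, matchType, output, fl, collapse, limit) are part of the Wayback CDX server API; build_cdx_url, fetch_urls, classify and subdomains are hypothetical helpers, and the substring test for sensitive paths is a simplification.

```python
import json
import urllib.parse
import urllib.request
from typing import Iterable, List, Set

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain: str, limit: int = 10_000) -> str:
    """Prefix query for archived URLs starting with the domain."""
    params = {
        "url": domain,
        "matchType": "prefix",
        "output": "json",
        "fl": "original",      # only the original URL column
        "collapse": "urlkey",  # one row per unique URL
        "limit": limit,
    }
    return f"{CDX_ENDPOINT}?{urllib.parse.urlencode(params)}"


def fetch_urls(domain: str, limit: int = 10_000) -> List[str]:
    """One CDX request whose result both modules can reuse."""
    with urllib.request.urlopen(build_cdx_url(domain, limit), timeout=60) as resp:
        rows = json.load(resp)
    return [row[0] for row in rows[1:]]  # rows[0] is the header row ["original"]


def classify(urls: Iterable[str], extensions: Iterable[str], paths: Iterable[str]) -> List[str]:
    """Keep URLs whose path ends with a sensitive extension or contains a sensitive path."""
    exts = tuple(e.lower() for e in extensions)
    needles = [p.lower() for p in paths]
    hits = []
    for url in urls:
        path = urllib.parse.urlsplit(url).path.lower()
        if path.endswith(exts) or any(n in path for n in needles):
            hits.append(url)
    return hits


def subdomains(urls: Iterable[str], domain: str) -> Set[str]:
    """Collect hostnames below the target domain from archived URLs."""
    found = set()
    for url in urls:
        if "://" not in url:
            url = "http://" + url  # urlsplit needs a scheme to find the host
        host = urllib.parse.urlsplit(url).hostname or ""
        if host.endswith("." + domain):
            found.add(host)
    return found
```

Fetching once and passing the same URL list to classify() and subdomains() is one way to get the "queried only once" behaviour described for --module all.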

DuckDuckGo Dorks

Uses the ddgs package (the renamed successor to duckduckgo_search); requirements.txt and install.py reference ddgs directly to avoid the deprecation warning. DuckDuckGo does not honour the filetype: operator as precisely as Google, so results vary by domain and may be sparse for smaller targets.
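A sketch of how dork strings built from the criteria lists could be run with ddgs. DDGS().text(query, max_results=...) is the ddgs text-search call and returns dicts with an href key; build_dorks, run_dorks and the delay between queries are assumptions, not the module's actual code.

```python
import time
from typing import Dict, Iterable, List


def build_dorks(domain: str, filetypes: Iterable[str], keywords: Iterable[str]) -> List[str]:
    """Turn criteria lists into DuckDuckGo queries scoped to the target domain."""
    dorks = [f"site:{domain} filetype:{ft}" for ft in filetypes]
    dorks += [f'site:{domain} "{kw}"' for kw in keywords]
    dorks.append(f'"@{domain}"')  # indexed pages mentioning addresses at the domain
    return dorks


def run_dorks(dorks: Iterable[str], max_results: int = 10, delay: float = 2.0) -> Dict[str, List[str]]:
    """Run each dork through ddgs and keep the result URLs (requires `pip install ddgs`)."""
    from ddgs import DDGS  # imported here so build_dorks works without ddgs installed

    client = DDGS()
    results: Dict[str, List[str]] = {}
    for dork in dorks:
        try:
            hits = client.text(dork, max_results=max_results)
        except Exception:  # ddgs may raise on rate limits or empty result sets
            hits = []
        results[dork] = [hit.get("href", "") for hit in hits or []]
        time.sleep(delay)  # pause between queries to reduce rate limiting
    return results
```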


Languages

On first run, the tool asks which language to use:

  [1] English
  [2] Español
  [3] Deutsch
  [4] Français
  [5] Nederlands

The preference is saved in .lang and can be changed at any time with recon-osint lang.

Reports and AI analysis are generated in the selected language.


Project Structure

recon-osint/
├── recon.py                    Entry point
├── install.py                  Bootstrap installer
├── uninstall.py                Clean uninstaller
├── pyproject.toml
├── requirements.txt
├── .gitignore
├── lang/
│   ├── en.yml
│   ├── es.yml
│   ├── de.yml
│   ├── fr.yml
│   └── nl.yml
├── config/
│   └── osint_criteria.md       Edit to customise what to look for
├── core/
│   ├── banner.py
│   ├── lang.py
│   └── utils.py
└── modules/
    ├── mod_sublist3r.py
    ├── mod_crtsh.py
    ├── mod_wayback.py          Subdomains + sensitive (single CDX query, reused)
    ├── mod_dorks.py            DuckDuckGo via ddgs
    ├── mod_github.py
    ├── ai_analyzer.py
    ├── cli.py
    ├── config.py
    ├── criteria.py
    ├── reporter.py
    └── scanner.py

For Developers

Adding a new language

  1. Copy lang/en.yml to lang/sv.yml
  2. Set language_name and language_code
  3. Translate all values — never the keys
  4. Preserve all {variable} placeholders exactly
  5. The language appears in the menu automatically — no code changes needed
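Steps 3 and 4 are easy to get wrong by hand, so a small checker helps. The sketch below compares a translation against en.yml once both are loaded into dicts (for example with PyYAML's yaml.safe_load, an assumption about how the files are read) and reports missing keys, unknown keys and changed {variable} placeholders. It assumes the YAML files are nested mappings of strings; check_translation is a hypothetical helper.

```python
import string
from typing import Dict, List, Set


def placeholders(text: str) -> Set[str]:
    """Names of {variable} fields in a format string."""
    return {field for _, field, _, _ in string.Formatter().parse(text) if field is not None}


def flatten(data: dict, prefix: str = "") -> Dict[str, str]:
    """Flatten nested mappings into dotted keys, e.g. cli.start."""
    flat: Dict[str, str] = {}
    for key, value in data.items():
        full = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, full + "."))
        else:
            flat[full] = str(value)
    return flat


def check_translation(reference: dict, candidate: dict) -> List[str]:
    """List problems in candidate compared with the reference (English) file."""
    ref, cand = flatten(reference), flatten(candidate)
    problems = [f"missing key: {k}" for k in sorted(ref.keys() - cand.keys())]
    problems += [f"unknown key: {k}" for k in sorted(cand.keys() - ref.keys())]
    for key in sorted(ref.keys() & cand.keys()):
        try:
            same = placeholders(ref[key]) == placeholders(cand[key])
        except ValueError:
            same = False  # unbalanced braces in one of the strings
        if not same:
            problems.append(f"placeholder mismatch: {key}")
    return problems
```

A complete translation should yield an empty list, e.g. check_translation(yaml.safe_load(open("lang/en.yml", encoding="utf-8")), yaml.safe_load(open("lang/sv.yml", encoding="utf-8"))).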

Adding a new source to a module

  1. Add your function in the relevant mod_*.py
  2. Call it from run() and merge results into the dict
  3. Add your source to to_markdown()
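The first two steps can be sketched like this for a subdomain-style module. The source functions and the result shape are hypothetical stand-ins; the real mod_*.py files define their own dict layout, and to_markdown() would then need a section for the new key.

```python
from typing import Callable, Dict, List


def source_existing(domain: str) -> List[str]:
    # Hypothetical stand-in for an existing source such as crt.sh
    return [f"www.{domain}", f"mail.{domain}"]


def source_new(domain: str) -> List[str]:
    # Step 1: the new source function (hypothetical; replace with a real passive lookup)
    return [f"mail.{domain}", f"vpn.{domain}"]


def run(domain: str) -> Dict[str, List[str]]:
    # Step 2: call every source and merge its results into one dict
    sources: Dict[str, Callable[[str], List[str]]] = {
        "source_existing": source_existing,
        "source_new": source_new,
    }
    data: Dict[str, List[str]] = {}
    for name, fn in sources.items():
        try:
            data[name] = sorted(set(fn(domain)))
        except Exception:
            data[name] = []  # a failing source should not abort the whole module
    # Deduplicated union across all sources
    data["all"] = sorted({host for hosts in data.values() for host in hosts})
    return data
```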

Adding a new module

  1. Create modules/mod_newmodule.py with run(domain) and to_markdown(data, lang_t)
  2. Register it in modules/scanner.py
  3. Add the new CLI keys to all lang/*.yml files
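A skeleton for a new module following the documented run(domain) / to_markdown(data, lang_t) contract. What lang_t is exactly is not documented here; the sketch assumes it is a callable mapping a translation key to a string, and the MODULES dict only stands in for however modules/scanner.py really registers modules. The translation keys are hypothetical.

```python
from typing import Callable, Dict, List

Translator = Callable[[str], str]


def run(domain: str) -> Dict[str, object]:
    """Collect data for the domain; must stay passive (no requests to the target)."""
    findings: List[str] = []  # hypothetical: fill from a public data source
    return {"domain": domain, "findings": findings}


def to_markdown(data: Dict[str, object], lang_t: Translator) -> str:
    """Render results as Markdown using translated labels."""
    findings = data["findings"]
    lines = [f"# {lang_t('newmodule_title')}: {data['domain']}", ""]
    if not findings:
        lines.append(lang_t("no_results"))
    lines.extend(f"- {item}" for item in findings)
    return "\n".join(lines) + "\n"


# Hypothetical registry standing in for whatever modules/scanner.py actually uses
MODULES = {"newmodule": (run, to_markdown)}
```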

Contributing

Pull requests are welcome.

  • New language — translate lang/en.yml, open a PR
  • New source — extend an existing module with test results
  • Bug report — open an issue with steps to reproduce
  • Criteria update — update config/osint_criteria.md

License

MIT — see LICENSE


Built for security professionals. Passive reconnaissance, professional results. Designed around eWPTX web penetration testing methodology.

About

Passive OSINT reconnaissance framework for web penetration testing (eWPTX). Subdomain enumeration, sensitive file exposure, DuckDuckGo dorks, GitHub secrets. AI-powered PDF reports, 5 languages.
