Recon-OSINT

A passive OSINT reconnaissance framework for the initial recon phase of web penetration tests that follow the eWPTX methodology.

This tool collects open-source intelligence exclusively from publicly available sources — it makes no direct contact with the target system.

Legal Notice

This tool performs passive reconnaissance only.

It queries public data sources (Wayback Machine, certificate transparency logs, DuckDuckGo, GitHub public repositories) using information freely and legally accessible to anyone on the internet.

Always obtain written authorisation before conducting reconnaissance on any target. The authors accept no responsibility for misuse of this tool. Use responsibly and within the bounds of applicable law.

What it does

Recon-OSINT maps the target's public footprint before any active testing begins.

| What | How | Passive? |
| --- | --- | --- |
| Subdomain enumeration | Sublist3r, crt.sh, Wayback CDX | ✅ Yes |
| Historical URL & file exposure | Wayback Machine CDX API | ✅ Yes |
| Indexed sensitive documents | DuckDuckGo dorks | ✅ Yes |
| Exposed emails | DuckDuckGo `"@domain"` dork | ✅ Yes |
| Secrets in public repos | GitHub search API | ✅ Yes |

Quick Start

# 1. Install
sudo python3 install.py

# 2. Domain reconnaissance (subdomains)
recon-osint run example.com --module domain

# 3. Sensitive file exposure
recon-osint run example.com --module sensitive

# 4. Full recon + AI reports
recon-osint run example.com --module all

Modules

`--module domain`

Passive subdomain enumeration from three independent sources.

| Source | What it finds | Output file |
| --- | --- | --- |
| Sublist3r | Subdomains via search engines & DNS | `sublist3r.md` |
| crt.sh | Subdomains from TLS certificate logs | `crtsh.md` |
| Wayback Machine CDX | Subdomains seen in archived URLs | `wayback_subdomains.md` |
| AI (OpenAI) | Pentest-oriented interpretation | `domain_report_{domain}.pdf` |
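As a rough illustration of the crt.sh source, the sketch below pulls subdomains out of crt.sh JSON output. It assumes the public `https://crt.sh/?q=%25.<domain>&output=json` endpoint and its `name_value` field; the function names are hypothetical and are not taken from `mod_crtsh.py`.

```python
import json
from urllib.parse import quote


def crtsh_url(domain: str) -> str:
    """Build a crt.sh JSON query for certificates covering %.domain."""
    return f"https://crt.sh/?q={quote('%.' + domain)}&output=json"


def parse_crtsh(payload: str, domain: str) -> list[str]:
    """Return sorted unique hostnames under `domain` from a crt.sh JSON body."""
    found = set()
    for entry in json.loads(payload):
        # name_value may hold several names separated by newlines.
        for name in entry.get("name_value", "").splitlines():
            name = name.strip().lower().removeprefix("*.")
            if name == domain or name.endswith("." + domain):
                found.add(name)
    return sorted(found)
```

Wildcard entries such as `*.example.com` are reduced to their base name, and look-alike hosts that merely end in the same string (for example `evil-example.com`) are dropped by the dot-anchored suffix check.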

`--module sensitive`

Passive discovery of exposed files, credentials and sensitive data.

| Source | What it finds | Output file |
| --- | --- | --- |
| Wayback Machine CDX | Files by extension + sensitive paths in up to 10 000 archived URLs | `wayback_sensitive.md` |
| DuckDuckGo dorks | Indexed files by filetype, sensitive keywords, exposed emails | `dorks.md` |
| GitHub | Public repos + secrets/sensitive files in code | `github.md` |
| AI (OpenAI) | Pentest-oriented interpretation | `sensitive_report_{domain}.pdf` |

`--module all`

Runs both modules sequentially. The Wayback Machine is queried only once and the results are reused by both modules. An additional consolidated AI executive report is generated.

Output:

- All individual `.md` files
- `domain_report_{domain}.pdf`
- `sensitive_report_{domain}.pdf`
- `full_report_{domain}.pdf`: AI pentest roadmap

Results saved to: results/recon/<timestamp>/

AI Analysis

At the end of each module the AI reads all generated .md files and produces a pentest-oriented interpretation:

- Which subdomains to prioritise in the active phase
- What the sensitive findings reveal about the infrastructure
- How exposed data could be leveraged in a web pentest
- A concrete eWPTX-aligned pentest roadmap

Requires OPENAI_API_KEY in .env. Without it the tool still runs in full; only the AI section is skipped.
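The skip behaviour can be pictured with a minimal sketch like the one below. It only assumes that the generated `.md` files sit together in one results directory; `collect_markdown` and `build_ai_input` are hypothetical names, and the actual logic in `ai_analyzer.py` may differ.

```python
import os
from pathlib import Path


def collect_markdown(results_dir: Path) -> str:
    """Concatenate every .md file in results_dir, each under its filename."""
    parts = []
    for path in sorted(results_dir.glob("*.md")):
        parts.append(f"## {path.name}\n{path.read_text(encoding='utf-8')}")
    return "\n\n".join(parts)


def build_ai_input(results_dir: Path, env=os.environ):
    """Return the text to send to the model, or None when AI is disabled."""
    if not env.get("OPENAI_API_KEY"):
        return None  # no key: skip the AI section, keep the .md results
    return collect_markdown(results_dir)
```

Returning `None` rather than raising keeps the missing key a non-error, which matches the stated behaviour that the scan itself is unaffected.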

Configuration

Create a .env file in the project root:

# Required for AI analysis
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini
AI_TEMPERATURE=0.3

# Required for GitHub code search
GITHUB_TOKEN=ghp_...

GitHub token

Generate at: GitHub → Settings → Developer settings → Personal access tokens → Tokens (classic)

Minimum scope: public_repo (limits access to public repositories; note that this classic scope also permits writes to them, so it is not read-only)

Without a token the GitHub module runs repository search only, which needs no authentication; code search is skipped.
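The token/no-token split could look roughly like this. The sketch uses the public GitHub REST endpoints `/search/code` and `/search/repositories`; the function name, the `User-Agent` string and the query shape are assumptions for illustration, not the contents of `mod_github.py`.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode

API = "https://api.github.com"


def github_search_request(query: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a code-search request with a token, else a repository search."""
    headers = {
        "Accept": "application/vnd.github+json",
        "User-Agent": "recon-osint-sketch",  # hypothetical client name
    }
    if token:
        endpoint = "/search/code"          # code search requires authentication
        headers["Authorization"] = f"Bearer {token}"
    else:
        endpoint = "/search/repositories"  # works unauthenticated, rate limited
    url = f"{API}{endpoint}?{urlencode({'q': query})}"
    return urllib.request.Request(url, headers=headers)
```

The function only builds the request; passing it to `urllib.request.urlopen` would send it.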

OSINT Criteria

Edit config/osint_criteria.md to customise what the tool looks for. No code changes needed.

[wayback]
extensions = .sql, .env, .bak, .log, .xml, .json, .pem...
sensitive_paths = /admin, /backup, /.git, /.env, /api...

[dorks]
filetypes = pdf, doc, xlsx, xml, sql, env, pem, key...
sensitive = password, credentials, apikey, token...

[github]
secrets = api_key, password, token, aws_key, ssh_key...
filetypes = .env, .pem, docker-compose.yml, id_rsa...
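One way to read a file in this shape is Python's `configparser`, sketched below. This assumes the file contains only `[section]` headers and `key = comma, separated, values` lines (any Markdown prose before the first section would need stripping first); `parse_criteria` is a hypothetical name, not necessarily what `criteria.py` does, and the sample uses a shortened subset of the values listed above.

```python
import configparser


def parse_criteria(text: str) -> dict[str, dict[str, list[str]]]:
    """Parse [section] / key = a, b, c text into nested dicts of lists."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {
        section: {
            key: [item.strip() for item in value.split(",") if item.strip()]
            for key, value in parser[section].items()
        }
        for section in parser.sections()
    }


# Shortened sample for illustration only.
sample = """
[wayback]
extensions = .sql, .env, .bak
sensitive_paths = /admin, /backup, /.git
"""
criteria = parse_criteria(sample)
```

Here `criteria["wayback"]["extensions"]` is a plain list of strings that the scanning code can iterate over.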

Technical Notes

Wayback Machine

Uses the CDX API with matchType=prefix — the same query strategy used by waybackpy --known-urls. A matchType=domain wildcard query returns 0 results for most domains; prefix is the correct approach. Up to 10 000 URLs are fetched per scan and filtered for sensitive extensions and paths.
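A minimal sketch of that flow, under stated assumptions: the query uses documented CDX server parameters (`url`, `matchType`, `output`, `fl`, `collapse`, `limit`), but the exact parameter set, the function names and the matching rules are illustrative guesses rather than the code in `mod_wayback.py`.

```python
from urllib.parse import urlencode, urlsplit

CDX = "https://web.archive.org/cdx/search/cdx"


def cdx_url(domain: str, limit: int = 10_000) -> str:
    """Build a prefix-match CDX query that returns original URLs as JSON."""
    params = {"url": domain, "matchType": "prefix", "output": "json",
              "fl": "original", "collapse": "urlkey", "limit": limit}
    return f"{CDX}?{urlencode(params)}"


def filter_sensitive(urls, extensions, paths):
    """Keep URLs whose path ends with a listed extension or starts with a listed path."""
    hits = []
    for url in urls:
        # Compare on the path only, so query strings do not hide an extension.
        path = urlsplit(url).path.lower()
        if path.endswith(tuple(extensions)) or path.startswith(tuple(paths)):
            hits.append(url)
    return hits
```

Note that a plain `startswith` check on `/admin` also matches `/administrator`; a stricter variant would compare whole path segments.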

DuckDuckGo Dorks

Uses the ddgs package (the renamed successor to duckduckgo_search). The requirements.txt and install.py reference ddgs directly to avoid the deprecation warning. DuckDuckGo does not support filetype: as precisely as Google — results vary by domain and may be sparse for smaller targets.
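To make the dork types concrete, here is a small sketch that only builds the query strings. The `"@domain"` form comes from the table in "What it does"; the `site:` and `filetype:` combinations and the `build_dorks` name are assumptions about how the queries might be composed, not the exact strings used in `mod_dorks.py`.

```python
def build_dorks(domain, filetypes, keywords):
    """Return search-engine dork strings scoped to one domain."""
    dorks = [f"site:{domain} filetype:{ft}" for ft in filetypes]
    dorks += [f'site:{domain} "{kw}"' for kw in keywords]
    dorks.append(f'"@{domain}"')  # exposed email addresses
    return dorks
```

Each string would then be passed to the search client one at a time, ideally with a pause between queries to stay within the search engine's rate limits.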

Languages

On first run the tool asks which language to use:

  [1] English
  [2] Español
  [3] Deutsch
  [4] Français
  [5] Nederlands

Preference is saved in .lang. Change at any time with recon-osint lang.

Reports and AI analysis are generated in the selected language.

Project Structure

recon-osint/
├── recon.py                    Entry point
├── install.py                  Bootstrap installer
├── uninstall.py                Clean uninstaller
├── pyproject.toml
├── requirements.txt
├── .gitignore
├── lang/
│   ├── en.yml
│   ├── es.yml
│   ├── de.yml
│   ├── fr.yml
│   └── nl.yml
├── config/
│   └── osint_criteria.md       Edit to customise what to look for
├── core/
│   ├── banner.py
│   ├── lang.py
│   └── utils.py
└── modules/
    ├── mod_sublist3r.py
    ├── mod_crtsh.py
    ├── mod_wayback.py          Subdomains + sensitive (single CDX query, reused)
    ├── mod_dorks.py            DuckDuckGo via ddgs
    ├── mod_github.py
    ├── ai_analyzer.py
    ├── cli.py
    ├── config.py
    ├── criteria.py
    ├── reporter.py
    └── scanner.py

For Developers

Adding a new language

1. Copy lang/en.yml → lang/sv.yml
2. Set language_name and language_code
3. Translate all values, never the keys
4. Preserve all {variable} placeholders exactly
5. The language appears in the menu automatically; no code changes are needed
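The "same keys, same placeholders" rules above lend themselves to an automated check. The sketch below works on two already-loaded dictionaries and assumes the language files are flat key/value mappings; `check_translation` and the sample keys are hypothetical and not part of the project.

```python
import re

PLACEHOLDER = re.compile(r"\{(\w+)\}")


def check_translation(reference: dict, translated: dict) -> list[str]:
    """List missing/extra keys and placeholder mismatches against the reference."""
    problems = []
    for key in reference.keys() - translated.keys():
        problems.append(f"missing key: {key}")
    for key in translated.keys() - reference.keys():
        problems.append(f"unexpected key: {key}")
    for key in reference.keys() & translated.keys():
        want = sorted(PLACEHOLDER.findall(str(reference[key])))
        got = sorted(PLACEHOLDER.findall(str(translated[key])))
        if want != got:
            problems.append(f"placeholder mismatch in {key}: {want} vs {got}")
    return sorted(problems)
```

An empty list means the translation has the same keys as the reference and each value keeps the same set of `{variable}` placeholders.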

Adding a new source to a module

1. Add your function in the relevant mod_*.py
2. Call it from run() and merge results into the dict
3. Add your source to to_markdown()

Adding a new module

1. Create modules/mod_newmodule.py with run(domain) and to_markdown(data, lang_t)
2. Register it in modules/scanner.py
3. Add the new CLI keys to all lang/*.yml files
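A bare-bones module following the `run(domain)` / `to_markdown(data, lang_t)` contract might look like the sketch below. It assumes `lang_t` behaves like a dictionary of translated strings and that `run()` returns a plain dict; the dict layout and the translation keys are hypothetical, so check an existing `mod_*.py` for the real conventions.

```python
"""modules/mod_newmodule.py: illustrative skeleton only."""


def run(domain: str) -> dict:
    """Collect findings for `domain` and return them as a plain dict."""
    findings = []  # a real source would append results here
    return {"domain": domain, "findings": findings}


def to_markdown(data: dict, lang_t: dict) -> str:
    """Render the dict from run() as a Markdown section."""
    title = lang_t.get("newmodule_title", "New module")  # hypothetical key
    lines = [f"# {title}: {data['domain']}", ""]
    if data["findings"]:
        lines += [f"- {item}" for item in data["findings"]]
    else:
        lines.append(lang_t.get("no_results", "No results."))  # hypothetical key
    return "\n".join(lines)
```

Keeping `run()` free of formatting and `to_markdown()` free of network calls means each half can be tested separately.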

Contributing

Pull requests are welcome.

- New language: translate lang/en.yml and open a PR
- New source: extend an existing module and include test results
- Bug report: open an issue with steps to reproduce
- Criteria update: update config/osint_criteria.md

License

MIT — see LICENSE

Built for security professionals. Passive reconnaissance, professional results. Designed around eWPTX web penetration testing methodology.
