Py-Wappalyzer inspects HAR data to fingerprint technologies (similar to Wappalyzer/Webappanalyzer). It can:
- Parse HAR files for HTML, scripts, headers, cookies, and meta
- Capture a HAR from a live URL using Patchright (optional screenshot)
- Return confidence, versions, categories, and groups
- Run as a CLI or be imported as a library
- Store results in a local SQLite database under data/
- Keep captured HARs (and optional screenshots) under data/captures and data/screenshots
pip install py-wappalyzer # core library + CLI (HAR analysis only)
pip install py-wappalyzer[cli] # + patchright for live capture
pip install py-wappalyzer[web] # + FastAPI/uvicorn/jinja2 for web/API
pip install py-wappalyzer[full] # everything (capture + web)
Patchright needs browsers installed. Run patchright install chromium (or see the docs) before using --url. To keep downloads inside this project, set WAPPALYZER_BROWSERS=./browsers (the default) so Patchright uses that path. Fingerprint data lives under data/wappalyzer-data by default; override with WAPPALYZER_DATA_DIR. Captured HARs go under data/captures/YYYY/MM/DD/..., screenshots under data/screenshots/...; override with WAPPALYZER_CAPTURE_DIR / WAPPALYZER_SCREENSHOT_DIR.
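The defaults and overrides described above can be sketched as a small lookup. This is an illustration only, not the package's own code: resolve_paths and dated_capture_dir are hypothetical helpers, and it is an assumption that WAPPALYZER_CAPTURE_DIR points at the root under which the YYYY/MM/DD folders are created.

```python
import os
from datetime import date
from pathlib import Path

# Documented environment variables with the defaults stated in the text.
DEFAULTS = {
    "WAPPALYZER_BROWSERS": "./browsers",
    "WAPPALYZER_DATA_DIR": "data/wappalyzer-data",
    "WAPPALYZER_CAPTURE_DIR": "data/captures",      # assumed to be the capture root
    "WAPPALYZER_SCREENSHOT_DIR": "data/screenshots",
}


def resolve_paths(env=None):
    """Hypothetical helper: map each variable to its override or its default."""
    env = os.environ if env is None else env
    return {name: Path(env.get(name) or default) for name, default in DEFAULTS.items()}


def dated_capture_dir(capture_root, day):
    """Build the YYYY/MM/DD layout described for captured HARs."""
    return Path(capture_root) / f"{day.year:04d}" / f"{day.month:02d}" / f"{day.day:02d}"


paths = resolve_paths({"WAPPALYZER_CAPTURE_DIR": "/tmp/hars"})
print(paths["WAPPALYZER_DATA_DIR"])
print(dated_capture_dir(paths["WAPPALYZER_CAPTURE_DIR"], date(2024, 1, 5)).as_posix())
```

Passing a plain dict instead of os.environ keeps the lookup easy to test without touching the real environment.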
Detect from an existing HAR (default flow):
python -m py_wappalyzer --har path/to/site.har --format pretty
Capture a URL with Patchright, optionally save a screenshot:
python -m py_wappalyzer --url https://example.com \
  --screenshot out/example.png \
  --output out/result.json
Options:
- --har PATH: Analyze an existing HAR file.
- --url URL: Capture and analyze a live page (requires Patchright + browsers).
- --screenshot PATH: Save a screenshot during capture (URL mode only).
- --format json|pretty: Output format (default json).
- --output PATH: Write results to a file instead of stdout.
- --verbose: Enable debug logs.
- --refresh-data: Force re-download of fingerprint data.
Exit code is non-zero on errors (missing HAR, capture failure, etc.).
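Because errors are signalled through the exit code, a script can wrap the CLI and branch on it. The sketch below uses only the flags listed above; build_command and run_detection are hypothetical names, and treating --har and --url as mutually exclusive is an assumption rather than documented behaviour.

```python
import subprocess
import sys


def build_command(har=None, url=None, output=None, fmt="json"):
    """Assemble the CLI argv from the documented flags."""
    if (har is None) == (url is None):
        # Assumption: exactly one input source is given per run.
        raise ValueError("pass exactly one of har or url")
    cmd = [sys.executable, "-m", "py_wappalyzer"]
    cmd += ["--har", har] if har else ["--url", url]
    cmd += ["--format", fmt]
    if output:
        cmd += ["--output", output]
    return cmd


def run_detection(**kwargs):
    """Run the CLI; a non-zero exit code is reported as failure."""
    proc = subprocess.run(build_command(**kwargs), capture_output=True, text=True)
    return proc.returncode == 0, proc.stdout, proc.stderr


if __name__ == "__main__":
    ok, out, err = run_detection(har="path/to/site.har")
    print(out if ok else f"detection failed: {err}")
```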
Run a FastAPI web server with a minimal UI (Jinja) and JSON API:
python -m py_wappalyzer.web # serves on http://localhost:8000
- Visit / for a simple form to capture a URL (uses Patchright, requires browsers).
- POST /api/analyze with JSON { "url": "https://example.com" } or { "har_path": "file.har" }.
- GET /api/history?limit=10 returns recent detections (stored in SQLite at data/py_wappalyzer.db by default; override with WAPPALYZER_DB).
- API responses include the stored har_path and optional screenshot_path when captures are performed.
- GET /healthz returns service status.
- Optional auth (disabled by default):
  - API bearer: set WAPPALYZER_API_BEARER=token
  - Web basic auth: set WAPPALYZER_WEB_USER=user and WAPPALYZER_WEB_PASS=pass
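A client for the endpoints above can be written with the standard library alone. This is a minimal sketch: the paths and JSON bodies come from the list above, while analyze_request, history_url, and send are hypothetical helper names. The response schema is not specified here beyond har_path and screenshot_path, so the sketch simply decodes whatever JSON comes back.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "http://localhost:8000"  # default address of the local server


def analyze_request(url=None, har_path=None, bearer=None):
    """Build a POST /api/analyze request carrying either a url or a har_path."""
    body = {"url": url} if url else {"har_path": har_path}
    headers = {"Content-Type": "application/json"}
    if bearer:
        # Only needed when WAPPALYZER_API_BEARER is set on the server.
        headers["Authorization"] = f"Bearer {bearer}"
    return urllib.request.Request(
        f"{BASE_URL}/api/analyze",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def history_url(limit=10):
    """Build the GET /api/history URL with a limit query parameter."""
    return f"{BASE_URL}/api/history?" + urllib.parse.urlencode({"limit": limit})


def send(request):
    """Send a request (or URL string) and decode the JSON response."""
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

With the server running, send(analyze_request(url="https://example.com")) submits a capture and send(history_url(limit=5)) fetches recent detections.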
Build and run the API/UI with Patchright + Chromium baked in:
docker build -t py-wappalyzer .
docker run -p 8000:8000 \
  -e WAPPALYZER_WEB_USER=user -e WAPPALYZER_WEB_PASS=pass \
  -v pywappalyzer-data:/app/data \
  py-wappalyzer
The pywappalyzer-data volume persists fingerprints, the DB, captures, and screenshots. The container keeps browsers under /app/browsers, fingerprint data under /app/data/wappalyzer-data, and the SQLite DB at /app/data/py_wappalyzer.db.
Auth: set WAPPALYZER_WEB_USER/WAPPALYZER_WEB_PASS (applies to both UI and API). API accepts either Basic user:pass or Bearer user:pass headers. If you prefer a separate token, set WAPPALYZER_API_BEARER.
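The header variants mentioned above can be built as follows. This is a sketch under one assumption: "Bearer user:pass" is read literally, meaning the unencoded user:pass string is sent as the bearer token. Basic auth uses the standard base64 encoding of user:pass. The function names are hypothetical.

```python
import base64


def basic_header(user, password):
    """Standard HTTP Basic auth: base64 of 'user:password'."""
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_header(token):
    """Bearer auth: the token is sent as-is."""
    return {"Authorization": f"Bearer {token}"}


print(basic_header("user", "pass"))   # Basic credentials for UI and API
print(bearer_header("user:pass"))     # assumed literal reading of 'Bearer user:pass'
print(bearer_header("token"))         # separate token via WAPPALYZER_API_BEARER
```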
- Local CLI: python -m py_wappalyzer --url https://example.com --format pretty
- Local web/API: python -m py_wappalyzer.web then open http://localhost:8000
- Docker: docker run -p 3001:8000 py-wappalyzer then open http://localhost:3001
- Auth (optional): set WAPPALYZER_API_BEARER and/or WAPPALYZER_WEB_USER / WAPPALYZER_WEB_PASS
- Auth is off by default. For public exposure, set either:
  - Basic auth: WAPPALYZER_WEB_USER + WAPPALYZER_WEB_PASS (applies to UI and API).
  - Bearer: WAPPALYZER_API_BEARER, or simply use Bearer user:pass when basic is set.
- File serving via /files is restricted to the project’s data/ subpaths (captures/screenshots/fingerprints).
- Requests are unauthenticated unless you set the env vars above; if deploying in production, enforce TLS at the edge (ingress/reverse proxy) and restrict ingress as needed.
- Data paths default to the project or container: fingerprints (data/wappalyzer-data), captures (data/captures/...), screenshots (data/screenshots/...), DB (data/py_wappalyzer.db). Override via WAPPALYZER_DATA_DIR, WAPPALYZER_CAPTURE_DIR, WAPPALYZER_SCREENSHOT_DIR, WAPPALYZER_DB.
- Patchright browsers default to ./browsers (or /app/browsers in Docker); set WAPPALYZER_BROWSERS / PLAYWRIGHT_BROWSERS_PATH to move them.
- Health: GET /healthz for basic status; add monitoring/rate limits at your ingress if exposing publicly.
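Restricting file serving to data/ subpaths is typically done with a containment check on the resolved path. The sketch below shows one common way to do that with pathlib. It is not taken from the project's code, and is_within is a hypothetical name.

```python
from pathlib import Path


def is_within(base, candidate):
    """Return True only if candidate, resolved against base, stays inside base."""
    base = Path(base).resolve()
    # Resolving collapses '..' segments, so traversal attempts end up outside base.
    target = (base / candidate).resolve()
    try:
        target.relative_to(base)
    except ValueError:
        return False
    return True


print(is_within("data", "captures/2024/01/05/site.har"))  # inside data/
print(is_within("data", "../etc/passwd"))                 # escapes data/
```

Resolving before comparing matters: a plain string prefix check would accept paths such as data/../secrets.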
Detect from a HAR file path:
from py_wappalyzer import detect_technologies
results = detect_technologies(har_path="path/to/site.har")
for tech in results:
    print(tech["name"], tech["confidence"], tech["versions"])
Detect from structured JSON (HAR-like):
from py_wappalyzer import detect_technologies
payload = {
    "url": "https://example.com",
    "html": "<html>...</html>",
    "headers": {"server": "nginx"},
    "cookies": {},
    "scripts": ["https://cdn.example/app.js"],
    "meta": {"generator": "WordPress"},
}
results = detect_technologies(json_data=payload)
Capture a HAR first, then analyze:
from py_wappalyzer.capture import capture_har_with_patchright
from py_wappalyzer import detect_technologies
har_path = capture_har_with_patchright("https://example.com", "out/site.har")
results = detect_technologies(har_path=str(har_path))
- HAR parsing: extracts URL, HTML, scripts, headers, cookies, and meta.
- Technology data: loads JSON fingerprints locally (bin/wappalyzer-data) or fetches remotely if missing.
- Matching: regex-based patterns over URL, HTML, scripts, headers, cookies, meta, DNS, and certificate issuer.
- Results: sorted by confidence with versions, categories, and groups.
- Lean dependencies: stdlib networking for fingerprint data; Patchright is optional and only required for live capture.
- Logging: --verbose for CLI, or configure logging in your app.
- Python 3.8+.
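The parse, match, and sort steps listed above can be illustrated with a toy pipeline. This is a simplified sketch, not the library's implementation. The FINGERPRINTS table is a hypothetical, header-only stand-in for the real fingerprint data, the HAR layout follows the standard log.entries[].response.headers structure, and sorting in descending confidence is an assumption.

```python
import re

# Hypothetical, simplified fingerprints: one regex per response header.
FINGERPRINTS = {
    "PHP": {"header": "x-powered-by", "pattern": r"php(?:/([\d.]+))?", "confidence": 75},
    "Nginx": {"header": "server", "pattern": r"nginx(?:/([\d.]+))?", "confidence": 100},
}


def response_headers(har):
    """Collect lower-cased response headers from the first HAR entry."""
    entries = har.get("log", {}).get("entries", [])
    if not entries:
        return {}
    headers = entries[0].get("response", {}).get("headers", [])
    return {h["name"].lower(): h["value"] for h in headers}


def detect(har):
    """Match each fingerprint against the headers and sort by confidence."""
    headers = response_headers(har)
    found = []
    for name, rule in FINGERPRINTS.items():
        match = re.search(rule["pattern"], headers.get(rule["header"], ""), re.IGNORECASE)
        if match:
            # The optional capture group carries the version, when present.
            versions = [match.group(1)] if match.group(1) else []
            found.append({"name": name, "confidence": rule["confidence"], "versions": versions})
    return sorted(found, key=lambda tech: tech["confidence"], reverse=True)


sample_har = {"log": {"entries": [{
    "request": {"url": "https://example.com/"},
    "response": {"headers": [
        {"name": "Server", "value": "nginx/1.25.3"},
        {"name": "X-Powered-By", "value": "PHP"},
    ]},
}]}}

for tech in detect(sample_har):
    print(tech["name"], tech["confidence"], tech["versions"])
# Nginx 100 ['1.25.3']
# PHP 75 []
```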



