This document explains everything: installation, configuration, components, how to create baselines, how scanning, parsing and notifications work, backup & cleanup, tests, debugging, and recommended best practices.
- Overview
- Installation & prerequisites
- File layout (project structure)
- Configuration files
  - config.ini (fields explained)
  - ips.txt
- Database (schema & tables)
- How the system works (workflow)
- Scripts and modules — what each does
  - main.py
  - scanner/tcp_scanner.py
  - scanner/udp_scanner.py
  - parser/tenant_parser.py
  - mailer.py
  - run.sh
  - create_baseline.py
  - clean_reset.py
  - list_tenants.py
- Creating and updating a baseline (detailed steps)
- Email notifications (how they're built & sent)
- Logging and troubleshooting
- Testing (including UDP test) and verification steps
- Automation & scheduling
- Backup & cleanup (what clean_reset.py does)
- Security considerations
- Development notes & contribution guide
- FAQ
- Appendix: example files
SurfaceMinder is a small External Attack Surface Management (EASM) toolkit whose goal is to periodically scan a list of public IPs for exposed services (both TCP and UDP), keep a baseline per tenant, detect changes vs baseline and notify via email when changes occur.
Design goals:
- Separate TCP and UDP scanning code paths for flexibility.
- Parser that ingests Nmap XML and stores results in a small SQLite DB.
- Tenant-aware baseline system: per-tenant baseline stored in DB; comparisons detect added/removed/changed ports.
- Email alerts summarizing differences (single combined mail for TCP+UDP).
- Tools to create baseline, clean/reset workspace, list tenants and run the full pipeline interactively.
Minimum prerequisites:
- Python 3.8+ (3.10/3.11 recommended)
- nmap (for scans)
- sqlite3 (CLI optional but helpful)
- pip to install Python requirements (if any)
Recommended Python packages (install via pip if present in requirements.txt):
- requests (if mailer uses an API)
- dataclass-wizard (if used by other modules)
Install nmap on Debian/Ubuntu/Kali:
sudo apt update
sudo apt install -y nmap
File layout (project structure):
- scans/ # where XMLs from nmap are written
- scanner/
- tcp_scanner.py
- udp_scanner.py
- parser/
- tenant_parser.py
- mailer.py
- main.py # CLI orchestrator
- run.sh # interactive runner
- create_baseline.py # python helper for baseline
- clean_reset.py # python backup & reset tool
- list_tenants.py # list tenants helper
- config.ini # configuration (SMTP, nmap, paths)
- db/easm.sqlite # created automatically by parser
- logs/ # runtime logs
- backup/ # used by clean_reset.py
- ips.txt # targets file (one IP per line)
A sample minimal config.ini:
[general]
db_path = db/easm.sqlite
scans_dir = scans
logs_dir = logs
[nmap]
nmap_cmd = nmap
# Default TCP options used by tcp_scanner (example):
tcp_opts = -sT -p- -Pn -sV -oX
# Comma-separated UDP ports used by run.sh and scan-udp if not overridden
udp_ports = 53,123,161
[smtp]
host = smtp.gmail.com
port = 587
starttls = True
user = youremail@gmail.com
password = your_app_password_or_smtp_token
from = youremail@gmail.com
[app]
ips_file = ips.txt
Important notes:
- For Gmail, use an App Password (if account has 2FA) or configure OAuth if preferred. Plain account passwords may be blocked.
- udp_ports is used by run.sh to decide whether to run UDP scans automatically. If empty, UDP is skipped.
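As an illustration of how these fields might be consumed, here is a minimal sketch using the standard-library `configparser`. The `load_settings` helper and the returned dictionary keys are hypothetical and not taken from the project's code; interpolation is disabled on the assumption that an SMTP password may contain `%`.

```python
import configparser

# Trimmed copy of the sample config above (smtp credentials omitted).
SAMPLE = """
[general]
db_path = db/easm.sqlite
scans_dir = scans
logs_dir = logs
[nmap]
nmap_cmd = nmap
tcp_opts = -sT -p- -Pn -sV -oX
udp_ports = 53,123,161
[smtp]
host = smtp.gmail.com
port = 587
starttls = True
[app]
ips_file = ips.txt
"""


def load_settings(parser):
    """Hypothetical helper: flatten the sections into a plain dict."""
    udp_raw = parser.get("nmap", "udp_ports", fallback="")
    return {
        "db_path": parser.get("general", "db_path", fallback="db/easm.sqlite"),
        "nmap_cmd": parser.get("nmap", "nmap_cmd", fallback="nmap"),
        "tcp_opts": parser.get("nmap", "tcp_opts", fallback="").split(),
        # An empty list means "skip UDP", mirroring the note above.
        "udp_ports": [p.strip() for p in udp_raw.split(",") if p.strip()],
        "smtp_port": parser.getint("smtp", "port", fallback=587),
        "starttls": parser.getboolean("smtp", "starttls", fallback=True),
        "ips_file": parser.get("app", "ips_file", fallback="ips.txt"),
    }


# interpolation=None keeps literal '%' characters in values intact.
parser = configparser.ConfigParser(interpolation=None)
parser.read_string(SAMPLE)
settings = load_settings(parser)
print(settings["udp_ports"])
```

In a real run, `parser.read("config.ini")` would replace `read_string`.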
ips.txt is a simple text file with one IP per line. Blank lines and lines starting with # are ignored.
Example:
# internal test
127.0.0.1
8.8.8.8
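The parsing rule described for ips.txt can be sketched in a few lines. `parse_targets` is a hypothetical name; the project's own loader may differ (for example, it might also validate addresses).

```python
def parse_targets(text):
    """Return the target lines, skipping blanks and '#' comments."""
    targets = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        targets.append(line)
    return targets


example = """# internal test
127.0.0.1

8.8.8.8
"""
targets = parse_targets(example)
print(targets)
```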
The SQLite DB (by default db/easm.sqlite) contains a few key tables used by the parser and the baseline logic. Typical tables:
- scan_files: metadata about scans
  - id INT PK
  - scan_file TEXT (filename)
  - scan_type TEXT ('tcp' or 'udp')
  - created_at TEXT (ISO timestamp)
- ports: port-level records extracted from XMLs
  - id INT PK
  - scan_file TEXT
  - ip TEXT
  - port INTEGER
  - proto TEXT ('tcp' or 'udp')
  - state TEXT ('open', 'closed', 'filtered', ...)
  - service TEXT (service name / version)
- baseline_ports: per-tenant baseline
  - id INT PK
  - tenant TEXT
  - ip TEXT
  - port INTEGER
  - proto TEXT
  - state TEXT
  - service TEXT
  - set_at TEXT (ISO timestamp)
(parser/tenant_parser.py also uses these tables; clean_reset.py exports and backs them up before deleting anything.)
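For reference, the tables listed above could be created roughly as follows. This is a sketch, not the project's actual DDL: `init_schema` is a hypothetical name, and `INTEGER PRIMARY KEY` is an assumption about what "INT PK" means in practice (in SQLite that spelling makes the column an alias for the rowid).

```python
import sqlite3

# Assumed DDL reconstructed from the column lists above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS scan_files (
    id INTEGER PRIMARY KEY,
    scan_file TEXT,
    scan_type TEXT,      -- 'tcp' or 'udp'
    created_at TEXT      -- ISO timestamp
);
CREATE TABLE IF NOT EXISTS ports (
    id INTEGER PRIMARY KEY,
    scan_file TEXT,
    ip TEXT,
    port INTEGER,
    proto TEXT,          -- 'tcp' or 'udp'
    state TEXT,
    service TEXT
);
CREATE TABLE IF NOT EXISTS baseline_ports (
    id INTEGER PRIMARY KEY,
    tenant TEXT,
    ip TEXT,
    port INTEGER,
    proto TEXT,
    state TEXT,
    service TEXT,
    set_at TEXT          -- ISO timestamp
);
"""


def init_schema(conn):
    """Create the three tables if they do not exist yet."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
init_schema(conn)
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type='table'")
)
print(tables)
```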
High-level flow used by run.sh or orchestration:
- Scan: scan-tcp runs Nmap to generate XMLs for targets and writes them to scans/. Optionally scan-udp runs for UDP ports. scanner/* scripts call nmap and save XML output.
- Ingest (parse-scans): parser/tenant_parser.py --ingest reads XMLs from scans/ and inserts scan_files + ports rows in DB.
- Baseline: A baseline for a tenant is set via the set-baseline action (either using the latest scans or a specific scan file). Baseline rows live in baseline_ports.
- Check: check-baseline compares baseline_ports with latest scan (TCP) and computes added/removed/changed. check-baseline-combined does the same for combined latest TCP+UDP.
- Notification: If differences exist, mailer.send_mail is called with a short subject and a body describing the changes (it now prints the body to stdout for easier debugging).
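The Check step boils down to a set comparison between baseline rows and the latest scan rows. The sketch below shows one way to compute added/removed/changed; `diff_ports` and the `(ip, port, proto) -> (state, service)` mapping are illustrative assumptions, not the structures used by parser/tenant_parser.py.

```python
def diff_ports(baseline, latest):
    """Compare two mappings of (ip, port, proto) -> (state, service)."""
    added = sorted(key for key in latest if key not in baseline)
    removed = sorted(key for key in baseline if key not in latest)
    changed = sorted(
        (key, baseline[key], latest[key])
        for key in baseline
        if key in latest and baseline[key] != latest[key]
    )
    return {"added": added, "removed": removed, "changed": changed}


baseline = {
    ("192.0.2.10", 22, "tcp"): ("open", "ssh"),
    ("192.0.2.10", 80, "tcp"): ("open", "http"),
    ("192.0.2.10", 53, "udp"): ("open", "domain"),
}
latest = {
    ("192.0.2.10", 22, "tcp"): ("open", "ssh"),
    ("192.0.2.10", 53, "udp"): ("open|filtered", "domain"),
    ("192.0.2.10", 8080, "tcp"): ("open", "http-proxy"),
}
result = diff_ports(baseline, latest)
print(result)
```

Keying on protocol as well as port matters here: 53/tcp and 53/udp are distinct entries and must not cancel each other out in a combined TCP+UDP comparison.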
Below, each file/module is described with its responsibilities, inputs, outputs and critical implementation notes.
Role of main.py: central CLI entrypoint for the pipeline: scan-tcp, scan-udp, parse-scans, set-baseline, check-baseline, check-baseline-combined.
Key functions:
- scan_tcp(ips_file, tenant): loops IPs, calls scanner/tcp_scanner.py for each IP.
- scan_udp(ips_file, tenant, udp_ports): the same for UDP.
- parse_scans(): calls parser/tenant_parser.py --ingest to populate DB.
- set_baseline(tenant[, scan_file]): sets baseline using parser helper.
- check_baseline(tenant): compares baseline vs latest tcp only.
- check_baseline_combined(tenant): compares baseline vs latest tcp+udp combined and builds body.
Role of scanner/tcp_scanner.py: perform a TCP scan of a single IP and write Nmap XML to scans/.
Behavior:
- Accepts --ip and --tenant (and an optional --tcp-opts override).
- Runs nmap with the configured options (e.g. -sT -p- -Pn -sV -oX <file>), writes output to scans/scan-<timestamp>-tcp-<ip>.xml and prints or returns the filename.
Important:
- For speed, consider -T4 and limiting ports (the default in the example is -p-, i.e. all ports).
- Ensure nmap is in PATH or adjust nmap_cmd in config.ini.
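To make the scanner behavior concrete, here is a hedged sketch of how the nmap command line and output path might be assembled. `build_tcp_command` is a hypothetical helper and the timestamp format is an assumption (the document only specifies `scan-<timestamp>-tcp-<ip>.xml`). The sketch relies on `tcp_opts` ending in `-oX`, as in the sample config, so the output file can be appended directly.

```python
import shlex
from datetime import datetime, timezone
from pathlib import Path


def build_tcp_command(ip, scans_dir="scans", nmap_cmd="nmap",
                      tcp_opts="-sT -p- -Pn -sV -oX", now=None):
    """Return (argv, output_path) for a single-IP TCP scan."""
    now = now or datetime.now(timezone.utc)
    stamp = now.strftime("%Y%m%dT%H%M%SZ")  # assumed timestamp format
    out = Path(scans_dir) / f"scan-{stamp}-tcp-{ip}.xml"
    # tcp_opts ends with -oX, so the XML path follows it, then the target.
    argv = [nmap_cmd, *shlex.split(tcp_opts), str(out), ip]
    return argv, out


argv, out = build_tcp_command(
    "192.0.2.10", now=datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
print(argv)
# To actually scan: subprocess.run(argv, check=True)
```

Passing the arguments as a list (rather than a shell string) avoids quoting problems if an option or path ever contains spaces.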
Role of scanner/udp_scanner.py: perform a UDP scan for the specified ports and write XML.
Behavior:
- Accepts --ip, --tenant, and --udp-ports (comma-separated string).
- Runs nmap -sU -p <ports> -Pn -sV -oX <file> and writes scans/scan-<timestamp>-udp-<ip>.xml.
Notes:
- UDP scans are slower and less deterministic; services often don't reply.
- nmap may show open|filtered if no ICMP error is returned.
- Running nmap -sU usually requires root or the cap_net_raw capability.
Role of parser/tenant_parser.py: parse Nmap XML files, populate DB tables and provide tenant-level utilities.
Behavior:
- --ingest: scans the scans/ directory and parses new XMLs, storing data in scan_files and ports.
- --set-baseline <tenant>: sets baseline from latest scans or from a given scan file.
- Comparison functions:
  - compare_baseline_to_latest(tenant) returns a dict with latest_scan_file and report (per IP: added/removed/changed lists).
  - compare_baseline_to_latest_combined(tenant) does the same for combined TCP+UDP.
Important:
- Parser must handle both TCP and UDP XMLs (nmap XML includes <port protocol="udp"> entries, so the parser must record the protocol per port).
- The report structure used by main.py expects keys per IP with lists for added, removed, changed in a specific tuple format (see _format_report in main.py).
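The per-port protocol requirement can be illustrated with a small extraction routine built on the standard-library `xml.etree.ElementTree`. `parse_ports` and the row dictionary layout are hypothetical; the sample XML is a hand-written fragment that follows the element and attribute names of Nmap's XML output (`host`, `address`, `port`, `state`, `service`).

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<nmaprun>
  <host>
    <address addr="192.0.2.10" addrtype="ipv4"/>
    <ports>
      <port protocol="tcp" portid="22">
        <state state="open"/>
        <service name="ssh" product="OpenSSH"/>
      </port>
      <port protocol="udp" portid="53">
        <state state="open|filtered"/>
        <service name="domain"/>
      </port>
    </ports>
  </host>
</nmaprun>"""


def parse_ports(xml_text):
    """Extract one row per <port>, keeping the protocol attribute."""
    rows = []
    root = ET.fromstring(xml_text)
    for host in root.iter("host"):
        # A host may also carry a MAC address element; keep only IP ones.
        ip = next(
            (a.get("addr") for a in host.findall("address")
             if a.get("addrtype") in ("ipv4", "ipv6")),
            None,
        )
        for port in host.iter("port"):
            state = port.find("state")
            svc = port.find("service")
            parts = []
            if svc is not None:
                parts = [svc.get(k) for k in ("name", "product", "version")]
            rows.append({
                "ip": ip,
                "port": int(port.get("portid")),
                "proto": port.get("protocol"),
                "state": state.get("state") if state is not None else "",
                "service": " ".join(p for p in parts if p),
            })
    return rows


rows = parse_ports(SAMPLE_XML)
print(rows)
```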
Role of mailer.py: send email notifications.
Behavior:
- Exposes send_mail(subject, body).
- Reads SMTP config from config.ini (smtp.host, smtp.port, smtp.starttls, smtp.user, smtp.password, smtp.from).
- Uses STARTTLS if configured; if the server doesn't support it and the config enables starttls, smtplib raises. Set starttls = False for servers without STARTTLS (like local MailHog setups), or simply point to a real SMTP server.
Security:
- Avoid storing plaintext passwords in repo; consider environment variables or a secrets manager. For small tests, app password is acceptable.
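A sketch of how such a mailer could look with `smtplib` and `email.message` follows. It is not the project's mailer.py: the extra `cfg` and `recipient` parameters, the `build_message` helper, and the `EASM_SMTP_PASSWORD` environment variable are all assumptions made for illustration (the document's `send_mail` takes only subject and body, and does not say where the recipient comes from).

```python
import configparser
import os
import smtplib
from email.message import EmailMessage


def build_message(subject, body, sender, recipient):
    """Assemble a plain-text message."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(body)
    return msg


def send_mail(subject, body, cfg, recipient):
    """Send via the [smtp] section of an already-loaded ConfigParser."""
    smtp = cfg["smtp"]
    # Hypothetical env var: lets the password stay out of config.ini.
    password = os.environ.get("EASM_SMTP_PASSWORD") or smtp.get("password", "")
    msg = build_message(subject, body, smtp["from"], recipient)
    with smtplib.SMTP(smtp["host"], smtp.getint("port"), timeout=30) as server:
        if smtp.getboolean("starttls", fallback=False):
            # Raises if the server does not advertise STARTTLS.
            server.starttls()
        if smtp.get("user"):
            server.login(smtp["user"], password)
        server.send_message(msg)


# Building a message needs no network access, so it is safe to demo here.
demo = build_message("EASM test", "hello", "a@example.com", "b@example.com")
print(demo["Subject"])
```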
Role of run.sh: interactive runner that:
- asks for tenant (or takes it via CLI),
- validates that the tenant exists in DB,
- loops over all IPs in ips.txt and runs scan-tcp for each IP,
- automatically runs scan-udp if nmap.udp_ports is present or a --udp-ports override is provided,
- runs parse-scans,
- runs check-baseline-combined and sends mail if there are changes.
Lock: creates a directory .easm_runner.lockdir with PID to avoid concurrent runs. It detects and clears stale locks.
Logs: writes into logs/run_easm-<timestamp>.log.
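run.sh itself is a shell script; the lock-directory idea it uses can be rendered in Python as below, purely as an illustration. The function names and the `pid` file name inside the lock directory are assumptions. The approach relies on directory creation being atomic, and on `os.kill(pid, 0)` as a liveness probe, which is POSIX-specific.

```python
import os
from pathlib import Path


def pid_alive(pid):
    """POSIX liveness probe: signal 0 checks existence without killing."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # exists, but owned by another user
    return True


def acquire_lock(lock_dir):
    """Try to take the lock; clear it once if the recorded PID is dead."""
    lock_dir = Path(lock_dir)
    for _ in range(2):
        try:
            lock_dir.mkdir()  # atomic: only one process can succeed
        except FileExistsError:
            try:
                pid = int((lock_dir / "pid").read_text().strip())
            except (FileNotFoundError, ValueError):
                pid = None  # simplification: a missing PID counts as stale
            if pid is not None and pid_alive(pid):
                return False
            (lock_dir / "pid").unlink(missing_ok=True)
            lock_dir.rmdir()
            continue
        (lock_dir / "pid").write_text(str(os.getpid()))
        return True
    return False


def release_lock(lock_dir):
    lock_dir = Path(lock_dir)
    (lock_dir / "pid").unlink(missing_ok=True)
    lock_dir.rmdir()
```

A typical use would wrap the pipeline in `if acquire_lock(".easm_runner.lockdir"): try: ... finally: release_lock(...)`.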
Role of create_baseline.py: script to automate baseline creation: runs tcp+udp scans (optional), runs ingest, then constructs a combined baseline by taking the latest tcp and latest udp scans and inserting their ports into baseline_ports for the tenant.
Options:
- --tenant: required.
- --no-scan-tcp or --no-scan-udp: skip running scans.
- --skip-ingest: use if you already ingested XMLs.
Behavior:
- Collects port rows from ports related to the latest tcp/udp scan files, inserts them into baseline_ports (deletes the existing tenant baseline first), and records the set_at timestamp.
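The baseline step described above can be sketched directly in SQL through `sqlite3`. The function names are hypothetical, and the table definitions are trimmed to the columns needed. Note one simplification: the sketch picks a single latest scan file per type, whereas the scanners write one XML per IP, so the real script presumably gathers the latest file for every IP.

```python
import sqlite3
from datetime import datetime, timezone


def latest_scan_file(conn, scan_type):
    row = conn.execute(
        "SELECT scan_file FROM scan_files WHERE scan_type = ? "
        "ORDER BY created_at DESC, id DESC LIMIT 1",
        (scan_type,),
    ).fetchone()
    return row[0] if row else None


def set_combined_baseline(conn, tenant):
    """Replace the tenant baseline with ports from the latest tcp+udp files."""
    files = [f for f in (latest_scan_file(conn, t) for t in ("tcp", "udp")) if f]
    set_at = datetime.now(timezone.utc).isoformat()
    with conn:  # one transaction: delete and inserts succeed or fail together
        conn.execute("DELETE FROM baseline_ports WHERE tenant = ?", (tenant,))
        for scan_file in files:
            conn.execute(
                "INSERT INTO baseline_ports "
                "(tenant, ip, port, proto, state, service, set_at) "
                "SELECT ?, ip, port, proto, state, service, ? "
                "FROM ports WHERE scan_file = ?",
                (tenant, set_at, scan_file),
            )
    return files


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scan_files (id INTEGER PRIMARY KEY, scan_file TEXT,
                         scan_type TEXT, created_at TEXT);
CREATE TABLE ports (id INTEGER PRIMARY KEY, scan_file TEXT, ip TEXT,
                    port INTEGER, proto TEXT, state TEXT, service TEXT);
CREATE TABLE baseline_ports (id INTEGER PRIMARY KEY, tenant TEXT, ip TEXT,
                             port INTEGER, proto TEXT, state TEXT,
                             service TEXT, set_at TEXT);
INSERT INTO scan_files (scan_file, scan_type, created_at) VALUES
  ('old-tcp.xml', 'tcp', '2024-01-01T00:00:00'),
  ('new-tcp.xml', 'tcp', '2024-01-02T00:00:00'),
  ('new-udp.xml', 'udp', '2024-01-02T00:05:00');
INSERT INTO ports (scan_file, ip, port, proto, state, service) VALUES
  ('old-tcp.xml', '192.0.2.10', 21, 'tcp', 'open', 'ftp'),
  ('new-tcp.xml', '192.0.2.10', 22, 'tcp', 'open', 'ssh'),
  ('new-udp.xml', '192.0.2.10', 53, 'udp', 'open', 'domain');
""")
used = set_combined_baseline(conn, "testtenant")
baseline = conn.execute(
    "SELECT port, proto FROM baseline_ports WHERE tenant = ? ORDER BY port",
    ("testtenant",),
).fetchall()
print(used, baseline)
```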
Role of clean_reset.py: backup and reset workspace. It is the safe "nuke & pave" script.
What it does:
- Creates backup/<TIMESTAMP>/ and copies:
  - DB binary file (e.g. db/easm.sqlite)
  - SQL dump (iterdump)
  - CSV export of baseline_ports
  - all scans/*.xml
  - all logs/*
  - config.ini
  - a cleanup-<timestamp>.log describing operations
- Moves/cleans originals: moves XMLs & logs into backup, moves DB into backup (old-db-<timestamp>.sqlite) and creates a new empty DB. It attempts to call parser.tenant_parser.init_db(conn) to initialize schema if available.
Safety:
- Interactive confirmation required unless --yes is passed.
- Uses safe move/copy with a fallback copy when cross-device moves fail.
Role of list_tenants.py: utility to list tenants found in DB. Searches the tenants table (if it exists) and baseline_ports for tenant names; provides counts and the last baseline set timestamp. Supports --json for machine-readable output.
- Make sure you have ips.txt with targets.
- Run the full interactive runner to generate scans and ingest:
    ./run.sh --tenant testtenant
- The runner will ask to set a baseline (or you can use create_baseline.py):
    python3 create_baseline.py --tenant testtenant
This will run scans (unless --no-scan-* is used), ingest, and create a combined baseline from the latest tcp+udp scans.
- Run scans:
python3 main.py scan-tcp --ips-file ips.txt --tenant testtenant
python3 main.py scan-udp --ips-file ips.txt --tenant testtenant --udp-ports "53,123"
- Ingest scans:
python3 main.py parse-scans
- Create baseline (runs a TCP and a UDP scan and creates a baseline from the results):
python3 main.py create-baseline --tenant testtenant
Email notifications:
- main.py builds a subject like "EASM: <N> cambiamenti (tcp+udp) tenant=<tenant>" ("cambiamenti" is Italian for "changes") and a multi-line body with sections per IP and per change type.
- For debug, main.py prints the subject/body to stdout before calling mailer.send_mail.
- mailer.py reads the smtp section from config.ini. For Gmail use smtp.gmail.com:587 and STARTTLS; store the app password in smtp.password or prefer environment secrets.
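How the subject and body might be assembled from a report can be sketched as follows. The subject string follows the pattern quoted in this document; everything else is assumed, in particular the `(port, proto, state, service)` tuple shape and the body layout, since the real format lives in `_format_report` in main.py.

```python
def format_report(tenant, report):
    """Build (subject, body) from a per-IP report of change lists."""
    total = sum(
        len(changes[kind])
        for changes in report.values()
        for kind in ("added", "removed", "changed")
    )
    subject = f"EASM: {total} cambiamenti (tcp+udp) tenant={tenant}"
    lines = []
    for ip in sorted(report):
        lines.append(f"IP {ip}")
        for kind in ("added", "removed", "changed"):
            entries = report[ip][kind]
            if not entries:
                continue
            lines.append(f"  {kind}:")
            # Assumed tuple shape: (port, proto, state, service)
            for port, proto, state, service in entries:
                lines.append(f"    {port}/{proto} {state} {service}".rstrip())
    return subject, "\n".join(lines)


report = {
    "192.0.2.10": {
        "added": [(8080, "tcp", "open", "http")],
        "removed": [],
        "changed": [(53, "udp", "open", "domain")],
    }
}
subject, body = format_report("testtenant", report)
print(subject)
print(body)
```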
- Runner: logs/run_easm-<timestamp>.log (contains full stdout/stderr of actions called by the runner).
- Parser: prints errors/exceptions when parsing invalid XMLs; check parse-scans output.
- If an action fails silently in the runner, run the underlying action directly for a full trace, e.g.:
python3 main.py scan-udp --ips-file ips.txt --tenant testtenant --udp-ports "53,123"
python3 main.py parse-scans
python3 main.py check-baseline-combined --tenant testtenant
Common issues & fixes:
- UDP scans produce open|filtered: expected for many UDP services — ensure the service responds or the listener replies (use UDP echo during tests).
- nmap not in PATH: set nmap_cmd in config.ini or add it to PATH.
- Stale lock: remove .easm_runner.lockdir only if no run is active (check ps aux).
sqlite3 db/easm.sqlite "SELECT proto, COUNT(*) FROM ports GROUP BY proto;"
sqlite3 db/easm.sqlite "SELECT proto, COUNT(*) FROM baseline_ports WHERE tenant='TEST' GROUP BY proto;"
To run the pipeline periodically, you can use cron on Linux. To run hourly (non-interactive):
- Create a non-interactive wrapper that accepts a tenant and an IP file and runs the sequence without prompts (or pass flags to run.sh if you added --yes etc.).
- Example cron entry (run as the user owning the repo):
0 * * * * cd /home/user/SurfaceMinder && /usr/bin/bash ./run.sh --tenant mytenant --ips-file ips.txt >> logs/cron-run.log 2>&1
Prefer using systemd timers for more robust scheduling if you need reliability.
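One possible shape for the non-interactive wrapper mentioned above is a short Python script that chains the main.py actions already shown in this document. `pipeline_commands` and `run_pipeline` are hypothetical names; the sketch assumes it is started from the repository root so that `main.py` resolves.

```python
import subprocess
import sys


def pipeline_commands(tenant, ips_file="ips.txt", udp_ports=""):
    """Return the argv lists for one unattended run, in order."""
    base = [sys.executable, "main.py"]
    steps = [base + ["scan-tcp", "--ips-file", ips_file, "--tenant", tenant]]
    if udp_ports:  # empty string means "skip UDP"
        steps.append(base + ["scan-udp", "--ips-file", ips_file,
                             "--tenant", tenant, "--udp-ports", udp_ports])
    steps.append(base + ["parse-scans"])
    steps.append(base + ["check-baseline-combined", "--tenant", tenant])
    return steps


def run_pipeline(tenant, ips_file="ips.txt", udp_ports=""):
    """Run each step; stop at the first failure (check=True raises)."""
    for argv in pipeline_commands(tenant, ips_file, udp_ports):
        subprocess.run(argv, check=True)


for argv in pipeline_commands("mytenant", udp_ports="53,123"):
    print(" ".join(argv[1:]))
```

Stopping at the first failing step keeps a broken scan from being ingested and compared, which would otherwise risk a misleading "removed ports" alert.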
clean_reset.py will:
- back up the DB (binary), dump SQL via iterdump, export baseline_ports to CSV,
- move scans/*.xml and logs/* into backup/<TIMESTAMP>/,
- move the DB into backup and create a fresh DB; attempt to initialize the schema via parser.tenant_parser.init_db(conn) if provided.
Use:
python3 clean_reset.py --backup-root backup --yes
Security considerations:
- Store SMTP credentials safely: avoid committing config.ini with passwords. Consider env vars or OS-level secrets.
- Limit access to db/: restrict file permissions to the user running scans.
- Nmap capability: setcap cap_net_raw+ep $(which nmap) allows nmap to run UDP scans as non-root but gives it raw network capabilities; treat carefully.
- Avoid parsing untrusted XML: the parser reads XML files; ensure scans come from controlled nmap runs only.
Development notes:
- Keep code modular: scanner vs parser vs mailer.
- Unit tests: consider unit tests for parser functions and report diffing logic. Add tests under tests/.
- Linting and static type hints help maintainability.
Q: Why do UDP scans sometimes show open|filtered?
A: Because UDP is connectionless. Nmap can mark a port open|filtered when it cannot determine status — a service that replies will let nmap mark open.
Q: Mail fails with STARTTLS errors
A: If using a local SMTP dev server (MailHog) set smtp.starttls = False in config.ini. For Gmail use starttls=true and an app password.
Q: My run exits because of a stale lock
A: Remove .easm_runner.lockdir only if no other runner is active. Better: keep the PID-enabled lock logic in run.sh.
This document is intentionally verbose to be the single reference point for everything in SurfaceMinder.