Releases: ali-master/audit
Release 1.6.0
audit v1.6.0
✨ Highlights
Promptfoo evaluation coverage for the audit pipeline's stage prompts, plus a deterministic safety net in the Feedback stage and supporting skills/standards.
Added
- Promptfoo eval suites for the pure-reasoning stages. New `evals/` harness that loads each stage's shipped `prompts/*.md` verbatim and grades real model output against the stage schemas, covering Dedupe (05), Report (08), Gapfill (04), and Feedback (07). Each suite pairs schema validation with behavioral assertions (root-cause clustering, reachable-only reporting, coverage-driven task generation, and sibling-targeting) for deterministic, regression-proof grading.
- Cross-file schema validation helper (`evals/lib/validate-schema.cjs`) that resolves relative `$ref`s (e.g. `gapfill_output` / `feedback_output` → `hunt_task.schema.json`) via `ajv`, which promptfoo's built-in `is-json` cannot do alone.
- No-retest floor in the Feedback stage. `partitionRetests()` deterministically drops any generated hunt task that re-targets an already-proven `finding.file`, backstopping a semantic rule no JSON schema can express. Drops are logged per-task and counted in the stage summary (never silent) and are covered by 6 new unit tests.
- New skills/standards: `promptfoo-evals` skill (with cheatsheet) and `redteam-plugin-development` standards for authoring redteam plugins and graders.
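To make the no-retest floor concrete, here is a minimal sketch of the partitioning idea. It is not the shipped `partitionRetests()` implementation: the `HuntTask` and `ProvenFinding` shapes, the `targetFiles` field, and the function name `partitionRetestsSketch` are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real hunt-task and finding schemas are not shown in these notes.
interface HuntTask {
  id: string;
  targetFiles: string[]; // assumed field name
}

interface ProvenFinding {
  id: string;
  file: string; // mirrors the `finding.file` mentioned in the release notes
}

interface Partition {
  kept: HuntTask[];
  dropped: { task: HuntTask; reason: string }[];
}

// Split generated tasks into kept and dropped. A task is dropped when any of
// its target files already belongs to a proven finding.
function partitionRetestsSketch(tasks: HuntTask[], proven: ProvenFinding[]): Partition {
  const provenFiles = new Set(proven.map((f) => f.file));
  const result: Partition = { kept: [], dropped: [] };
  for (const task of tasks) {
    const hit = task.targetFiles.find((file) => provenFiles.has(file));
    if (hit === undefined) {
      result.kept.push(task);
    } else {
      // Keep the reason so a caller can log each drop and count them in a summary.
      result.dropped.push({ task, reason: `re-targets proven file ${hit}` });
    }
  }
  return result;
}

const demo = partitionRetestsSketch(
  [
    { id: "t1", targetFiles: ["src/db/query.ts"] },
    { id: "t2", targetFiles: ["src/db/other.ts"] },
  ],
  [{ id: "f_1", file: "src/db/query.ts" }],
);
console.log(demo.kept.map((t) => t.id), demo.dropped.length);
```

Returning the dropped tasks alongside the kept ones, rather than filtering silently, is what lets a caller log every drop and report a count.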
Changed
- Tightened stage prompt contracts. `04-gapfill.md`, `05-dedupe.md`, `07-feedback.md`, and `08-report.md` now pin exact output key names and re-list every required field, eliminating schema-compliance drift (e.g. wrong root keys, dropped `rationale`, renamed `coverage_analysis` arrays) observed across models.
- `07-feedback.md` no longer permits bundling a proven sink file into a broader "sweep" task; it must target only new sibling locations.
- Fixed a `report.schema.json` / `trace.schema.json` inconsistency that would have rejected admin-gated findings (`auth_required` on entry points).
Dependencies
- Bumped `@anthropic-ai/claude-agent-sdk` to v0.3.181.
Release 1.5.0
audit v1.5.0
Code-grounded fix guidance: every finding can now carry a remediation written from the actual code, surfaced wherever you review findings.
✨ Highlights
audit advise — remediation grounded in your code
A new read-only stage that reads the real sink and reachability call chain and writes a fix specific to this codebase — the actual function, the framework's safe API, the exact change —
plus an optional fix sketch and references. It explains the fix; the fix stage writes the patch.
```
audit advise --run-id my-run                    # reachable findings (default)
audit advise --run-id my-run --scope all        # every finding
audit advise --run-id my-run --finding f_ab12   # just one
```

`--scope reachable|confirmed|all`, `--finding <id>`, and `--force` (regenerate) are supported. Results are stored per finding and reused everywhere.
One recommendation, surfaced everywhere
- `report` (json/md/sarif) now fills each finding's recommendation from the generated advice, so the report, the viewer, and `audit advise` never disagree.
- Triage viewer: each finding's detail panel has a Solution box showing the code-grounded recommendation (with root cause, an optional fixed-code sketch, references, the validator's suggested test, and a CWE link). When advice hasn't been generated yet, a Generate fix button produces it on demand (`report --serve --allow-api-key` when you rely on `ANTHROPIC_API_KEY`).
🔄 Changed
- The viewer's previous hard-coded, vuln-class remediation catalog has been removed in favor of the code-grounded advice above. Generic boilerplate is replaced by guidance that names
the real code.
📝 Notes
- `advise` is read-only and opt-in (a command or the viewer button); it never edits files. Use `audit fix` to produce patches.
- The triage run stores its reproduced finding, so `audit advise --run-id <triage-run> --finding <id>` produces the same code-grounded fix for a triaged report.
Release 1.4.0
audit v1.4.0
Background runs you can fire-and-track, and a redesigned triage viewer that opens on a live analytics dashboard.
✨ Highlights
Background runs & session tracking
- `audit run -d` / `--background`: detach a scan into a background process and get your shell back immediately. Output streams to `results/<run-id>/run.log`; auth and path checks still run in the foreground so misconfiguration fails fast before detaching.
- `audit sessions` (alias `ps`): list every run currently in progress. Background runs show their driving PID and whether it's still alive (a `running` row whose process has died is flagged in red as crashed); foreground runs are shown too, and finished runs are pruned automatically. `--json` for scripting.
```
audit run -d --max-cost-usd 30     # start in the background, returns at once
audit sessions                     # what's active, and is it still alive?
tail -f results/<run-id>/run.log   # follow along
```

Redesigned triage viewer
audit report --serve is now a modern triage dashboard, still a zero-build local UI served straight from SQLite:
- Live analytics: finding / reachable / confirmed / triaged counts, average confidence, and severity / vulnerability-class / hottest-file distribution bars that recompute as you filter. The severity and class bars are click-to-filter.
- Filter, search & sort: full-text search plus severity, status, class, reachable-only, and canonical-only filters; sort any column; navigate the list with `j`/`k` or arrow keys.
- Richer detail panel: severity, reachability, validation status, confidence and canonical flags, with the reachability call chain drawn as a connected timeline.
- Triage verdicts (confirm / false-positive / won't-fix) still persist to the `triage` table, and "Export baseline" downloads a suppression baseline for the next run.
Viewer networking
- `report --serve --host <host>`: bind the viewer beyond loopback (e.g. `--host 0.0.0.0`) when you need it reachable from another machine. It binds to `127.0.0.1` by default and warns when exposed, since the viewer has no authentication.
📝 Notes
- Background runs and the sessions registry require no new flags to read; `audit sessions` works for foreground runs as well.
- The viewer continues to render all finding text via DOM `textContent` (never `innerHTML`), so evidence from the audited repo can't execute inside the tool.
Release 1.3.0
audit v1.3.0
A release focused on fitting audit into real workflows (CI pipelines, pull requests, and the human triage loop), plus closing the loop after a scan with auto-fixes, a triage UI, cost visibility, and inbound-report triage.
✨ Highlights
CI / pull-request workflow
- Diff-aware scanning: `audit run --base <ref>` (PR mode, merge-base) or `--since <ref>` (incremental) constrains Recon to the changed files plus their blast radius (callers/callees/importers). A PR scan costs cents instead of dollars.
- Baseline & delta: `audit baseline` snapshots a run's findings into a committable file; `--baseline <path>` then suppresses known issues and reports NEW / FIXED / STILL-PRESENT. Findings are matched by a line-shift-robust fingerprint (`vuln_class` + normalized path + sink-code hash), so reformatting and code movement don't resurface old noise.
- SARIF + exit-code gating: `audit report --format sarif` emits SARIF 2.1.0 for the GitHub Advanced Security tab, GitLab SAST, and other triage tooling. Because `audit` produces reachability traces, each result carries a `codeFlows` entry (entry-point → sink), richer than typical single-location SAST output, plus `partialFingerprints` for cross-run dedupe. `--fail-on <severity>` exits `4` when a finding crosses the threshold, and with a baseline, only new findings can trip the gate.
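The baseline matching described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `FindingLike` shape, the path and whitespace normalization rules, and the FNV-1a stand-in hash are all hypothetical, and the tool's actual fingerprint format is not documented in these notes.

```typescript
// Hypothetical finding shape for illustration only.
interface FindingLike {
  vulnClass: string;
  path: string;
  sinkCode: string;
  line: number; // deliberately NOT part of the fingerprint, so line shifts don't matter
}

// FNV-1a (32-bit) as a dependency-free stand-in; the real hash is not specified.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return ("00000000" + hash.toString(16)).slice(-8);
}

// vuln class + normalized path + hash of whitespace-normalized sink code.
function fingerprintSketch(f: FindingLike): string {
  const path = f.path.replace(/\\/g, "/").replace(/^\.\//, "");
  const code = f.sinkCode.replace(/\s+/g, " ").trim();
  return `${f.vulnClass}:${path}:${fnv1a(code)}`;
}

// Classify the current run against a baseline snapshot.
function deltaSketch(baseline: FindingLike[], current: FindingLike[]) {
  const before = new Set(baseline.map(fingerprintSketch));
  const after = new Set(current.map(fingerprintSketch));
  return {
    added: current.filter((f) => !before.has(fingerprintSketch(f))), // NEW
    stillPresent: current.filter((f) => before.has(fingerprintSketch(f))), // STILL-PRESENT
    fixed: baseline.filter((f) => !after.has(fingerprintSketch(f))), // FIXED
  };
}

const sample = deltaSketch(
  [{ vulnClass: "sqli", path: "./src/a.ts", sinkCode: "db.query( sql )", line: 10 }],
  [{ vulnClass: "sqli", path: "src/a.ts", sinkCode: "db.query(  sql  )", line: 42 }],
);
console.log(sample.added.length, sample.stillPresent.length, sample.fixed.length);
```

Because the line number is excluded and runs of whitespace are collapsed before hashing, a finding that moves within its file or has its spacing changed keeps the same fingerprint in this sketch. Changes that add or remove whitespace between tokens entirely would still alter the hash here.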
Close the loop after the report
- Auto-fix (Stage 9, opt-in): `audit fix` generates a minimal patch and a regression test for each confirmed and reachable finding. Each fix is produced inside a throwaway git worktree, so your working tree is never touched; patches are saved for review. `--apply` lands them on a branch (clean tree required) and `--open-pr` opens a draft PR via `gh`. Human-in-the-loop by design: it never auto-merges.
- Interactive triage viewer: `audit report --serve` opens a local web UI (`127.0.0.1`) reading straight from SQLite: filter by severity / reachability / status, inspect the call chain, and mark findings confirm / false-positive / won't-fix. Export your suppressions straight to a baseline.
- Cost observability: `audit stats` breaks spend down by stage and model, with token usage, prompt-cache hit ratio, and cost-per-confirmed / cost-per-reachable finding, so you can tune `stages.yaml` with data instead of guesswork. `--json` for automation.
Bug-bounty / VDP triage
- `audit triage --report <path>`: ingest an inbound researcher submission, reproduce it against the real code, run it through the same adversarial Validate stage (skeptical, different model) and reachability Trace gate, and dedupe it against your known issues (`--against <run-id>` / `--baseline`). Outputs one verdict (accept / reject / duplicate / needs_info / not_reproduced) with the trace attached. The reachability gate is decisive: a confirmed-but-unreachable claim is recommended `reject` (real defect, but out of scope), giving triagers the "can't reach this, likely invalid" signal that cuts through low-quality reports. Supports stdin (`--report -`), JSON output, and live-target reproduction.
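One way to picture the verdict logic is as a small decision function. Only the final rule (confirmed but unreachable is recommended `reject`) and the five verdict names come from the notes above. The `TriageSignals` fields and the ordering of the earlier checks are assumptions for illustration, not the tool's actual behavior.

```typescript
// The five verdict names are taken from the release notes.
type Verdict = "accept" | "reject" | "duplicate" | "needs_info" | "not_reproduced";

// Hypothetical summary of what the triage stages learned about a submission.
interface TriageSignals {
  hasEnoughDetail: boolean; // assumed: could the report be acted on at all?
  reproduced: boolean; // assumed: did reproduction against the code succeed?
  confirmed: boolean; // assumed: did the Validate stage agree it is a defect?
  reachable: boolean; // did the Trace gate find an entry-point → sink path?
  duplicateOf?: string; // assumed: id of a matching known issue, if any
}

function recommendVerdict(s: TriageSignals): Verdict {
  // The order of these first checks is an assumption.
  if (!s.hasEnoughDetail) return "needs_info";
  if (!s.reproduced) return "not_reproduced";
  if (s.duplicateOf !== undefined) return "duplicate";
  if (!s.confirmed) return "reject";
  // From the notes: a confirmed but unreachable claim is recommended "reject".
  return s.reachable ? "accept" : "reject";
}

console.log(
  recommendVerdict({ hasEnoughDetail: true, reproduced: true, confirmed: true, reachable: false }),
);
```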
🛠 Improvements
- Added `audit --version` (`-V`).
- All CLI output now flows through a single logger, including a new chrome-free `log.print` for tables and machine output, keeping piped JSON/SARIF artifacts clean.
- Fixed the live progress line flooding the terminal when activity text exceeded the terminal width (it now clamps to one row).
📝 Notes
- Auto-fix requires a git repository; `--apply` requires a clean working tree, and `--open-pr` opens a draft PR and never merges.
- New CLI exit code: `4` = `--fail-on` gate tripped (distinct from `2` usage / `3` budget abort).
- Diff mode and triage dedupe assume a git repo.
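The `--fail-on` gate and its exit code can be illustrated with a short sketch. The exit code `4` and the rule that only new findings trip the gate when a baseline is present come from these notes. The severity ladder, the "at or above" reading of the threshold, the `GateFinding` shape, and the `0` success code are assumptions.

```typescript
// Assumed severity ladder, lowest to highest; the tool's real levels may differ.
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical finding shape: `isNew` means "not present in the baseline".
interface GateFinding {
  severity: Severity;
  isNew: boolean;
}

const EXIT_OK = 0; // assumed success code
const EXIT_FAIL_ON = 4; // from the notes: --fail-on gate tripped

function gateExitCode(findings: GateFinding[], failOn: Severity, hasBaseline: boolean): number {
  const threshold = SEVERITIES.indexOf(failOn);
  const tripped = findings.some(
    (f) =>
      // Assumption: the threshold is crossed at the given severity or above.
      SEVERITIES.indexOf(f.severity) >= threshold &&
      // With a baseline, previously known findings cannot trip the gate.
      (!hasBaseline || f.isNew),
  );
  return tripped ? EXIT_FAIL_ON : EXIT_OK;
}

console.log(gateExitCode([{ severity: "high", isNew: false }], "high", true));
```

In a CI job, a wrapper script could branch on this code to tell a tripped gate apart from a usage error (`2`) or a budget abort (`3`).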
Release 1.2.0
Release 1.1.0
- docs(cli): update CLI references and examples for `audit` command (385a396)
- feat(cli): improve updateNotifier configuration (f906484)
- chore(package): remove preinstall script for bun restriction (304aaae)
- feat: add detailed logger (aa93903)
- chore(README): add made with... block (f839a18)
- ci(npm): upgrade claude agent SDK (fb7a497)