Releases: ali-master/audit

Release 1.6.0

19 Jun 00:46
78117b0

audit v1.6.0

✨ Highlights

Promptfoo evaluation coverage for the audit pipeline's stage prompts, plus a deterministic safety net in the Feedback stage and supporting skills/standards.

Added

  • Promptfoo eval suites for the pure-reasoning stages. New evals/ harness that loads each stage's shipped prompts/*.md verbatim and grades real model
    output against the stage schemas — covering Dedupe (05), Report (08), Gapfill (04), and Feedback (07). Each suite pairs schema validation
    with behavioral assertions (root-cause clustering, reachable-only reporting, coverage-driven task generation, and sibling-targeting) for deterministic,
    regression-proof grading.
  • Cross-file schema validation helper (evals/lib/validate-schema.cjs) that resolves relative $refs (e.g. gapfill_output/feedback_output →
    hunt_task.schema.json) via ajv, which promptfoo's built-in is-json cannot do alone.
  • No-retest floor in the Feedback stage. partitionRetests() deterministically drops any generated hunt task that re-targets an already-proven
    finding.file, backstopping a semantic rule no JSON schema can express. Drops are logged per-task and counted in the stage summary — never silent — and are
    covered by 6 new unit tests.
  • New skills/standards: promptfoo-evals skill (with cheatsheet) and redteam-plugin-development standards for authoring redteam plugins and graders.
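
The no-retest floor can be pictured as a pure partition over the generated tasks. The sketch below is illustrative only, not the shipped partitionRetests(): the HuntTask and ProvenFinding shapes, the target_files field, and the return shape are assumptions made for the example. Only the function name and the idea of matching on finding.file come from the notes above.

```typescript
// Hypothetical shapes; the real hunt_task and finding schemas may differ.
interface HuntTask {
  id: string;
  target_files: string[];
}

interface ProvenFinding {
  id: string;
  file: string;
}

interface RetestPartition {
  kept: HuntTask[];
  dropped: { task: HuntTask; reason: string }[];
}

// Drop any task that targets a file already covered by a proven finding.
// Every drop carries a reason so the caller can log it and count it.
function partitionRetests(
  tasks: HuntTask[],
  proven: ProvenFinding[],
): RetestPartition {
  const provenFiles = new Set(proven.map((f) => f.file));
  const result: RetestPartition = { kept: [], dropped: [] };

  for (const task of tasks) {
    const hit = task.target_files.find((file) => provenFiles.has(file));
    if (hit === undefined) {
      result.kept.push(task);
    } else {
      result.dropped.push({ task, reason: `re-targets proven file ${hit}` });
    }
  }
  return result;
}

// Usage: t1 re-targets the proven file, t2 bundles it into a broader sweep,
// and t3 targets only a new sibling location.
const partition = partitionRetests(
  [
    { id: "t1", target_files: ["src/db/query.ts"] },
    { id: "t2", target_files: ["src/api/users.ts", "src/db/query.ts"] },
    { id: "t3", target_files: ["src/api/orders.ts"] },
  ],
  [{ id: "f_ab12", file: "src/db/query.ts" }],
);
```

Because the check is a plain set lookup rather than a model judgment, the same inputs always produce the same partition, which is what makes it usable as a backstop behind the prompt rule.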

Changed

  • Tightened stage prompt contracts. 04-gapfill.md, 05-dedupe.md, 07-feedback.md, and 08-report.md now pin exact output key names and re-list
    every required field, eliminating schema-compliance drift (e.g. wrong root keys, dropped rationale, renamed coverage_analysis arrays) observed across
    models.
  • 07-feedback.md no longer permits bundling a proven sink file into a broader "sweep" task — it must target only new sibling locations.
  • Fixed a report.schema.json / trace.schema.json inconsistency that would have rejected admin-gated findings (auth_required on entry points).

Dependencies

  • Bumped @anthropic-ai/claude-agent-sdk to v0.3.181.

Release 1.5.0

17 Jun 15:31
91f89c8

audit v1.5.0

Code-grounded fix guidance: every finding can now carry a remediation written from the actual code, surfaced wherever you review findings.

✨ Highlights

audit advise — remediation grounded in your code

A new read-only stage that reads the real sink and reachability call chain and writes a fix specific to this codebase — the actual function, the framework's safe API, the exact change —
plus an optional fix sketch and references. It explains the fix; the fix stage writes the patch.

audit advise --run-id my-run                  # reachable findings (default)
audit advise --run-id my-run --scope all      # every finding
audit advise --run-id my-run --finding f_ab12 # just one

--scope reachable|confirmed|all, --finding <id>, and --force (regenerate) are supported. Results are stored per finding and reused everywhere.

One recommendation, surfaced everywhere

  • report (json / md / sarif) now fills each finding's recommendation from the generated advice, so the report, the viewer, and audit advise never disagree.
  • Triage viewer — each finding's detail panel has a Solution box showing the code-grounded recommendation (with root cause, an optional fixed-code sketch, references, the
    validator's suggested test, and a CWE link). When advice hasn't been generated yet, a Generate fix button produces it on demand (start the viewer with report --serve --allow-api-key when
    you rely on ANTHROPIC_API_KEY).

🔄 Changed

  • The viewer's previous hard-coded, vuln-class remediation catalog has been removed in favor of the code-grounded advice above. Generic boilerplate is replaced by guidance that names
    the real code.

📝 Notes

  • advise is read-only and opt-in (a command or the viewer button) — it never edits files; use audit fix to produce patches.
  • The triage run stores its reproduced finding, so audit advise --run-id <triage-run> --finding <id> produces the same code-grounded fix for a triaged report.

Release 1.4.0

17 Jun 14:59
1f1b0db

audit v1.4.0

Background runs you can fire-and-track, and a redesigned triage viewer that opens on a live analytics dashboard.

✨ Highlights

Background runs & session tracking

  • audit run -d / --background — detach a scan into a background process and get your shell back immediately. Output streams to results/<run-id>/run.log; auth and path checks still
    run in the foreground so misconfiguration fails fast before detaching.
  • audit sessions (alias ps) — list every run currently in progress. Background runs show their driving PID and whether it's still alive (a running row whose process has
    died is flagged in red as crashed); foreground runs are shown too, and finished runs are pruned automatically. --json for scripting.

audit run -d --max-cost-usd 30      # start in the background, returns at once
audit sessions                      # what's active, and is it still alive?
tail -f results/<run-id>/run.log    # follow along

Redesigned triage viewer

audit report --serve is now a modern triage dashboard, still a zero-build local UI served straight from SQLite:

  • Live analytics — finding / reachable / confirmed / triaged counts, average confidence, and severity / vulnerability-class / hottest-file distribution bars that recompute as you
    filter. The severity and class bars are click-to-filter.
  • Filter, search & sort — full-text search plus severity, status, class, reachable-only, and canonical-only filters; sort any column; navigate the list with j / k or arrow keys.
  • Richer detail panel — severity, reachability, validation status, confidence and canonical flags, with the reachability call chain drawn as a connected timeline.
  • Triage verdicts (confirm / false-positive / won't-fix) still persist to the triage table, and "Export baseline" downloads a suppression baseline for the next run.

Viewer networking

  • report --serve --host <host> — bind the viewer beyond loopback (e.g. --host 0.0.0.0) when you need it reachable from another machine. It binds to 127.0.0.1 by default and warns
    when exposed, since the viewer has no authentication.

📝 Notes

  • Background runs and the sessions registry require no new flags to read — audit sessions works for foreground runs as well.
  • The viewer continues to render all finding text via DOM textContent (never innerHTML), so evidence from the audited repo can't execute inside the tool.

Release 1.3.0

17 Jun 13:56
b1aa0b1

audit v1.3.0

A release focused on fitting audit into real workflows (CI pipelines, pull requests, and the human triage loop) and on closing the loop after a scan with auto-fixes, a triage UI, cost
visibility, and inbound-report triage.

✨ Highlights

CI / pull-request workflow

  • Diff-aware scanning: audit run --base <ref> (PR mode, merge-base) or --since <ref> (incremental) constrains Recon to the changed files plus their blast radius
    (callers/callees/importers). A PR scan costs cents instead of dollars.
  • Baseline & delta: audit baseline snapshots a run's findings into a committable file; --baseline <path> then suppresses known issues and reports NEW / FIXED / STILL-PRESENT.
    Findings are matched by a line-shift-robust fingerprint (vuln_class + normalized path + sink-code hash), so reformatting and code movement don't resurface old noise.
  • SARIF + exit-code gating: audit report --format sarif emits SARIF 2.1.0 for the GitHub Advanced Security tab, GitLab SAST, and other triage tooling. Because audit produces
    reachability traces, each result carries a codeFlows entry (entry-point → sink), richer than typical single-location SAST output, plus partialFingerprints for cross-run dedupe.
    --fail-on <severity> exits 4 when a finding crosses the threshold — and with a baseline, only new findings can trip the gate.
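
A line-shift-robust fingerprint and the NEW / FIXED / STILL-PRESENT split can be sketched as follows. This is a minimal illustration under stated assumptions, not audit's actual code: the Finding shape, the normalization rules (slash direction, leading "./", whitespace collapsing), and the FNV-1a hash are stand-ins, since the notes only say the fingerprint combines vuln_class, a normalized path, and a sink-code hash.

```typescript
// Hypothetical finding shape; only vuln_class is named in the release notes.
interface Finding {
  vuln_class: string;
  path: string;
  sink_code: string;
}

// FNV-1a (32-bit) as a dependency-free stand-in; the real hash is not specified.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// No line number goes into the fingerprint, so moving code does not change it.
function fingerprint(f: Finding): string {
  const path = f.path.replace(/\\/g, "/").replace(/^\.\//, "");
  const sink = f.sink_code.replace(/\s+/g, " ").trim();
  return `${f.vuln_class}:${path}:${fnv1a(sink)}`;
}

interface Delta {
  fresh: Finding[]; // NEW
  stillPresent: Finding[]; // STILL-PRESENT
  fixed: string[]; // FIXED (baseline fingerprints no longer seen)
}

function diffAgainstBaseline(baseline: string[], current: Finding[]): Delta {
  const known = new Set(baseline);
  const seen = new Set<string>();
  const delta: Delta = { fresh: [], stillPresent: [], fixed: [] };
  for (const f of current) {
    const fp = fingerprint(f);
    seen.add(fp);
    (known.has(fp) ? delta.stillPresent : delta.fresh).push(f);
  }
  delta.fixed = baseline.filter((fp) => !seen.has(fp));
  return delta;
}

// Usage: the same sink, re-indented and reported with a different path style.
const before: Finding = { vuln_class: "sqli", path: "./src/db.ts", sink_code: "db.query(sql + id)" };
const after: Finding = { vuln_class: "sqli", path: "src\\db.ts", sink_code: "    db.query(sql +  id)\n" };
const added: Finding = { vuln_class: "xss", path: "src/view.ts", sink_code: "el.innerHTML = name" };

const delta = diffAgainstBaseline(
  [fingerprint(before), "ssrf:src/old.ts:deadbeef"],
  [after, added],
);
```

Here the re-indented finding lands in stillPresent, the xss finding in fresh, and the baseline-only fingerprint in fixed.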

Close the loop after the report

  • Auto-fix (Stage 9, opt-in): audit fix generates a minimal patch and a regression test for each confirmed and reachable finding. Each fix is produced inside a throwaway git
    worktree, so your working tree is never touched; patches are saved for review. --apply lands them on a branch (clean tree required) and --open-pr opens a draft PR via gh.
    Human-in-the-loop by design — it never auto-merges.
  • Interactive triage viewer: audit report --serve opens a local web UI (127.0.0.1) reading straight from SQLite: filter by severity / reachability / status, inspect the call chain,
    and mark findings confirm / false-positive / won't-fix. Export your suppressions straight to a baseline.
  • Cost observability: audit stats breaks spend down by stage and model, with token usage, prompt-cache hit ratio, and cost-per-confirmed / cost-per-reachable finding, so you can
    tune stages.yaml with data instead of guesswork. --json for automation.
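
The cost-per-finding figures are simple ratios over the recorded spend. The sketch below shows one way to compute them; the StageSpend row shape, the "stage / model" grouping key, and the model names are hypothetical, and the real audit stats output may be structured differently.

```typescript
// Hypothetical spend row; the real stats table may use other field names.
interface StageSpend {
  stage: string;
  model: string;
  cost_usd: number;
}

function summarizeSpend(rows: StageSpend[], confirmed: number, reachable: number) {
  const byStageAndModel = new Map<string, number>();
  let total = 0;
  for (const row of rows) {
    total += row.cost_usd;
    const key = `${row.stage} / ${row.model}`;
    byStageAndModel.set(key, (byStageAndModel.get(key) ?? 0) + row.cost_usd);
  }
  // A ratio is undefined when nothing was confirmed or reachable, so return null.
  const per = (count: number): number | null => (count > 0 ? total / count : null);
  return {
    total,
    byStageAndModel,
    costPerConfirmed: per(confirmed),
    costPerReachable: per(reachable),
  };
}

// Usage with made-up numbers.
const summary = summarizeSpend(
  [
    { stage: "hunt", model: "model-a", cost_usd: 1.5 },
    { stage: "trace", model: "model-a", cost_usd: 2.5 },
    { stage: "trace", model: "model-a", cost_usd: 1.5 },
    { stage: "validate", model: "model-b", cost_usd: 2.5 },
  ],
  4,
  0,
);
```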

Bug-bounty / VDP triage

  • audit triage --report <path> — ingest an inbound researcher submission, reproduce it against the real code, run it through the same adversarial Validate stage (skeptical,
    different model) and reachability Trace gate, and dedupe it against your known issues (--against <run-id> / --baseline). The outcome is one verdict (accept / reject / duplicate /
    needs_info / not_reproduced) with the trace attached. The reachability gate is decisive: a confirmed-but-unreachable claim is recommended reject (real defect, but out of scope), giving
    triagers the "can't reach this, likely invalid" signal that cuts through low-quality reports. Supports stdin (--report -), JSON output, and live-target reproduction.
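
One way to read the verdict logic is as a short decision ladder. In the sketch below only the verdict names and the rule that a confirmed-but-unreachable claim is recommended reject come from the notes above. The TriageSignals fields, the order of the checks, and the treatment of unconfirmed claims are assumptions made for illustration.

```typescript
type Verdict = "accept" | "reject" | "duplicate" | "needs_info" | "not_reproduced";

// Hypothetical signals gathered while triaging one inbound report.
interface TriageSignals {
  hasEnoughDetail: boolean; // report is specific enough to attempt reproduction
  reproduced: boolean; // the claim was reproduced against the real code
  matchesKnownIssue: boolean; // deduped against a prior run or a baseline
  confirmed: boolean; // survived the adversarial Validate stage
  reachable: boolean; // passed the reachability Trace gate
}

function recommendVerdict(s: TriageSignals): Verdict {
  if (!s.hasEnoughDetail) return "needs_info"; // assumed ordering
  if (!s.reproduced) return "not_reproduced"; // assumed ordering
  if (s.matchesKnownIssue) return "duplicate"; // assumed ordering
  if (!s.confirmed) return "reject"; // assumption: unconfirmed claims are rejected
  if (!s.reachable) return "reject"; // from the notes: real defect, but out of scope
  return "accept";
}

// Usage: a real but unreachable defect is still recommended reject.
const unreachable = recommendVerdict({
  hasEnoughDetail: true,
  reproduced: true,
  matchesKnownIssue: false,
  confirmed: true,
  reachable: false,
});
```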

🛠 Improvements

  • Added audit --version (-V).
  • All CLI output now flows through a single logger, including a new chrome-free log.print for tables and machine output, keeping piped JSON/SARIF artifacts clean.
  • Fixed the live progress line flooding the terminal when activity text exceeded the terminal width (it now clamps to one row).

📝 Notes

  • Auto-fix requires a git repository; --apply requires a clean working tree, and --open-pr opens a draft PR and never merges.
  • New CLI exit code: 4 = --fail-on gate tripped (distinct from 2 usage / 3 budget abort).
  • Diff mode and triage dedupe assume a git repo.
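
The --fail-on gate and its interaction with a baseline can be sketched as a single predicate. The severity names and their ordering, the isNew flag, the reading of "crosses the threshold" as at-or-above, and exit code 0 for a passing run are assumptions for this example. Exit code 4 and the rule that only new findings can trip the gate when a baseline is present come from the notes above.

```typescript
// Assumed severity scale, lowest to highest; the real scale may differ.
const SEVERITIES = ["low", "medium", "high", "critical"] as const;
type Severity = (typeof SEVERITIES)[number];

// Hypothetical finding view: isNew means "not present in the baseline".
interface GatedFinding {
  severity: Severity;
  isNew: boolean;
}

const EXIT_OK = 0; // assumed success code
const EXIT_FAIL_ON = 4; // --fail-on gate tripped

function gateExitCode(
  findings: GatedFinding[],
  failOn: Severity,
  hasBaseline: boolean,
): number {
  const threshold = SEVERITIES.indexOf(failOn);
  const tripped = findings.some(
    (f) =>
      SEVERITIES.indexOf(f.severity) >= threshold &&
      (!hasBaseline || f.isNew),
  );
  return tripped ? EXIT_FAIL_ON : EXIT_OK;
}

// Usage: a known high-severity finding trips the gate only without a baseline.
const knownHigh: GatedFinding[] = [{ severity: "high", isNew: false }];
const withBaseline = gateExitCode(knownHigh, "high", true);
const withoutBaseline = gateExitCode(knownHigh, "high", false);
```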

Release 1.2.0

17 Jun 13:19
fb9382e

  • feat(api): add auto-fix and triage viewer support (d450df9)
  • feat(cli): add baseline, diff, and SARIF support (d2d9b06)
  • feat(logger): enhance terminal rendering and timestamp handling (8b9eefe)

Release 1.1.0

17 Jun 10:55
a0deb3a

  • docs(cli): update CLI references and examples for audit command (385a396)
  • feat(cli): improve updateNotifier configuration (f906484)
  • chore(package): remove preinstall script for bun restriction (304aaae)
  • feat: add detailed logger (aa93903)
  • chore(README): add made with... block (f839a18)
  • ci(npm): upgrade claude agent SDK (fb7a497)

Release 1.0.0

17 Jun 10:23
3c73cab

  • feat: add global install support, update CLI defaults, and improve state management (d3e066a)
  • fix: update branch references from master to main (8620caa)
  • feat: init (35c3cda)