Release 1.6.0 · ali-master/audit

audit v1.6.0

✨ Highlights

Promptfoo evaluation coverage for the audit pipeline's stage prompts, plus a deterministic safety net in the Feedback stage and supporting skills/standards.

Added

Promptfoo eval suites for the pure-reasoning stages. New evals/ harness that loads each stage's shipped prompts/*.md verbatim and grades real model output against the stage schemas, covering Dedupe (05), Report (08), Gapfill (04), and Feedback (07). Each suite pairs schema validation with behavioral assertions (root-cause clustering, reachable-only reporting, coverage-driven task generation, and sibling-targeting) for deterministic, regression-proof grading.
Cross-file schema validation helper (evals/lib/validate-schema.cjs) that resolves relative $refs (e.g. gapfill_output/feedback_output → hunt_task.schema.json) via ajv, which promptfoo's built-in is-json cannot do alone.
No-retest floor in the Feedback stage. partitionRetests() deterministically drops any generated hunt task that re-targets an already-proven finding.file, backstopping a semantic rule no JSON schema can express. Drops are logged per-task and counted in the stage summary (never silent) and are covered by 6 new unit tests.
New skills/standards: promptfoo-evals skill (with cheatsheet) and redteam-plugin-development standards for authoring redteam plugins and graders.
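
The no-retest floor described above amounts to a partition over the generated hunt tasks. The following is a minimal sketch of that idea, not the shipped implementation: the function name partitionRetestsSketch, the task field target_file, and the sample ids and paths are assumptions made for illustration. Only finding.file is named in these notes.

```javascript
// Hypothetical sketch of a no-retest partition. The shipped partitionRetests()
// may use different field names and a different return shape.
function partitionRetestsSketch(tasks, provenFindings) {
  // Collect every file that already has a proven finding.
  const provenFiles = new Set(provenFindings.map((finding) => finding.file));
  const kept = [];
  const dropped = [];
  for (const task of tasks) {
    // `target_file` is an assumed field name for the file a task points at.
    if (provenFiles.has(task.target_file)) {
      dropped.push(task);
    } else {
      kept.push(task);
    }
  }
  return { kept, dropped };
}

// Illustrative data only.
const proven = [{ id: "F-1", file: "src/auth/login.js" }];
const tasks = [
  { id: "T-1", target_file: "src/auth/login.js" }, // re-targets a proven file
  { id: "T-2", target_file: "src/auth/reset.js" }, // new sibling location
];

const { kept, dropped } = partitionRetestsSketch(tasks, proven);

// Log each drop individually, then report the counts, so nothing is silent.
for (const task of dropped) {
  console.warn(`dropped retest task ${task.id} -> ${task.target_file}`);
}
console.log(`kept=${kept.length} dropped=${dropped.length}`);
```

Because the check is plain set membership on a file path, it behaves the same on every run regardless of what the model produced, which is the property a JSON schema alone cannot provide here.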

Changed

Tightened stage prompt contracts. 04-gapfill.md, 05-dedupe.md, 07-feedback.md, and 08-report.md now pin exact output key names and re-list every required field, eliminating schema-compliance drift (e.g. wrong root keys, dropped rationale, renamed coverage_analysis arrays) observed across models.
07-feedback.md no longer permits bundling a proven sink file into a broader "sweep" task — it must target only new sibling locations.
Fixed a report.schema.json / trace.schema.json inconsistency that would have rejected admin-gated findings (auth_required on entry points).

Dependencies

Bumped @anthropic-ai/claude-agent-sdk to v0.3.181.
