Skip to content

Latest commit

 

History

History
1080 lines (805 loc) · 56.2 KB

File metadata and controls

1080 lines (805 loc) · 56.2 KB

Changelog

All notable changes to this project will be documented in this file.

[Unreleased]

Evidence Portability

  • Selected Pydantic Evals as the next evidence-seam hardening candidate via P9b, but kept the scope deliberately small: one reduced case-result artifact derived from EvaluationReport.cases[], possible importer-only support only if the live recut succeeds, no raw ReportCase contract, no full EvaluationReport import, no Logfire/trace/span payloads, no Trust Basis claim, no Harness recipe, and no public receipt-family story.
  • Recut the Pydantic Evals sample around pydantic-evals==1.89.1 and one reduced case-result artifact. The new fixtures carry case_name, bounded assertion/score results, and export timestamp only; broad ReportCase fields such as raw input, expected output, model output, trace, and span data remain rejected.
  • Refreshed the Mastra ScoreEvent sample against @mastra/core 1.29.1 and @mastra/observability 1.10.2 after upstream confirmed ScoreId had shipped. The strong fixture now carries live-backed score_id_ref; the v1 importer keeps the field optional for older reduced artifacts and compatibility fixtures.

Docs

  • Added Evidence Receipts in Action, a static proof page with checked-in artifacts generated from the released Assay v3.9.1 binary and Assay Harness v0.3.2 gate/report surface. The page shows the three released receipt families, their exact Trust Basis claim IDs, and the raw diff JSON to Markdown/JUnit projection split without adding a new product surface or integration claim.
  • Added a copyable GitHub Actions proof snippet to the Evidence Receipts in Action page. The snippet verifies the checked-in proof bundles with the released Assay v3.9.1 binary, writes a small job summary, and uploads canonical/projection artifacts without adding a required workflow or new runtime semantics.

CI / Release

  • Hardened the idempotent crates.io publisher so it waits for each newly published workspace crate to become visible through the crates.io API before publishing the next dependent crate.
  • Narrowed self-hosted eBPF CI triggers so release-publish helper changes do not leave optional BPF runner jobs queued when the self-hosted runner is offline.

[3.9.1] - 2026-04-29

This patch release publishes the final public three-family evidence receipts note under an immutable Assay release tag. It does not add runtime behavior, Trust Basis claims, receipt families, schema semantics, or Harness semantics.

Release Truth

  • Versioned public note: Evidence Receipts for AI Outcomes, Runtime Decisions, and Model Inventory now points to the released Assay v3.9.1 surface and Assay Harness v0.3.2 compatibility line, while keeping the same downstream-only boundary: Promptfoo assertion component results, OpenFeature boolean EvaluationDetails outcomes, and CycloneDX machine-learning-model components are bounded receipt families, not official integrations or upstream truth claims.

[3.9.0] - 2026-04-29

This minor release turns the post-v3.8.0 consolidation program into a user-facing release line. It does not add new Trust Basis claims or receipt families. Instead, it makes the existing trust compiler surface easier to gate, inspect, review, and bind to the MCP policy/tool surfaces that governed a decision.

Trust Compiler

  • Trust Basis assertions: assay trust-basis assert can now gate one canonical trust-basis.json artifact against generic --require <claim-id>=<level> predicates. The command is claim-id based, emits text or assay.trust-basis.assert.v1 JSON, exits 0 on pass, exits 1 on policy mismatch, and keeps input/config/runtime failures on 2+.
  • Receipt schema CLI: assay evidence schema list/show/validate exposes the v3.8.0 receipt schema registry as a command-line surface. It lists receipt payload and importer-input schemas, shows schema metadata before raw JSON Schema content, validates JSON or JSONL artifacts, and keeps Mastra marked as importer-only rather than a public Trust Basis claim family.
  • Static Trust Card HTML: assay trustcard generate now writes trustcard.html beside trustcard.json and trustcard.md. JSON remains the canonical Trust Card artifact; Markdown and single-file HTML are deterministic reviewer projections with no remote assets, JavaScript requirement, scores, badges, or second classifier.
  • Policy snapshot digest visibility: supported MCP assay.tool.decision events now project policy_snapshot_digest, policy_snapshot_digest_alg, policy_snapshot_canonicalization, and policy_snapshot_schema from the existing policy_digest when available. policy_snapshot_digest is the self-describing reviewer projection of policy_digest; the values match on supported paths, and the snapshot field cluster is produced atomically. This is a review binding only; it does not claim the policy is correct, sufficient, safe, approved, complete, retrievable, exportable, or embedded.
  • Tool definition digest visibility: supported MCP tools/list to tools/call decision paths can now project an atomic tool_definition_* field cluster onto assay.tool.decision events. The digest is computed over the bounded observed tool-definition surface using jcs:mcp_tool_definition.v1 and excludes x-assay-sig, top-level vendor/provider metadata, annotations, display hints, raw registry bodies, runtime results, and inferred tools/call fields. This is review visibility only; it does not claim tool safety, signature validity, signer trust, registry truth, or implementation truth.

Product Truth

  • Product surface alignment: README, docs home, scope docs, CLI about text, AI-context notes, and the P52-P56 consolidation plan now describe Assay as a CI-native evidence and trust compiler. The wording separates Assay core from Assay Harness, keeps external receipt lanes downstream-only, and avoids partnership, integration, correctness, safety, or compliance-truth claims.

[3.8.0] - 2026-04-29

This minor release turns the v3.7.0 three-family receipt surface into a more external-ready contract line. The receipt families and Trust Basis claims stay the same; the new work is machine-readable schema coverage and release-truth alignment for consumers that need to produce or inspect bounded receipts.

Receipt Contracts

  • Receipt schema registry: docs/reference/receipt-schemas/ now contains JSON Schema contracts for the supported Promptfoo, OpenFeature, CycloneDX ML-BOM, and Mastra receipt payloads plus their supported importer input artifact shapes.
  • Receipt family matrix links schemas: docs/reference/receipt-family-matrix.json now points each claim-visible family at its receipt and input schemas. Mastra remains documented as importer-only: schema-covered, bundleable, and Trust Basis-readable, but not part of the three claim-visible public families.
  • Schema validation tests: importer-generated receipt payloads and supported input artifacts are validated against the registry, keeping prose, fixtures, and emitted payloads in lockstep.

Release Truth

  • The three-family note is part of the v3.8.0 release line instead of living only as post-v3.7.0 main-branch docs.
  • Trust Card schema v5 wording is tightened around the 10-claim surface. There are no new Trust Basis claims in this release.

[3.7.0] - 2026-04-29

This minor release makes the first three-family evidence-portability surface release-ready. Assay can now reduce selected external eval outcomes, runtime decision details, and model inventory/provenance surfaces into bounded receipts, compile supported receipt families into Trust Basis, and keep the same claim-level boundary discipline as the earlier Promptfoo lane.

Trust Compiler

  • Three receipt families are claim-visible: supported eval, decision, and inventory receipt bundles can now surface bounded Trust Basis boundary claims: external_eval_receipt_boundary_visible, external_decision_receipt_boundary_visible, and external_inventory_receipt_boundary_visible. These claims mean the supported receipt boundary and provenance are visible; they do not mean upstream eval correctness, flag-decision correctness, model safety, dataset approval, BOM completeness, license posture, vulnerability posture, or compliance truth.
  • OpenFeature decision receipts: assay evidence import openfeature-details imports bounded boolean OpenFeature EvaluationDetails rows into verifiable decision receipt bundles. The v1 lane keeps provider config, evaluation context, targeting keys, rules, user identifiers, flag metadata, provider metadata, error_message, and non-boolean values out of the canonical receipt path.
  • CycloneDX ML-BOM model-component receipts: assay evidence import cyclonedx-mlbom-model imports one selected machine-learning-model component as a bounded inventory receipt. The v1 lane keeps full BOM graphs, model-card bodies, dataset bodies, pedigree, vulnerabilities, licenses, metrics, safety posture, and compliance semantics out of the receipt.
  • Mastra ScoreEvent receipts: assay evidence import mastra-score-event imports reduced, reviewer-safe Mastra ScoreEvent JSONL artifacts into score receipt bundles. This lane does not yet add a Trust Basis score claim; it is intentionally separate from the three-family public claim surface.
  • Trust Card schema v5: Trust Card output now reflects the expanded claim table. Consumers must continue to key by stable claim.id, not row position or row count.
  • Receipt family matrix: docs/reference/receipt-family-matrix.json records each supported receipt family, event type, Trust Basis claim, included fields, excluded fields, and explicit non-claims.

Examples and Docs

  • Added OpenFeature, CycloneDX ML-BOM, and Mastra ScoreEvent evidence examples plus CLI reference docs for the new importers.
  • Updated the evidence contract registry with the new experimental receipt event types.

Notes for Upgraders

  • This is a release of bounded receipt compiler lanes, not official integration or partnership support for Promptfoo, OpenFeature, CycloneDX, or Mastra.
  • Trust Basis and Trust Card consumers should treat the new claim rows as additive. Select claims by claim.id and tolerate unknown future claims.
  • Assay Harness v0.3.1 is the intended companion release for running the Promptfoo, OpenFeature, and CycloneDX recipes over this claim surface.

[3.6.0] - 2026-04-27

This minor release makes the first external-eval evidence portability lane release-ready. Assay can now import selected external evaluation outcomes as bounded evidence receipts, carry them through Trust Basis, and compare claim artifacts without importing full eval-run truth or claiming model correctness.

Trust Compiler

  • External eval outcomes as bounded receipts: Assay now has the first evidence-portability lane for selected external eval outcomes. The lane starts with Promptfoo assertion-component results, compiles them into Assay evidence receipts, carries them through Trust Basis / diff, and keeps the boundary explicit: no full eval-run import, no Promptfoo integration claim, and no model-correctness truth. See From Promptfoo JSONL to Evidence Receipts.
  • Promptfoo JSONL receipt import: assay evidence import promptfoo-jsonl imports strict Promptfoo CLI JSONL rows from gradingResult.componentResults[] and writes verifiable Assay evidence bundles. The v1 lane is deterministic-assertion-first (equals, binary 0/1 component scores) and excludes raw prompt, output, expected value, vars, provider payloads, token/cost data, and full JSONL rows.
  • Trust Basis visibility for external receipts: supported external eval receipt bundles can now surface the bounded external_eval_receipt_boundary_visible claim. The claim means the receipt boundary and provenance are visible; it does not mean the upstream eval run passed or that Assay imports upstream payloads as truth.
  • Trust Basis diff contract: assay trust-basis diff compares canonical Trust Basis artifacts by stable claim identity, reports added / removed / improved / regressed / metadata-only changes, and can fail CI only on claim-presence or claim-level regressions.

Examples and Notes

  • Promptfoo evidence sample and recipe path: the Promptfoo assertion grading-result sample is restored on main, and the Assay-side note explains the evidence portability boundary without positioning this as a Promptfoo integration or partnership.
  • Additional bounded evidence examples: OpenFeature EvaluationDetails and Guardrails validation-outcome lanes document adjacent evidence units while staying clear of provider-config truth, corrected-output truth, and full run history.

Notes for Upgraders

  • Trust Basis and Trust Card consumers should keep selecting claims by stable claim.id, not row position or row count. The external-eval receipt claim is additive.
  • The Promptfoo lane is downstream evidence portability over existing JSONL/assertion surfaces. It is not official Promptfoo support, not a partnership claim, and not a full Promptfoo export importer.

[3.5.1] - 2026-04-06

This patch release keeps the v3.5.0 trust-compiler surface intact, but makes the new MCP Registry publication path honest and publishable. It is the first Assay release line that can ship a real assay-mcp-server-<version>-linux.mcpb asset plus generated official-registry metadata from the same release asset set.

Release Tooling

  • Official MCP Registry publication foundation: Release builds now package Linux assay-mcp-server archives into a real assay-mcp-server-<version>-linux.mcpb bundle and generate server.json from the released MCPB asset URL plus SHA-256. This replaces the old hand-maintained metadata story with a bounded, supported mcpb publication path for the official MCP Registry.

Examples

  • CrewAI event evidence sample: Assay now ships a small sample-first examples/crewai-event-evidence/ flow that exports bounded CrewAI runtime events to NDJSON and maps them into Assay-shaped placeholder evidence without promoting CrewAI runtime semantics into Assay truth.

[3.5.0] - 2026-03-30

This release makes the first bounded MCP authorization-discovery seam public. K2-A Phase 1 now ships in the public Assay line as visibility-only evidence for typed MCP auth-discovery surfaces, without broadening into an auth-discovery pack, auth-success claims, or compliance theater.

Trust Compiler

  • K2-A Phase 1: Assay now publicly ships the first bounded MCP authorization-discovery seam on imported MCP traces via episode_start.meta.mcp.authorization_discovery. The slice is visibility-only, promotes positively only from typed runtime-observed WWW-Authenticate discovery on supported 401 transport paths, and explicitly does not imply auth success, scope adequacy, issuer trust, or compliance.

[3.4.0] - 2026-03-28

This patch release makes the post-v3.3.0 trust-compiler line public: G4-A Phase 1 (payload.discovery), built-in P2c (a2a-discovery-card-followup), and K1-A Phase 1 (payload.handoff) now ship in the released binaries and Python wheels. It also refreshes outward-facing package/release communication so the published line matches the actual shipped surface.

Trust Compiler

  • G4-A Phase 1: The A2A adapter now publicly ships the bounded top-level payload.discovery seam for discovery / Agent Card visibility on canonical adapter evidence. This remains adapter-emitted, visibility-only evidence with explicit non-goals around validity, trust, or verification semantics. See PLAN-G4 and G4-A freeze.
  • P2c A2A discovery/card follow-up pack (a2a-discovery-card-followup): Built-in A2A-DC-001 / A2A-DC-002 now ship publicly. The pack mirrors packs/open/a2a-discovery-card-followup/, uses json_path_exists.value_equals for boolean true, and keeps the G4-A / P2c floor semantics (requires.assay_min_version: ">=3.3.0") without a new engine bump. See MIGRATION — P2c pack and PLAN-P2c.
  • K1-A Phase 1: assay-adapter-a2a now publicly emits a bounded top-level payload.handoff object on canonical A2A adapter evidence. The seam is always present, promotes positively only for typed assay.adapter.a2a.task.requested packets with task.kind == "delegation", and explicitly does not promote from task.updated, artifact.shared, generic-message fallback, or synthetic unknown-task. No new pack, engine bump, Trust Basis change, or Trust Card change ships in this slice. See PLAN-K1 and K1-A freeze.

Python SDK

  • assay-it outward-facing metadata: The Python package now ships with a package-level README and bounded public metadata that matches the actual surface: AssayClient, Coverage, Explainer, and the pytest fixture. The published package description no longer implies the full Assay CLI or broader trust-compiler surfaces.

Release Tooling

  • Release notes template truth sync: GitHub release notes now use the canonical install URL https://getassay.dev/install.sh and the canonical action slug Rul1an/assay-action@v2, avoiding stale release-copy drift on future tags.

[3.3.0] - 2026-03-24

This release completes the first trust-compiler product line on a single public baseline: canonical Trust Basis, Trust Card schema 2 with seven claims (key by stable claim.id), G3 authorization-context evidence, pack engine 1.2, built-in mcp-signal-followup and a2a-signal-followup, migration SSOT, and kernel/pack alignment tests. See MIGRATION-TRUST-COMPILER-3.2.md, PLAN-P2a, PLAN-P2b, and RELEASE-PLAN-TRUST-COMPILER-3.3.md. Pack requires.assay_min_version: ">=3.2.3" remains the evidence-substrate floor; v3.3.0 is the first release embedding both built-in companion packs in release binaries.

Trust Compiler

  • P2b A2A companion pack (a2a-signal-followup): Built-in pack with three presence-only rules on canonical adapter evidence — A2A-001 (assay.adapter.a2a.agent.capabilities), A2A-002 (assay.adapter.a2a.task.*), A2A-003 (assay.adapter.a2a.artifact.shared). Uses existing pack checks (event_type_exists); no new engine version. Open mirror under packs/open/a2a-signal-followup/. Pack YAML sets requires.assay_min_version: ">=3.2.3" (evidence-substrate floor per MIGRATION-TRUST-COMPILER-3.2.md, same discipline as PLAN-P2a). v3.3.0 is the first Assay release with this pack built in. See PLAN-P2b.
  • H1 — Trust kernel alignment & release hardening: Single migration SSOT (MIGRATION-TRUST-COMPILER-3.2.md), PLAN-H1, integration tests for Trust Basis ↔ MCP-001 lockstep and Trust Basis ↔ Trust Card invariants (no new semantics).
  • P2a MCP companion pack (mcp-signal-followup): Built-in pack with three rules — MCP-001 uses pack check g3_authorization_context_present (engine v1.2), sharing the same predicate as Trust Basis authorization_context_visible (verified); MCP-002 / MCP-003 cover delegation (delegated_from) and containment degradation (assay.sandbox.degraded). Open mirror under packs/open/mcp-signal-followup/. assay_min_version: >=3.2.3 tracks the prerequisite line (G3 + Trust Card schema 2; v3.2.3 is the reference tag for that substrate, not for built-in pack presence). v3.3.0 is the first Assay release with this pack built in — see PLAN-P2a.
  • Pack engine v1.2: Adds g3_authorization_context_present; bumps ENGINE_VERSION in assay-evidence (mandate-baseline rules that declared engine_min_version: "1.2" now execute with this engine).
  • T1a Trust Basis Compiler MVP: Assay now ships a canonical trust-basis.json compiler surface on main, derived from verified bundles with fixed claim keys, fixed evidence vocabularies, and deterministic regeneration.
  • Low-level trust compiler CLI: Repository builds now expose assay trust-basis generate <bundle> for advanced CI, diffing, and review workflows.
  • G3 Authorization Context Evidence: Supported MCP tool-call paths can merge policy-projected auth_scheme, auth_issuer, and principal onto assay.tool.decision evidence; normalization allowlists schemes, trims issuer, rejects JWS-compact and Bearer credential material, and omits whitespace-only principals.
  • Trust Card schema v2: Trust Basis emits seven claims (adds authorization_context_visible between delegation and containment); trustcard.json uses schema_version 2. Downstream consumers should select claims by stable id, not assume a fixed row count.

Notes

  • Claim-first boundary: T1a ships claim classification in the compiler layer, not in a Trust Card renderer.
  • Deliberate non-goals: This wave does not yet ship trustcard.json, trustcard.md, a trust score, a safe/unsafe badge, or new signal/pack/engine semantics.

MCP Security

  • New MCP integrity metrics: Added tool_description_integrity, tool_output_valid, and tool_collision_detect to cover tool-definition drift, output-schema contracts, and cross-server tool shadowing.

Observability

  • Runtime monitor output: assay monitor blocked-file events now print structured dev, ino, cgroup, and rule_id fields instead of raw payload text.
  • Ring buffer pressure summary: assay monitor now reports emitted and dropped ring-buffer counters for tracepoint, LSM, and socket monitor paths at the end of a run.
  • Metric evaluation spans: The runner now emits one assay.eval.metric span per metric evaluation with stable fields for latency, cached status, pass/fail, unstable state, and error reporting.

Supply Chain

  • CycloneDX release asset: Release builds now publish assay-${VERSION}-sbom-cyclonedx.tar.gz and assay-${VERSION}-sbom-cyclonedx.tar.gz.sha256 alongside the existing binaries.

[v3.2.2] - 2026-03-17

Fixes

  • crates.io publish: Exclude assay-adapter-api from publish list (Trusted Publishing not configured). Use 3.1.0 from crates.io.
  • crates.io publish: Broaden grep pattern for token-not-valid skip.

[v3.2.1] - 2026-03-17

Fixes

  • Windows build: Gate std::os::unix::fs::PermissionsExt with #[cfg(unix)] so the Windows release build succeeds.

[v3.2.0] - 2026-03-17

Release

  • Cross-platform builds re-enabled: macOS x86_64, macOS aarch64 (Apple Silicon), and Windows x86_64 are back in the release matrix.
  • Runner updates (March 2026): macos-15 (was macos-14), windows-2025 (explicit version).
  • Install script: curl -fsSL https://getassay.dev/install.sh | sh now supports macOS ARM.

[v3.1.0] - 2026-03-15

MCP Policy Enforcement (Wave24–Wave42)

  • Typed decisions + Decision Event v2: Deterministic typed decision outcomes with structured DecisionData payloads replacing stringly-typed fields.
  • Obligation execution: Runtime execution of log, alert, approval_required, restrict_scope, and redact_args obligations with deterministic evidence emission.
  • Approval enforcement: approval_required blocks tool calls without valid approval artifacts; approval shape is additive evidence.
  • Restrict scope enforcement: restrict_scope narrows tool-call arguments at runtime with evidence of what was restricted and why.
  • Redact args enforcement: redact_args strips sensitive fields from tool-call arguments before forwarding, with redaction evidence markers.
  • Fulfillment normalization: Obligation fulfillment outcomes are normalized into a stable contract for downstream consumers.
  • Deny/fail-closed evidence convergence: Deny paths and fail-closed decisions emit consistent, typed evidence with deterministic precedence.
  • Replay diff basis: Deterministic replay diff buckets with legacy fallback classification for backward compatibility.
  • Evidence compatibility normalization: Replay evidence compatibility markers for additive reader contracts.
  • Consumer hardening: Frozen consumer read precedence for DecisionEvent, DecisionData, and ReplayDiffBasis payloads.
  • Context envelope hardening: Completeness markers and additive metadata on context-envelope payloads.

BYOS Evidence Store (ADR-015 Phase 1)

  • assay evidence store-status: New diagnostic command — checks connectivity, credentials, inventory, and write access. Supports JSON, table, and plain output. Exit codes: 0 (OK), 1 (connectivity/access failure), 2 (config error).
  • .assay/store.yaml config: Structured YAML configuration for evidence store connection. Precedence: --store > ASSAY_STORE_URL > config file. Credentials stay in environment variables.
  • Config fallback for push/pull/list: --store is now optional — falls back to ASSAY_STORE_URL or .assay/store.yaml automatically.
  • Provider quickstart docs: AWS S3, Backblaze B2, MinIO setup guides.

Architecture & Documentation

  • Architecture-as-code workspace: Structurizr/C4, building blocks, quality scenarios, Obsidian view layer, catalog metadata.
  • ADR-027 through ADR-031 closed as implemented contracts.
  • Repo-wide architecture gap analysis and roadmap truth sync.
  • Release/changelog hygiene: consolidated to single curated CHANGELOG.md.

Fixes

  • Evidence command dispatch is now async (fixes nested tokio runtime panic for BYOS commands).
  • StoreConfig::discover() returns errors on malformed config files instead of silently ignoring them.

[v3.0.0] - 2026-03-05

Breaking API Changes

  • assay_core::mcp::policy::ToolPolicy adds allow_classes and deny_classes.
  • assay_core::mcp::decision::DecisionData adds tool_classes, matched_tool_classes, match_basis, and matched_rule.
  • External struct-literal construction against these types now requires populating the new fields.

DX and Runtime

  • Coverage v1.1 polish: assay coverage supports --out-md for reviewer-friendly markdown output and --routes-top for route summary control while JSON remains canonical (coverage_report_v1).
  • MCP coverage/session exports: assay mcp wrap supports --coverage-out and --state-window-out informational artifacts with stable schemas and explicit write logging.
  • Tool taxonomy governance: MCP policy evaluation and decision metadata include tool taxonomy class matching (tool_classes, matched_tool_classes) for broader sink/source governance coverage.

Governance Contracts and Runbooks

  • Added/finalized ADR contract line for taxonomy, coverage, session/state window, and coverage DX polish (ADR-027/028/029/030/031).
  • Added operational runbooks for taxonomy+coverage and session/state export usage in enterprise workflows.

[v2.12.0] - 2026-01-29

🔐 Pack Registry: Enterprise-Grade Supply Chain Security

This release introduces the Pack Registry Client (assay-registry crate) - a complete implementation of SPEC-Pack-Registry-v1.0.3 for secure remote pack distribution.

✨ Major Features

  • Pack Registry Client (crates/assay-registry/):

    • HTTP client with token + OIDC authentication
    • Pack resolution: local → bundled → registry → BYOS
    • Local caching with TOCTOU protection (integrity verified on every read)
    • Lockfile v2 for reproducible builds (assay.packs.lock)
  • JCS Canonicalization (RFC 8785):

    • Deterministic JSON serialization for pack digests
    • Uses serde_jcs::to_vec() (bytes, not string) to eliminate encoding issues
    • Canonical digest format: sha256:{hex}
  • Strict YAML Validation (SPEC §6.1):

    • Pre-scan rejects anchors (&), aliases (*), tags (!!), multi-document (---)
    • Duplicate key detection with correct list-item scoping
    • DoS limits: max depth 50, keys 10k, string 1MB, input 10MB
    • Integer range checks: ±2^53 (IEEE 754 safe integer)
  • DSSE Signature Verification:

    • Ed25519 + PAE encoding per DSSE spec
    • Sidecar endpoint (GET /packs/{name}/{version}.sig) for large signatures
    • Client always prefers sidecar over X-Pack-Signature header
  • Trust Model (No-TOFU):

    • Pinned root keys compiled into binary
    • Key rotation via signed manifest
    • Pinned roots survive remote revocation attempts
    • Runtime expiry checks for manifest keys

🧪 GitHub Action v2.1 Test Coverage

  • Contract tests for all v2.1 features:
    • Pack lint with eu-ai-act-baseline + SARIF validation
    • Fork PR SARIF skip logic
    • OIDC provider auto-detection (AWS/GCP/Azure patterns)
    • Attestation gating (push-only, default branch, verified)
    • Coverage calculation formula

🐛 Security Fixes (P0)

  • Duplicate Key Detection: Pre-scan catches block mapping duplicates; serde_yaml catches flow mapping duplicates
  • DSSE Verification: Signature verification uses canonical JCS bytes (not raw YAML)
  • List-Item Scoping: Each list item gets its own scope (fixes false positives for - a: 1\n- a: 2)

📦 New Crate Published

📚 Documentation

  • docs/architecture/SPEC-Pack-Registry-v1.md updated to v1.0.3
  • docs/architecture/ADR-018-GitHub-Action-v2.1.md - Action v2.1 design
  • docs/architecture/SPEC-GitHub-Action-v2.1.md - Action v2.1 specification
  • Security review documentation in crates/assay-registry/docs/

Test Coverage

  • 185 tests in assay-registry crate
  • Golden vectors for JCS digest verification
  • DSSE real signature verification tests
  • Trust rotation and revocation tests
  • Cache tamper detection tests
  • Protocol edge cases (304/410/429)

[v2.10.0] - 2026-01-28

🎯 Pack Engine: Compliance Rule Packs

This release introduces the Pack Engine - a YAML-driven compliance/security/quality rule system for evidence bundle linting, with the first built-in pack for EU AI Act Article 12.

✨ Major Features

  • Pack Engine (crates/assay-evidence/src/lint/packs/):

    • YAML-defined rule packs with typed checks
    • Check types: event_count, event_pairs, event_field_present, event_type_exists, manifest_field
    • JSON Pointer (RFC 6901) for field addressing
    • JCS canonicalization (RFC 8785) for deterministic pack digests
    • Collision policy: compliance packs hard-fail, security/quality last-wins
  • EU AI Act Baseline Pack (packs/eu-ai-act-baseline.yaml):

    • EU12-001: Event recording (Article 12(1))
    • EU12-002: Operation monitoring - started/finished pairs (Article 12(2)(c))
    • EU12-003: Post-market monitoring - correlation IDs (Article 12(2)(b))
    • EU12-004: Risk identification - policy/denial fields (Article 12(2)(a))
  • CLI Integration:

    • --pack: Comma-separated pack references (built-in or file path)
    • --max-results: Limit findings for GitHub SARIF size limits (default: 500)
  • GitHub Code Scanning Compatible SARIF:

    • locations[] on all results (including global findings)
    • primaryLocationLineHash for GitHub deduplication
    • Pack metadata in tool.driver.properties.assayPacks[]
    • run.properties.disclaimer for compliance packs
    • Truncation policy with run.properties.truncated/truncatedCount

📚 Documentation

  • docs/architecture/SPEC-Pack-Engine-v1.md - Complete implementation spec
  • docs/architecture/ADR-013-EU-AI-Act-Pack.md - EU AI Act pack design
  • docs/architecture/ADR-016-Pack-Taxonomy.md - Pack taxonomy and open core model

Usage

# Run EU AI Act baseline checks
assay evidence lint bundle.tar.gz --pack eu-ai-act-baseline

# SARIF output for GitHub Code Scanning
assay evidence lint bundle.tar.gz --pack eu-ai-act-baseline --format sarif

# Custom pack file
assay evidence lint bundle.tar.gz --pack ./my-pack.yaml

[v2.4.0] - 2026-01-26

🛡️ Phase 5: SOTA Sandbox Hardening

This release delivers State-of-the-Art sandbox hardening, addressing MCP security guidance for credential isolation, honest capability reporting, and fork-safe enforcement.

✨ Major Features

  • Environment Scrubbing (env_filter.rs):
    • Default-deny for secrets (*_TOKEN, *_KEY, *_SECRET, AWS_*, GITHUB_*)
    • CLI flags: --env-allow=VAR=value, --env-passthrough=VAR
    • Always sets TMPDIR to scoped sandbox directory
  • Landlock Deny-wins Correctness (landlock_check.rs):
    • Detects "deny inside allow" conflicts that Landlock cannot enforce
    • Automatic degradation to Audit mode with explicit warning
    • Prevents false sense of security from unenforceable policies
  • Fork-Safe pre_exec:
    • Eliminated heap allocations in pre_exec closure
    • Uses std::io::Error::from_raw_os_error() instead of anyhow::bail!()
    • Syscall-only in critical fork-exec window
  • Scoped /tmp Isolation:
    • UID-based (not $USER env which can be spoofed)
    • Per-run isolation via PID in path
    • 0700 permissions (owner-only)
    • Prefers XDG_RUNTIME_DIR when available
  • Doctor Deep Dive v2:
    • Reports Phase 5 hardening feature status
    • Reads actual Landlock ABI version from sysfs
    • Net enforcement correctly reports ABI >= 4 requirement

🛠️ CI Improvements

  • scripts/ci/phase5-check.sh: New quality gate script
    • CARGO_TARGET_DIR=/tmp/assay-target for VM mount compatibility
    • --locked on all cargo commands
    • Strict Clippy -D warnings

🐛 Fixes

  • Fixed unused_assignments warning on macOS via #[cfg(target_os = "linux")]
  • Fixed io_other_error Clippy lint (Rust 1.93)
  • Added #[allow(dead_code)] for non-Linux Landlock stubs

[v2.2.0] - 2026-01-23

🛡️ SOTA Hardening (Jan 2026)

This release delivers "State-of-the-Art" infrastructure hardening, specifically targeting ARM/Self-Hosted stability and CI reliability. It eliminates supply chain risks and ensures deterministic builds across all platforms.

✨ Major Features

  • Robust ARM Infrastructure: Implemented a "GoFoss -> Ubuntu Ports" failover loop for all ARM runners. This eliminates flaky 404 errors caused by the unstable ports.ubuntu.com mirror.
    • Generic Logic: The failover script aggressively rewrites any ubuntu-ports source, scrubbing legacy/broken mirrors (e.g. edge.kernel.org) from self-hosted runners.
    • Optimization: Automatically skips logic on AMD64 runners (ubuntu-latest) to preserve "Fast Path" performance.
  • Intelligent Gating:
    • Fork Safety: Self-hosted runners are now strictly gated (if: fork == false) to prevent malicious code execution from PR forks.
    • Split Smoke: ebpf-smoke is split into -ubuntu (for signal) and -self-hosted (for depth), ensuring forks still get CI feedback.
  • Performance "Fast Path":
    • Install-First: All apt jobs now attempt install before update, leveraging fresh runner caches for significant speedups.
    • Hardened Flags: Ubiquitous use of DEBIAN_FRONTEND=noninteractive and --no-install-recommends.

🐛 Fixes

  • Artifact Sequencing: Fixed a race condition in kernel-matrix.yml (matrix-test) where install scripts ran before artifact download.
  • Supply Chain: Enforced --locked / pinned versions for all bpf-linker installations.
  • Cleanup: Removed legacy actions/cache usage for apt-lists (native disk caching is superior on self-hosted).

[v2.1.1] - 2026-01-15

🛡️ LSM Hardening & Safety

Critical release hardening the BPF-LSM implementation for production readiness.

  • Verifier Fix: Resolved BPF verifier rejection (exit code 40) by optimizing emit_event (removed zeroing loop).
  • RingBuf Safety: Implemented secure, full-buffer copy to prevent uninitialized memory leakage to userspace.
  • Explicit Deny: Validated E2E action: "deny" enforcement (EPERM blocking).
  • CI Gate: Hardened verify_lsm_docker.sh to enforce hard failures on blocking misses.

[v2.0.0] - 2026-01-12

🛡️ SOTA Hardening (Phase 5)

This major release delivers the State-of-the-Art (SOTA) architecture for robust runtime security, transitioning from "Best Effort" to "Forensically Sound" monitoring.

✨ Major Features

  • Cgroup-First Architecture: assay-monitor and assay-ebpf now prioritize cgroup membership over PID tracking, using bpf_get_current_ancestor_cgroup_id to prevent nested cgroup escapes. This ensures 100% coverage of short-lived processes.
  • Forensic Incident Bundles:
    • Secure Atomic Writes: Implementation of IncidentBuilder using openat, O_NOFOLLOW, O_EXCL, and renameat to prevent TOCTOU vulnerabilities.
    • Unique Identity: Incident files now use UUID v4 suffixes to guarantee uniqueness.
    • Detailed Metadata: Includes kernel version, session UUID, and process tree context.
  • eBPF Hardening:
    • Dynamic Offsets: Removed all hardcoded kernel offsets in favor of runtime resolution via /sys/kernel/tracing/events/.../format.
    • Extended Coverage: Added sys_enter_openat2 probe for modern kernels (Linux 5.6+).
    • Safety: Uses read_user_str_bytes with explicit bounds checking safe slices.

🐛 Fixes & Polish

  • CI Reliability: Complete overhaul of CI pipelines using sccache (local backend), mold linker (Linux), and single-pass testing. Zero 400 errors from GH Actions Cache.
  • Windows Support: Fixed compilation issues in assay-cli by guarding Unix-specific cgroup logic.
  • Golden Tests: Resolved output mismatches for strict reproducibility.

[v1.8.0] - 2026-01-11

🚀 Runtime Features (System 2 Security)

This release transforms Assay from a static analyzer into a complete Runtime Security Platform. It introduces the "System 2" capabilities: detecting and stopping dangerous behavior as it happens.

✨ Major Features

  • Runtime Monitor (assay monitor) (Linux Only):
    • Uses eBPF (extended Berkeley Packet Filter) to trace process behavior safely in kernel space.
    • Detects file access (openat) and network connections (connect) in real-time.
    • Zero-Overhead: Highly optimized "Read-First" ring buffer implementation.
  • Discovery (assay discover):
    • Automatically inventory running MCP servers and local configurations.
    • Detects unmanaged servers and security gaps.
  • Kill Switch (assay kill):
    • Emergency termination of rogue agent processes.
    • Supports graceful shutdown (SIGTERM) and immediate kill (SIGKILL).

🛡️ Hardening

  • Native eBPF Builds: CI now builds eBPF artifacts natively (no Docker required), ensuring determinism and stability.
  • Host Build Protection: The assay-ebpf crate is feature-gated to prevent accidental linking on non-Linux hosts.
  • Strict Dependencies: All upstream dependencies are strictly pinned for reproducibility.

📚 Documentation

  • Unified Reference: Consolidated runtime documentation into docs/runtime-monitor.md.
  • Handoff: Comprehensive architecture & maintenance guide available for contributors.

[v1.7.0] - 2026-01-09

🛡️ Strict Deprecation Mode

  • Refined Deprecations: Formal deprecation of v1.x constraints syntax.
  • Strict Mode: New --deny-deprecations flag (and ASSAY_STRICT_DEPRECATIONS=1 env var) to enforce strict compliance in CI.
  • Migration Guide: New detailed v1-to-v2 Migration Guide.
  • Startup Warnings: Server/Proxy now emit clear warnings when loading legacy policies.

Added

  • CLI: assay policy validate --deny-deprecations (and for run/wrap modes).
  • Docs: Comprehensive docs/migration/v1-to-v2.md.

[v1.6.0] - 2026-01-09

Added

  • Policy v2.0 (JSON Schema): Official support for JSON Schema constraints (schemas:) replacing regex loops.
  • Unified Policy Engine: assay-core, assay-cli, and assay-mcp-server now share the exact same evaluation logic (McpPolicy::evaluate).
  • New Commands: assay policy validate, migrate, and fmt.
  • Enforcement Modes: enforcement.unconstrained_tools: warn|deny|allow for finer control over headless/legacy tools.
  • Scoped Refs: $ref support within single policy documents (#/schemas/$defs/...).

Changed

  • Runtime Consistency: assay mcp wrap (proxy) and assay-mcp-server enforce the exact same rules as assay coverage.
  • Auto-Migration: Legacy v1 policies (constraints:) are auto-migrated in-memory with deprecation warnings.

Deprecated

  • v1 Constraints: The constraints: syntax is deprecated and will be removed in Assay v2.0.0. Use assay policy migrate to upgrade.

Fixed

  • JSON Casing: Stabilized structuredContent vs structured_content in error contracts.
  • Symlink Resolution: Fixed policy resolution issues on macOS /tmp.

🛠️ Autofix & Policy Packs

A major productivity release introducing automated self-repair (assay fix) and instant policy scaffolding (assay init --pack).

✨ Major Features

  • assay fix: Interactively repair configuration issues.
    • Automated Patches: Fixes config errors, schema violations, and missing policies based on diagnostics.
    • Dry Run: Preview changes before applying them.
    • Atomic Writes: Cross-platform safe file updates (Windows/Linux/macOS).
  • Policy Packs (assay init --pack):
    • default: Balanced security (blocks RCE, audits sensitive ops).
    • hardened: Maximum security (allowlist-only, strict args).
    • dev: Permissive for rapid prototyping (logs warnings).

🛡️ Hardening

  • Patch Engine: Strict traversal prevents partial mutations during remove/replace operations.
  • Module Cleanup: Extracted shared logic to assay-cli::util for better maintainability.
  • Windows Support: Robust atomic file replacement strategy.

[v1.4.1] - 2026-01-06

🩹 Consistency & SARIF Polish

Post-release hardening for Agentic Contract and SARIF compliance.

🛠️ Fixes

  • Contract Consistency: Internal severity normalization (warning -> warn) now applied strictly to exit code logic and CLI text output logic.
  • SARIF: invocations.exitCode now accurately reflects the CLI exit code (0/1/2).
  • Contract: Text output summary counts now strictly match JSON output counts.

[v1.4.0] - 2026-01-06

🛡️ Agentic Security Edition

The "CI Gate" release. This major update transforms Assay into a comprehensive CI/CD guardrail for Agentic systems.

✨ Major Features

  • assay init: Interactive wizard that auto-detects your project type (Python/Node/MCP) and generates secure policy + CI config in < 5s.
  • assay validate: Dedicated CI command with strict exit codes (0=Pass, 1=Fail, 2=Error) and zero overhead.
  • Agentic Contract: --format json output is now strictly typed, stable, and designed for AI self-correction loops.
  • GitHub Advanced Security: --format sarif support for direct integration with GitHub Code Scanning.

📚 Documentation

  • Overhaul: Complete rewrite of Quickstart, CLI Reference, and Architecture guides.
  • GetAssay.dev: One-line install script and landing page sync.

[v1.3.0] - 2026-01-06

✨ New Feature: assay mcp config-path

Simplified 1-step setup for Claude Desktop, Cursor, and other MCP clients.

  • Auto-detection: Automatically finds config files on macOS, Windows, and Linux.
  • Generation: Generates secure JSON snippets for your mcpServers config.
  • Security: Enforces policy file usage by default.

🛡️ Security Hardening

  • Fail-Secure: CLI now fatal-errors if specified policy file is missing (no insecure fallbacks).
  • Policy: clarifications on rate limit fields.
  • Proxy: Improved logging for unknown tool calls.

🐛 CI Fixes

  • Python Wheels: Fixed extensive artifact corruption issue in Release workflow (release.yml).
  • Linting: Strict clippy and rustfmt compliance across the board.

[v1.2.12] - 2026-01-05

🩹 Fix

  • README: Fixed broken CI status badge (pointed to non-existent assay.yml).

[v1.2.11] - 2026-01-05

📖 Docs Pages Update

  • Index: Aligned landing page with new "Vibecoder + Senior" positioning.
  • User Guide: Rewritten to focus on CI/CD, Doctor, and Python workflows (removed legacy RAG metrics noise).
  • Consistency: Unified messaging across README and documentation site.

[v1.2.10] - 2026-01-05

📖 Documentation Refresh

  • README: Overhauled for "Vibecoder + Senior" audience.
  • Guides: Updated Python Quickstart and Identity docs.
  • Consistency: assay-it is now the canonical package name in docs.

[v1.2.9] - 2026-01-05

🧹 Code Sweep

  • Removed redundant directories (test-*/, assay-doctor-*).
  • Refactored doctor module to remove verbose comments.
  • Zero fluff policy applied.

[v1.2.8] - 2026-01-05

📚 SOTA DX Features

  • Python Docs: Added comprehensive docstrings to assay.Coverage, assay.validate, and AssayClient wrappers. IDEs will now show rich tooltips. (Google-style)
  • Stability: Added CLI verification tests for assay init-ci.

[v1.2.7] - 2026-01-05

🩹 Formatting Fix

Patch release to verify cargo fmt compliance after v1.2.6 refactoring.

[v1.2.6] - 2026-01-05

🩹 Clippy Fix

Patch release to fix a stable-clippy lint regex_creation_in_loops.

  • Performance: Regex is now compiled once per doctor suite, not per policy.

[v1.2.5] - 2026-01-05

📦 PyPI Metadata Fix (Real)

Updated pyproject.toml to explicitly use assay-it as the package name, ensuring maturin builds the correct wheel metadata for PyPI.

  • Distribution Name: assay-it (Final Fix)

[v1.2.4] - 2026-01-05

📦 PyPI Package Rename

Renamed the Python SDK distribution package to assay-it to match the PyPI project name.

  • Distribution Name: assay-it (PyPI)
  • Import Name: import assay (Unchanged)

[v1.2.3] - 2026-01-05

🩹 CI Stabilization

Patch release to resolve build pipeline issues.

  • Fix: Resolved artifact corruption in wheel generation (PyPI Release).
  • Fix: Corrected formatting in doctor/mod.rs to pass strict CI linting.

[v1.2.2] - 2026-01-05

💅 Polish & Fixes

Strictness doesn't have to be unfriendly. This release polishes the "Strict Schema" experience.

  • Friendly Hints: When unknown fields are detected (e.g. require_args), Doctor now suggests the closest valid field ("Did you mean require_args?").
  • Output: assay doctor now correctly displays diagnostic messages in human-readable output (previously they were counted but hidden).
  • Release Fix: Removed legacy workflows to ensure smooth PyPI publishing.

[v1.2.1-ext] - 2026-01-05

🩺 Smart Doctor (SOTA Agentic Edition)

Transformed assay doctor into a "System 2" diagnostic engine for Agentic workflows.

  • Analyzers:
    • Trace Drift: Detects legacy function_call usage (recommends tool_calls).
    • Integrity: Validates existence of all referenced policy/config files.
    • Logic: Detects alias shadowing (e.g. Search alias hiding Search tool).
  • Agentic Contract:
    • Output via --format json is strict, machine-readable, and deterministic.
    • Includes fix_steps for automated self-repair.
    • Robust JSON Errors: Even config parsing failures return valid JSON envelopes (when requested), ensuring Agents never crash on plain text errors.

⚠️ Breaking Changes (Strict Schema)

To prevent "Silent Failures" (phantom configs), we now enforce Strict Schema Validation:

  • Unknown fields in assay.yaml or policy.yaml now cause a HARD ERROR.
  • Previously, typos or incorrect nesting (e.g. tools: ToolName:) were silently ignored. Now you will see E_CFG_PARSE with "unknown field".
  • Why: Required for reliable Agentic generation and debugging.

🐛 Fixes

  • Demo: assay demo now generates canonical, schema-compliant policies.
  • DX: Restored request_id uniqueness check in trace client.

[v1.2.0] - 2026-01-04

🐍 Python SDK (assay-python-sdk)

Native Python bindings for seamless integration into Pytest and other Python workflows.

  • AssayClient: Record traces directly from python code using client.record_trace(obj).
  • Coverage: Analyze trace coverage with assay.Coverage(policy_path).analyze(traces).
  • Explainer: Generate human-readable explanations of tool usage vs policy.
  • Performance: Built on PyO3 + maturin for high-performance Rust bindings.

🛡️ Coverage Thresholds & Gates (assay coverage)

New assay coverage command to enforce quality gates in CI.

  • Min Coverage: Fail build if coverage drops below threshold (--min-coverage 80).
  • Baseline Regressions: Compare against a baseline and fail on regression (--baseline base.json).
  • High Risk Gaps: Detect and fail if critical deny-listed tools are never exercised.
  • Export: Save baselines with --export-baseline.

📉 Baseline Foundation (assay baseline)

Manage and track baselines to detect behavioral shifts.

  • assay baseline record: Capture current run metrics.
  • assay baseline check: Diff current run against stored baseline.
  • Determinism: Guaranteed deterministic output for reliable regression testing.

Added

  • assay-python-sdk package on PyPI (upcoming).
  • TraceExplainer logic exposed to Python.

[v1.1.0] - 2026-01-02

Added

Policy DSL v2 - Temporal Constraints

New sequence operators for complex agent workflow validation:

  • max_calls - Rate limiting per tool

    sequences:
      - type: max_calls
        tool: FetchURL
        max: 10  # Deny on 11th call
  • after - Post-condition enforcement

    sequences:
      - type: after
        trigger: ModifyData
        then: AuditLog
        within: 3  # AuditLog must appear within 3 calls after ModifyData
  • never_after - Forbidden sequences

    sequences:
      - type: never_after
        trigger: Logout
        forbidden: AccessData  # Once logged out, cannot access data
  • sequence - Exact ordering with strict mode

    sequences:
      - type: sequence
        tools: [Authenticate, Authorize, Execute]
        strict: true  # Must be consecutive, no intervening calls

Aliases

Define tool groups for cleaner policies:

aliases:
  Search:
    - SearchKnowledgeBase
    - SearchWeb
    - SearchDatabase

sequences:
  - type: eventually
    tool: Search  # Matches any alias member
    within: 5

Coverage Metrics

New assay coverage command for CI/CD integration:

# Check tool and rule coverage
assay coverage --policy policy.yaml --traces traces.jsonl --min-coverage 80

# Output formats: summary, json, markdown, github
assay coverage --policy policy.yaml --traces traces.jsonl --format github

Features:

  • Tool coverage: which policy tools were exercised
  • Rule coverage: which rules were triggered
  • High-risk gaps: blocklisted tools never tested
  • Unexpected tools: tools in traces but not in policy
  • Exit codes: 0 (pass), 1 (fail), 2 (error)
  • GitHub Actions annotations for PR feedback

GitHub Action

- uses: assay-dev/assay-action@v1
  with:
    policy: policies/agent.yaml
    traces: traces/
    min-coverage: 80

One-liner Installation

curl -sSL https://assay.dev/install.sh | sh

Changed

  • Policy version bumped to 1.1
  • Improved error messages with actionable hints
  • Better alias resolution performance

Experimental

The following features are available but not yet stable:

  • assay explain - Trace debugging and visualization (use --experimental flag)

Migration from v1.0

v1.1 is fully backwards compatible with v1.0 policies. To use new features:

  1. Update version: "1.0" to version: "1.1" in your policy files
  2. Add aliases section if using tool groups
  3. Add new sequence rules as needed

Existing v1.0 policies will continue to work without modification.

[v1.0.0] - 2025-12-29

Added

  • Structured Logging: assay-core now uses tracing for fail-safe events (assay.failsafe.triggered), enabling direct Datadog/OTLP integration.
  • Protocol Feedback: assay-mcp-server now includes a warning field in the response when on_error: allow is active and an error occurs, allowing clients to adapt logic.
  • Documentation: Added "Look-behind Workarounds" to docs/guides/migration-regex.md.

[v1.0.0-rc.2] - 2025-12-28

🚀 Release Candidate 2

Rapid-response release addressing critical Design Partner feedback regarding MCP protocol compliance and operational visibility.

✨ Features

  • Structured Fail-Safe Logging: Introduced assay.failsafe.triggered JSON event when on_error: allow is active, enabling machine-readable audit trails.
  • Fail-Safe UX: Logging now occurs via standard stderr to avoid polluting piping outputs.

🐛 Fixes

  • MCP Compliance: assay-mcp-server tool results are now wrapped in standard CallToolResult structure ({ content: [...], isError: bool }), enabling clients to parse error details and agents to self-correct.

🚀 Release Candidate 1

First Release Candidate for Assay v1.0.0, introducing the "One Engine, Two Modes" guarantee and unified policy enforcement.

✨ Features

  • Unified Policy Engine: Centralized validation logic (assay-core::policy_engine) shared between CLI, SDK, and MCP Server.
  • Fail-Safe Configuration: New on_error: block | allow settings for graceful degradation.
  • Parity Test Suite: New tests/parity_batch_streaming.rs ensuring identical behavior between batch and streaming modes.
  • False Positive Suite: tests/fp_suite.yaml validation for legitimate business flows.
  • Latency Benchmarks: confirmed core decision latency <0.1ms (p95).

🐛 Fixes

  • Resolved schema validation discrepancies between local CLI and MCP calls.
  • Fixed sequence_valid assertions to support regex-based policy matching.

[v0.9.0] - 2025-12-27

🚀 Hardened & Release Ready

This release marks the transition to a hardened, production-grade CLI. It introduces strict contract guarantees, robust migration checks, and full CI support.

✨ Features

  • Official CI Template: .github/workflows/assay.yml for drop-in GitHub Actions support.
  • Assay Check: New assay migrate --check command to guard against unmigrated configs in CI.
  • CLI Contract: Formalized exit codes:
    • 0: Success / Clean
    • 1: Test Failure
    • 2: Configuration / Migration Error
  • Soak Tested: Validated with >50 consecutive runs for 0-flake guarantee.
  • Strict Mode Config: configVersion: 1 removes top-level policies in favor of inline declarations.

⚠️ Breaking Changes

  • Configuration: Top-level policies field is no longer supported in configVersion: 1. You must run assay migrate to update your config.
  • Fail-Fast: assay migrate and validate now fail hard (Exit 2) on unknown standard fields.

🐛 Fixes

  • Fixed "Silent Drop" issue where unknown YAML fields were ignored during parsing.
  • Resolved argument expansion bug in test scripts on generic shells.

[v0.8.0] - 2025-12-27

Added

  • Soak test hardening for legacy configs
  • Unit tests for backward compatibility
  • EvalConfig::validate() method

Changed

  • Prepared configVersion: 1 logic (opt-in)