All notable changes to this project will be documented in this file.
- Selected Pydantic Evals as the next evidence-seam hardening candidate via P9b, but kept the scope deliberately small: one reduced case-result artifact derived from `EvaluationReport.cases[]`, possible importer-only support only if the live recut succeeds, no raw `ReportCase` contract, no full `EvaluationReport` import, no Logfire/trace/span payloads, no Trust Basis claim, no Harness recipe, and no public receipt-family story.
- Recut the Pydantic Evals sample around `pydantic-evals==1.89.1` and one reduced case-result artifact. The new fixtures carry `case_name`, bounded assertion/score results, and export timestamp only; broad `ReportCase` fields such as raw input, expected output, model output, trace, and span data remain rejected.
- Refreshed the Mastra ScoreEvent sample against `@mastra/core` 1.29.1 and `@mastra/observability` 1.10.2 after upstream confirmed `ScoreId` had shipped. The strong fixture now carries live-backed `score_id_ref`; the v1 importer keeps the field optional for older reduced artifacts and compatibility fixtures.
- Added Evidence Receipts in Action, a static proof page with checked-in artifacts generated from the released Assay `v3.9.1` binary and Assay Harness `v0.3.2` gate/report surface. The page shows the three released receipt families, their exact Trust Basis claim IDs, and the raw diff JSON to Markdown/JUnit projection split without adding a new product surface or integration claim.
- Added a copyable GitHub Actions proof snippet to the Evidence Receipts in Action page. The snippet verifies the checked-in proof bundles with the released Assay `v3.9.1` binary, writes a small job summary, and uploads canonical/projection artifacts without adding a required workflow or new runtime semantics.
- Hardened the idempotent crates.io publisher so it waits for each newly published workspace crate to become visible through the crates.io API before publishing the next dependent crate.
- Narrowed self-hosted eBPF CI triggers so release-publish helper changes do not leave optional BPF runner jobs queued when the self-hosted runner is offline.
This patch release publishes the final public three-family evidence receipts note under an immutable Assay release tag. It does not add runtime behavior, Trust Basis claims, receipt families, schema semantics, or Harness semantics.
- Versioned public note: Evidence Receipts for AI Outcomes, Runtime Decisions, and Model Inventory now points to the released Assay `v3.9.1` surface and Assay Harness `v0.3.2` compatibility line, while keeping the same downstream-only boundary: Promptfoo assertion component results, OpenFeature boolean `EvaluationDetails` outcomes, and CycloneDX `machine-learning-model` components are bounded receipt families, not official integrations or upstream truth claims.
This minor release turns the post-v3.8.0 consolidation program into a user-facing release line. It does not add new Trust Basis claims or receipt families. Instead, it makes the existing trust compiler surface easier to gate, inspect, review, and bind to the MCP policy/tool surfaces that governed a decision.
- Trust Basis assertions: `assay trust-basis assert` can now gate one canonical `trust-basis.json` artifact against generic `--require <claim-id>=<level>` predicates. The command is claim-id based, emits text or `assay.trust-basis.assert.v1` JSON, exits `0` on pass, exits `1` on policy mismatch, and keeps input/config/runtime failures on `2+`.
- Receipt schema CLI: `assay evidence schema list/show/validate` exposes the v3.8.0 receipt schema registry as a command-line surface. It lists receipt payload and importer-input schemas, shows schema metadata before raw JSON Schema content, validates JSON or JSONL artifacts, and keeps Mastra marked as importer-only rather than a public Trust Basis claim family.
- Static Trust Card HTML: `assay trustcard generate` now writes `trustcard.html` beside `trustcard.json` and `trustcard.md`. JSON remains the canonical Trust Card artifact; Markdown and single-file HTML are deterministic reviewer projections with no remote assets, JavaScript requirement, scores, badges, or second classifier.
- Policy snapshot digest visibility: supported MCP `assay.tool.decision` events now project `policy_snapshot_digest`, `policy_snapshot_digest_alg`, `policy_snapshot_canonicalization`, and `policy_snapshot_schema` from the existing `policy_digest` when available. `policy_snapshot_digest` is the self-describing reviewer projection of `policy_digest`; the values match on supported paths, and the snapshot field cluster is produced atomically. This is a review binding only; it does not claim the policy is correct, sufficient, safe, approved, complete, retrievable, exportable, or embedded.
- Tool definition digest visibility: supported MCP `tools/list` to `tools/call` decision paths can now project an atomic `tool_definition_*` field cluster onto `assay.tool.decision` events. The digest is computed over the bounded observed tool-definition surface using `jcs:mcp_tool_definition.v1` and excludes `x-assay-sig`, top-level vendor/provider metadata, annotations, display hints, raw registry bodies, runtime results, and inferred `tools/call` fields. This is review visibility only; it does not claim tool safety, signature validity, signer trust, registry truth, or implementation truth.
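
The gating behaviour described for `assay trust-basis assert` can be pictured with a small sketch. This is not Assay's implementation: the `claims` / `id` / `level` field names, the exact-match comparison, and both helper names are assumptions made for illustration. Only the predicate shape and the `0` / `1` / `2` exit-code split come from the notes above.

```python
import json


def parse_require(spec: str) -> tuple[str, str]:
    """Split one `<claim-id>=<level>` predicate (hypothetical helper)."""
    claim_id, sep, level = spec.partition("=")
    if not sep or not claim_id or not level:
        raise ValueError(f"malformed predicate: {spec!r}")
    return claim_id, level


def assert_trust_basis(doc_text: str, requires: list[str]) -> int:
    """Return 0 on pass, 1 on policy mismatch, 2 on input/config failure."""
    try:
        doc = json.loads(doc_text)
        # Assumed artifact shape: {"claims": [{"id": ..., "level": ...}, ...]}
        levels = {claim["id"]: claim["level"] for claim in doc["claims"]}
        wanted = [parse_require(spec) for spec in requires]
    except (ValueError, KeyError, TypeError):
        return 2
    # Assumption: a predicate passes only when the level matches exactly.
    mismatches = [cid for cid, level in wanted if levels.get(cid) != level]
    return 1 if mismatches else 0
```

Keeping input failures on a separate exit code lets a CI job distinguish "the policy was not met" from "the gate could not be evaluated".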
- Product surface alignment: README, docs home, scope docs, CLI about text, AI-context notes, and the P52-P56 consolidation plan now describe Assay as a CI-native evidence and trust compiler. The wording separates Assay core from Assay Harness, keeps external receipt lanes downstream-only, and avoids partnership, integration, correctness, safety, or compliance-truth claims.
This minor release turns the v3.7.0 three-family receipt surface into a more external-ready contract line. The receipt families and Trust Basis claims stay the same; the new work is machine-readable schema coverage and release-truth alignment for consumers that need to produce or inspect bounded receipts.
- Receipt schema registry: `docs/reference/receipt-schemas/` now contains JSON Schema contracts for the supported Promptfoo, OpenFeature, CycloneDX ML-BOM, and Mastra receipt payloads plus their supported importer input artifact shapes.
- Receipt family matrix links schemas: `docs/reference/receipt-family-matrix.json` now points each claim-visible family at its receipt and input schemas. Mastra remains documented as importer-only: schema-covered, bundleable, and Trust Basis-readable, but not part of the three claim-visible public families.
- Schema validation tests: importer-generated receipt payloads and supported input artifacts are validated against the registry, keeping prose, fixtures, and emitted payloads in lockstep.
- The three-family note is part of the v3.8.0 release line instead of living only as post-v3.7.0 main-branch docs.
- Trust Card schema v5 wording is tightened around the 10-claim surface. There are no new Trust Basis claims in this release.
This minor release makes the first three-family evidence-portability surface release-ready. Assay can now reduce selected external eval outcomes, runtime decision details, and model inventory/provenance surfaces into bounded receipts, compile supported receipt families into Trust Basis, and keep the same claim-level boundary discipline as the earlier Promptfoo lane.
- Three receipt families are claim-visible: supported eval, decision, and inventory receipt bundles can now surface bounded Trust Basis boundary claims: `external_eval_receipt_boundary_visible`, `external_decision_receipt_boundary_visible`, and `external_inventory_receipt_boundary_visible`. These claims mean the supported receipt boundary and provenance are visible; they do not mean upstream eval correctness, flag-decision correctness, model safety, dataset approval, BOM completeness, license posture, vulnerability posture, or compliance truth.
- OpenFeature decision receipts: `assay evidence import openfeature-details` imports bounded boolean OpenFeature `EvaluationDetails` rows into verifiable decision receipt bundles. The v1 lane keeps provider config, evaluation context, targeting keys, rules, user identifiers, flag metadata, provider metadata, `error_message`, and non-boolean values out of the canonical receipt path.
- CycloneDX ML-BOM model-component receipts: `assay evidence import cyclonedx-mlbom-model` imports one selected `machine-learning-model` component as a bounded inventory receipt. The v1 lane keeps full BOM graphs, model-card bodies, dataset bodies, pedigree, vulnerabilities, licenses, metrics, safety posture, and compliance semantics out of the receipt.
- Mastra ScoreEvent receipts: `assay evidence import mastra-score-event` imports reduced, reviewer-safe Mastra ScoreEvent JSONL artifacts into score receipt bundles. This lane does not yet add a Trust Basis score claim; it is intentionally separate from the three-family public claim surface.
- Trust Card schema v5: Trust Card output now reflects the expanded claim table. Consumers must continue to key by stable `claim.id`, not row position or row count.
- Receipt family matrix: `docs/reference/receipt-family-matrix.json` records each supported receipt family, event type, Trust Basis claim, included fields, excluded fields, and explicit non-claims.
- Added OpenFeature, CycloneDX ML-BOM, and Mastra ScoreEvent evidence examples plus CLI reference docs for the new importers.
- Updated the evidence contract registry with the new experimental receipt event types.
- This is a release of bounded receipt compiler lanes, not official integration or partnership support for Promptfoo, OpenFeature, CycloneDX, or Mastra.
- Trust Basis and Trust Card consumers should treat the new claim rows as additive. Select claims by `claim.id` and tolerate unknown future claims.
- Assay Harness `v0.3.1` is the intended companion release for running the Promptfoo, OpenFeature, and CycloneDX recipes over this claim surface.
This minor release makes the first external-eval evidence portability lane release-ready. Assay can now import selected external evaluation outcomes as bounded evidence receipts, carry them through Trust Basis, and compare claim artifacts without importing full eval-run truth or claiming model correctness.
- External eval outcomes as bounded receipts: Assay now has the first evidence-portability lane for selected external eval outcomes. The lane starts with Promptfoo assertion-component results, compiles them into Assay evidence receipts, carries them through Trust Basis / diff, and keeps the boundary explicit: no full eval-run import, no Promptfoo integration claim, and no model-correctness truth. See From Promptfoo JSONL to Evidence Receipts.
- Promptfoo JSONL receipt import: `assay evidence import promptfoo-jsonl` imports strict Promptfoo CLI JSONL rows from `gradingResult.componentResults[]` and writes verifiable Assay evidence bundles. The v1 lane is deterministic-assertion-first (`equals`, binary `0`/`1` component scores) and excludes raw prompt, output, expected value, vars, provider payloads, token/cost data, and full JSONL rows.
- Trust Basis visibility for external receipts: supported external eval receipt bundles can now surface the bounded `external_eval_receipt_boundary_visible` claim. The claim means the receipt boundary and provenance are visible; it does not mean the upstream eval run passed or that Assay imports upstream payloads as truth.
- Trust Basis diff contract: `assay trust-basis diff` compares canonical Trust Basis artifacts by stable claim identity, reports added / removed / improved / regressed / metadata-only changes, and can fail CI only on claim-presence or claim-level regressions.
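
The diff contract above keys on claim identity rather than row order. A minimal sketch of that idea follows; it is not the `assay trust-basis diff` implementation. The claim dictionaries, the `rank` mapping that orders levels, and the function names are hypothetical and exist only for the example.

```python
def diff_claims(old: list[dict], new: list[dict], rank: dict[str, int]) -> dict:
    """Classify per-claim changes between two claim lists, keyed by claim id."""
    old_by_id = {claim["id"]: claim for claim in old}
    new_by_id = {claim["id"]: claim for claim in new}
    report = {"added": [], "removed": [], "improved": [],
              "regressed": [], "metadata_only": []}
    for claim_id in sorted(old_by_id.keys() | new_by_id.keys()):
        if claim_id not in old_by_id:
            report["added"].append(claim_id)
        elif claim_id not in new_by_id:
            report["removed"].append(claim_id)
        else:
            before, after = old_by_id[claim_id], new_by_id[claim_id]
            if rank[after["level"]] > rank[before["level"]]:
                report["improved"].append(claim_id)
            elif rank[after["level"]] < rank[before["level"]]:
                report["regressed"].append(claim_id)
            elif before != after:
                # Same level, but some other field changed.
                report["metadata_only"].append(claim_id)
    return report


def should_fail(report: dict) -> bool:
    """Fail only on lost claims or lowered levels, never on metadata changes."""
    return bool(report["removed"] or report["regressed"])
```

Because lookups go through the `id` key, reordering rows or appending new claims does not change the result for existing claims.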
- Promptfoo evidence sample and recipe path: the Promptfoo assertion grading-result sample is restored on `main`, and the Assay-side note explains the evidence portability boundary without positioning this as a Promptfoo integration or partnership.
- Additional bounded evidence examples: OpenFeature `EvaluationDetails` and Guardrails validation-outcome lanes document adjacent evidence units while staying clear of provider-config truth, corrected-output truth, and full run history.
- Trust Basis and Trust Card consumers should keep selecting claims by stable `claim.id`, not row position or row count. The external-eval receipt claim is additive.
- The Promptfoo lane is downstream evidence portability over existing JSONL/assertion surfaces. It is not official Promptfoo support, not a partnership claim, and not a full Promptfoo export importer.
This patch release keeps the v3.5.0 trust-compiler surface intact, but makes the
new MCP Registry publication path honest and publishable. It is the first Assay
release line that can ship a real assay-mcp-server-<version>-linux.mcpb asset
plus generated official-registry metadata from the same release asset set.
- Official MCP Registry publication foundation: Release builds now package Linux `assay-mcp-server` archives into a real `assay-mcp-server-<version>-linux.mcpb` bundle and generate `server.json` from the released MCPB asset URL plus SHA-256. This replaces the old hand-maintained metadata story with a bounded, supported `mcpb` publication path for the official MCP Registry.
- CrewAI event evidence sample: Assay now ships a small sample-first `examples/crewai-event-evidence/` flow that exports bounded CrewAI runtime events to NDJSON and maps them into Assay-shaped placeholder evidence without promoting CrewAI runtime semantics into Assay truth.
This release makes the first bounded MCP authorization-discovery seam public. K2-A Phase 1 now
ships in the public Assay line as visibility-only evidence for typed MCP auth-discovery surfaces,
without broadening into an auth-discovery pack, auth-success claims, or compliance theater.
- `K2-A` Phase 1: Assay now publicly ships the first bounded MCP authorization-discovery seam on imported MCP traces via `episode_start.meta.mcp.authorization_discovery`. The slice is visibility-only, promotes positively only from typed runtime-observed `WWW-Authenticate` discovery on supported `401` transport paths, and explicitly does not imply auth success, scope adequacy, issuer trust, or compliance.
This patch release makes the post-v3.3.0 trust-compiler line public: G4-A Phase 1 (payload.discovery), built-in P2c (a2a-discovery-card-followup), and K1-A Phase 1 (payload.handoff) now ship in the released binaries and Python wheels. It also refreshes outward-facing package/release communication so the published line matches the actual shipped surface.
- `G4-A` Phase 1: The A2A adapter now publicly ships the bounded top-level `payload.discovery` seam for discovery / Agent Card visibility on canonical adapter evidence. This remains adapter-emitted, visibility-only evidence with explicit non-goals around validity, trust, or verification semantics. See PLAN-G4 and G4-A freeze.
- `P2c` A2A discovery/card follow-up pack (`a2a-discovery-card-followup`): Built-in A2A-DC-001 / A2A-DC-002 now ship publicly. The pack mirrors `packs/open/a2a-discovery-card-followup/`, uses `json_path_exists.value_equals` for boolean `true`, and keeps the G4-A / P2c floor semantics (`requires.assay_min_version: ">=3.3.0"`) without a new engine bump. See MIGRATION (P2c pack) and PLAN-P2c.
- `K1-A` Phase 1: `assay-adapter-a2a` now publicly emits a bounded top-level `payload.handoff` object on canonical A2A adapter evidence. The seam is always present, promotes positively only for typed `assay.adapter.a2a.task.requested` packets with `task.kind == "delegation"`, and explicitly does not promote from `task.updated`, `artifact.shared`, generic-message fallback, or synthetic `unknown-task`. No new pack, engine bump, Trust Basis change, or Trust Card change ships in this slice. See PLAN-K1 and K1-A freeze.
- `assay-it` outward-facing metadata: The Python package now ships with a package-level README and bounded public metadata that matches the actual surface: `AssayClient`, `Coverage`, `Explainer`, and the pytest fixture. The published package description no longer implies the full Assay CLI or broader trust-compiler surfaces.
- Release notes template truth sync: GitHub release notes now use the canonical install URL `https://getassay.dev/install.sh` and the canonical action slug `Rul1an/assay-action@v2`, avoiding stale release-copy drift on future tags.
This release completes the first trust-compiler product line on a single public baseline: canonical Trust Basis, Trust Card schema 2 with seven claims (key by stable claim.id), G3 authorization-context evidence, pack engine 1.2, built-in mcp-signal-followup and a2a-signal-followup, migration SSOT, and kernel/pack alignment tests. See MIGRATION-TRUST-COMPILER-3.2.md, PLAN-P2a, PLAN-P2b, and RELEASE-PLAN-TRUST-COMPILER-3.3.md. Pack requires.assay_min_version: ">=3.2.3" remains the evidence-substrate floor; v3.3.0 is the first release embedding both built-in companion packs in release binaries.
- P2b A2A companion pack (`a2a-signal-followup`): Built-in pack with three presence-only rules on canonical adapter evidence: A2A-001 (`assay.adapter.a2a.agent.capabilities`), A2A-002 (`assay.adapter.a2a.task.*`), A2A-003 (`assay.adapter.a2a.artifact.shared`). Uses existing pack checks (`event_type_exists`); no new engine version. Open mirror under `packs/open/a2a-signal-followup/`. Pack YAML sets `requires.assay_min_version: ">=3.2.3"` (evidence-substrate floor per MIGRATION-TRUST-COMPILER-3.2.md, same discipline as PLAN-P2a). v3.3.0 is the first Assay release with this pack built in. See PLAN-P2b.
- H1 - Trust kernel alignment & release hardening: Single migration SSOT (MIGRATION-TRUST-COMPILER-3.2.md), PLAN-H1, integration tests for Trust Basis ↔ MCP-001 lockstep and Trust Basis ↔ Trust Card invariants (no new semantics).
- P2a MCP companion pack (`mcp-signal-followup`): Built-in pack with three rules. MCP-001 uses pack check `g3_authorization_context_present` (engine v1.2), sharing the same predicate as Trust Basis `authorization_context_visible` (verified); MCP-002 / MCP-003 cover delegation (`delegated_from`) and containment degradation (`assay.sandbox.degraded`). Open mirror under `packs/open/mcp-signal-followup/`. `assay_min_version: >=3.2.3` tracks the prerequisite line (G3 + Trust Card schema 2; v3.2.3 is the reference tag for that substrate, not for built-in pack presence). v3.3.0 is the first Assay release with this pack built in; see PLAN-P2a.
- Pack engine v1.2: Adds `g3_authorization_context_present`; bumps `ENGINE_VERSION` in `assay-evidence` (mandate-baseline rules that declared `engine_min_version: "1.2"` now execute with this engine).
- T1a Trust Basis Compiler MVP: Assay now ships a canonical `trust-basis.json` compiler surface on `main`, derived from verified bundles with fixed claim keys, fixed evidence vocabularies, and deterministic regeneration.
- Low-level trust compiler CLI: Repository builds now expose `assay trust-basis generate <bundle>` for advanced CI, diffing, and review workflows.
- G3 Authorization Context Evidence: Supported MCP tool-call paths can merge policy-projected `auth_scheme`, `auth_issuer`, and `principal` onto `assay.tool.decision` evidence; normalization allowlists schemes, trims issuer, rejects JWS-compact and `Bearer` credential material, and omits whitespace-only principals.
- Trust Card schema v2: Trust Basis emits seven claims (adds `authorization_context_visible` between delegation and containment); `trustcard.json` uses `schema_version` 2. Downstream consumers should select claims by stable `id`, not assume a fixed row count.
- Claim-first boundary: `T1a` ships claim classification in the compiler layer, not in a Trust Card renderer.
- Deliberate non-goals: This wave does not yet ship `trustcard.json`, `trustcard.md`, a trust score, a `safe/unsafe` badge, or new signal/pack/engine semantics.
- New MCP integrity metrics: Added `tool_description_integrity`, `tool_output_valid`, and `tool_collision_detect` to cover tool-definition drift, output-schema contracts, and cross-server tool shadowing.
- Runtime monitor output: `assay monitor` blocked-file events now print structured `dev`, `ino`, `cgroup`, and `rule_id` fields instead of raw payload text.
- Ring buffer pressure summary: `assay monitor` now reports emitted and dropped ring-buffer counters for tracepoint, LSM, and socket monitor paths at the end of a run.
- Metric evaluation spans: The runner now emits one `assay.eval.metric` span per metric evaluation with stable fields for latency, cached status, pass/fail, unstable state, and error reporting.
- CycloneDX release asset: Release builds now publish `assay-${VERSION}-sbom-cyclonedx.tar.gz` and `assay-${VERSION}-sbom-cyclonedx.tar.gz.sha256` alongside the existing binaries.
- crates.io publish: Exclude assay-adapter-api from publish list (Trusted Publishing not configured). Use 3.1.0 from crates.io.
- crates.io publish: Broaden grep pattern for token-not-valid skip.
- Windows build: Gate `std::os::unix::fs::PermissionsExt` with `#[cfg(unix)]` so the Windows release build succeeds.
- Cross-platform builds re-enabled: macOS x86_64, macOS aarch64 (Apple Silicon), and Windows x86_64 are back in the release matrix.
- Runner updates (March 2026): `macos-15` (was macos-14), `windows-2025` (explicit version).
- Install script: `curl -fsSL https://getassay.dev/install.sh | sh` now supports macOS ARM.
- Typed decisions + Decision Event v2: Deterministic typed decision outcomes with structured `DecisionData` payloads replacing stringly-typed fields.
- Obligation execution: Runtime execution of `log`, `alert`, `approval_required`, `restrict_scope`, and `redact_args` obligations with deterministic evidence emission.
- Approval enforcement: `approval_required` blocks tool calls without valid approval artifacts; approval shape is additive evidence.
- Restrict scope enforcement: `restrict_scope` narrows tool-call arguments at runtime with evidence of what was restricted and why.
- Redact args enforcement: `redact_args` strips sensitive fields from tool-call arguments before forwarding, with redaction evidence markers.
- Fulfillment normalization: Obligation fulfillment outcomes are normalized into a stable contract for downstream consumers.
- Deny/fail-closed evidence convergence: Deny paths and fail-closed decisions emit consistent, typed evidence with deterministic precedence.
- Replay diff basis: Deterministic replay diff buckets with legacy fallback classification for backward compatibility.
- Evidence compatibility normalization: Replay evidence compatibility markers for additive reader contracts.
- Consumer hardening: Frozen consumer read precedence for `DecisionEvent`, `DecisionData`, and `ReplayDiffBasis` payloads.
- Context envelope hardening: Completeness markers and additive metadata on context-envelope payloads.
- `assay evidence store-status`: New diagnostic command that checks connectivity, credentials, inventory, and write access. Supports JSON, table, and plain output. Exit codes: 0 (OK), 1 (connectivity/access failure), 2 (config error).
- `.assay/store.yaml` config: Structured YAML configuration for evidence store connection. Precedence: `--store` > `ASSAY_STORE_URL` > config file. Credentials stay in environment variables.
- Config fallback for push/pull/list: `--store` is now optional and falls back to `ASSAY_STORE_URL` or `.assay/store.yaml` automatically.
- Provider quickstart docs: AWS S3, Backblaze B2, MinIO setup guides.
- Architecture-as-code workspace: Structurizr/C4, building blocks, quality scenarios, Obsidian view layer, catalog metadata.
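
The store precedence stated above (`--store` > `ASSAY_STORE_URL` > config file) can be sketched as a small resolver. This is an illustrative sketch only: `resolve_store_url` and its parameters are hypothetical, and `config_url` stands for whatever value was already read from `.assay/store.yaml`, whose key names are not specified here.

```python
from typing import Mapping, Optional


def resolve_store_url(
    cli_store: Optional[str],
    env: Mapping[str, str],
    config_url: Optional[str],
) -> Optional[str]:
    """Pick the evidence store URL using the documented precedence."""
    if cli_store:                          # 1. explicit --store flag
        return cli_store
    env_url = env.get("ASSAY_STORE_URL")   # 2. environment variable
    if env_url:
        return env_url
    return config_url                      # 3. value from the config file, if any
```

Passing the environment in as a mapping (instead of reading `os.environ` directly) keeps the precedence logic easy to test.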
- ADR-027 through ADR-031 closed as implemented contracts.
- Repo-wide architecture gap analysis and roadmap truth sync.
- Release/changelog hygiene: consolidated to single curated CHANGELOG.md.
- Evidence command dispatch is now async (fixes nested tokio runtime panic for BYOS commands).
- `StoreConfig::discover()` returns errors on malformed config files instead of silently ignoring them.
- `assay_core::mcp::policy::ToolPolicy` adds `allow_classes` and `deny_classes`.
- `assay_core::mcp::decision::DecisionData` adds `tool_classes`, `matched_tool_classes`, `match_basis`, and `matched_rule`.
- External struct-literal construction against these types now requires populating the new fields.
- Coverage v1.1 polish: `assay coverage` supports `--out-md` for reviewer-friendly markdown output and `--routes-top` for route summary control while JSON remains canonical (`coverage_report_v1`).
- MCP coverage/session exports: `assay mcp wrap` supports `--coverage-out` and `--state-window-out` informational artifacts with stable schemas and explicit write logging.
- Tool taxonomy governance: MCP policy evaluation and decision metadata include tool taxonomy class matching (`tool_classes`, `matched_tool_classes`) for broader sink/source governance coverage.
- Added/finalized ADR contract line for taxonomy, coverage, session/state window, and coverage DX polish (ADR-027/028/029/030/031).
- Added operational runbooks for taxonomy+coverage and session/state export usage in enterprise workflows.
This release introduces the Pack Registry Client (assay-registry crate) - a complete implementation of SPEC-Pack-Registry-v1.0.3 for secure remote pack distribution.
- Pack Registry Client (`crates/assay-registry/`):
  - HTTP client with token + OIDC authentication
  - Pack resolution: local → bundled → registry → BYOS
  - Local caching with TOCTOU protection (integrity verified on every read)
  - Lockfile v2 for reproducible builds (`assay.packs.lock`)
- JCS Canonicalization (RFC 8785):
  - Deterministic JSON serialization for pack digests
  - Uses `serde_jcs::to_vec()` (bytes, not string) to eliminate encoding issues
  - Canonical digest format: `sha256:{hex}`
- Strict YAML Validation (SPEC §6.1):
  - Pre-scan rejects anchors (`&`), aliases (`*`), tags (`!!`), multi-document (`---`)
  - Duplicate key detection with correct list-item scoping
  - DoS limits: max depth 50, keys 10k, string 1MB, input 10MB
  - Integer range checks: ±2^53 (IEEE 754 safe integer)
- DSSE Signature Verification:
  - Ed25519 + PAE encoding per DSSE spec
  - Sidecar endpoint (`GET /packs/{name}/{version}.sig`) for large signatures
  - Client always prefers sidecar over `X-Pack-Signature` header
- Trust Model (No-TOFU):
  - Pinned root keys compiled into binary
  - Key rotation via signed manifest
  - Pinned roots survive remote revocation attempts
  - Runtime expiry checks for manifest keys
- Contract tests for all v2.1 features:
  - Pack lint with `eu-ai-act-baseline` + SARIF validation
  - Fork PR SARIF skip logic
  - OIDC provider auto-detection (AWS/GCP/Azure patterns)
  - Attestation gating (push-only, default branch, verified)
  - Coverage calculation formula
- Duplicate Key Detection: Pre-scan catches block mapping duplicates; serde_yaml catches flow mapping duplicates
- DSSE Verification: Signature verification uses canonical JCS bytes (not raw YAML)
- List-Item Scoping: Each list item gets its own scope (fixes false positives for `- a: 1\n- a: 2`)
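
For readers unfamiliar with the PAE step in DSSE verification, the sketch below shows the pre-authentication encoding defined by the DSSE v1 specification. It is not taken from `assay-registry`; the function name is chosen for the example, and the Ed25519 check that would run over the resulting bytes is omitted because it needs a cryptography library.

```python
def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding of (payload_type, payload).

    Layout: "DSSEv1" SP LEN(type) SP type SP LEN(body) SP body,
    where lengths are ASCII decimal byte counts.
    """
    type_bytes = payload_type.encode("utf-8")
    return b" ".join([
        b"DSSEv1",
        str(len(type_bytes)).encode("ascii"),
        type_bytes,
        str(len(payload)).encode("ascii"),
        payload,
    ])
```

The signature covers these PAE bytes rather than the bare payload, so the payload type is bound into what was signed. Per the fix noted above, the payload for a pack would be its canonical JCS bytes, not the raw YAML.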
- `assay-registry` v2.11.0 on crates.io
- `docs/architecture/SPEC-Pack-Registry-v1.md` updated to v1.0.3
- `docs/architecture/ADR-018-GitHub-Action-v2.1.md` - Action v2.1 design
- `docs/architecture/SPEC-GitHub-Action-v2.1.md` - Action v2.1 specification
- Security review documentation in `crates/assay-registry/docs/`
- 185 tests in `assay-registry` crate
- Golden vectors for JCS digest verification
- DSSE real signature verification tests
- Trust rotation and revocation tests
- Cache tamper detection tests
- Protocol edge cases (304/410/429)
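
The `sha256:{hex}` digest format over canonical bytes can be illustrated as follows. This sketch only approximates RFC 8785: sorted keys with compact separators match JCS for simple documents (ASCII keys, strings, integers, booleans, null), but it does not implement the RFC's number serialization or UTF-16 key ordering. The function names are hypothetical, and the real client uses `serde_jcs` as noted above.

```python
import hashlib
import json


def canonical_bytes(doc) -> bytes:
    """Approximate JCS: sorted keys, no insignificant whitespace, UTF-8 bytes."""
    text = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def pack_digest(doc) -> str:
    """Digest in the `sha256:{hex}` form, computed over canonical bytes."""
    return "sha256:" + hashlib.sha256(canonical_bytes(doc)).hexdigest()
```

Hashing bytes rather than a string avoids any ambiguity about which text encoding was digested, and key order in the source document no longer affects the result.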
This release introduces the Pack Engine - a YAML-driven compliance/security/quality rule system for evidence bundle linting, with the first built-in pack for EU AI Act Article 12.
- Pack Engine (`crates/assay-evidence/src/lint/packs/`):
  - YAML-defined rule packs with typed checks
  - Check types: `event_count`, `event_pairs`, `event_field_present`, `event_type_exists`, `manifest_field`
  - JSON Pointer (RFC 6901) for field addressing
  - JCS canonicalization (RFC 8785) for deterministic pack digests
  - Collision policy: compliance packs hard-fail, security/quality last-wins
- EU AI Act Baseline Pack (`packs/eu-ai-act-baseline.yaml`):
  - `EU12-001`: Event recording (Article 12(1))
  - `EU12-002`: Operation monitoring - started/finished pairs (Article 12(2)(c))
  - `EU12-003`: Post-market monitoring - correlation IDs (Article 12(2)(b))
  - `EU12-004`: Risk identification - policy/denial fields (Article 12(2)(a))
- CLI Integration:
  - `--pack`: Comma-separated pack references (built-in or file path)
  - `--max-results`: Limit findings for GitHub SARIF size limits (default: 500)
- GitHub Code Scanning Compatible SARIF:
  - `locations[]` on all results (including global findings)
  - `primaryLocationLineHash` for GitHub deduplication
  - Pack metadata in `tool.driver.properties.assayPacks[]`
  - `run.properties.disclaimer` for compliance packs
  - Truncation policy with `run.properties.truncated` / `truncatedCount`
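
Since pack checks address fields with JSON Pointer (RFC 6901), a short sketch of pointer resolution may help when writing rules. This is not the pack engine's code: `resolve_pointer` and `any_event_has_field` are hypothetical helpers, and reading `event_field_present` as "at least one event resolves the pointer" is an assumption for the example.

```python
def resolve_pointer(doc, pointer: str):
    """Resolve an RFC 6901 JSON Pointer; raise KeyError if it does not resolve."""
    if pointer == "":
        return doc
    if not pointer.startswith("/"):
        raise ValueError("pointer must be empty or start with '/'")
    current = doc
    for raw in pointer[1:].split("/"):
        # Unescape in the order the RFC requires: "~1" first, then "~0".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, dict):
            if token not in current:
                raise KeyError(pointer)
            current = current[token]
        elif isinstance(current, list):
            is_index = token.isascii() and token.isdigit() and (
                token == "0" or not token.startswith("0"))
            if not is_index or int(token) >= len(current):
                raise KeyError(pointer)
            current = current[int(token)]
        else:
            raise KeyError(pointer)
    return current


def any_event_has_field(events: list, pointer: str) -> bool:
    """Assumed reading of a field-presence check over a list of events."""
    for event in events:
        try:
            resolve_pointer(event, pointer)
            return True
        except KeyError:
            continue
    return False
```

Note the escaping: a key that itself contains `/` is written as `~1` in the pointer, so `/data/a~1b` addresses the key `a/b` under `data`.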
- `docs/architecture/SPEC-Pack-Engine-v1.md` - Complete implementation spec
- `docs/architecture/ADR-013-EU-AI-Act-Pack.md` - EU AI Act pack design
- `docs/architecture/ADR-016-Pack-Taxonomy.md` - Pack taxonomy and open core model
# Run EU AI Act baseline checks
assay evidence lint bundle.tar.gz --pack eu-ai-act-baseline
# SARIF output for GitHub Code Scanning
assay evidence lint bundle.tar.gz --pack eu-ai-act-baseline --format sarif
# Custom pack file
assay evidence lint bundle.tar.gz --pack ./my-pack.yaml

This release delivers State-of-the-Art sandbox hardening, addressing MCP security guidance for credential isolation, honest capability reporting, and fork-safe enforcement.
- Environment Scrubbing (`env_filter.rs`):
  - Default-deny for secrets (`*_TOKEN`, `*_KEY`, `*_SECRET`, `AWS_*`, `GITHUB_*`)
  - CLI flags: `--env-allow=VAR=value`, `--env-passthrough=VAR`
  - Always sets `TMPDIR` to scoped sandbox directory
- Landlock Deny-wins Correctness (`landlock_check.rs`):
  - Detects "deny inside allow" conflicts that Landlock cannot enforce
  - Automatic degradation to Audit mode with explicit warning
  - Prevents false sense of security from unenforceable policies
- Fork-Safe `pre_exec`:
  - Eliminated heap allocations in `pre_exec` closure
  - Uses `std::io::Error::from_raw_os_error()` instead of `anyhow::bail!()`
  - Syscall-only in critical fork-exec window
- Scoped /tmp Isolation:
  - UID-based (not `$USER` env which can be spoofed)
  - Per-run isolation via PID in path
  - 0700 permissions (owner-only)
  - Prefers `XDG_RUNTIME_DIR` when available
- Doctor Deep Dive v2:
  - Reports Phase 5 hardening feature status
  - Reads actual Landlock ABI version from sysfs
  - Net enforcement correctly reports ABI >= 4 requirement
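
The environment-scrubbing rules above can be sketched with glob matching. This is an illustration, not the logic in `env_filter.rs`: it assumes variables that match no deny pattern pass through unchanged, that a passthrough name overrides a deny match, and that `--env-allow` values are applied last. `scrub_env` and its parameters are hypothetical.

```python
from fnmatch import fnmatchcase

# Deny patterns as listed in the release notes.
DENY_PATTERNS = ("*_TOKEN", "*_KEY", "*_SECRET", "AWS_*", "GITHUB_*")


def scrub_env(env, passthrough=(), overrides=None, tmpdir="/tmp/sandbox"):
    """Return a copy of `env` with secret-looking variables removed."""
    allowed = set(passthrough)
    clean = {}
    for name, value in env.items():
        denied = any(fnmatchcase(name, pattern) for pattern in DENY_PATTERNS)
        if denied and name not in allowed:
            continue  # drop the secret unless explicitly passed through
        clean[name] = value
    clean.update(overrides or {})  # explicit VAR=value pairs
    clean["TMPDIR"] = tmpdir       # always point TMPDIR at the scoped directory
    return clean
```

`fnmatchcase` is used instead of `fnmatch` so matching stays case-sensitive on every platform.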
- `scripts/ci/phase5-check.sh`: New quality gate script
- `CARGO_TARGET_DIR=/tmp/assay-target` for VM mount compatibility
- `--locked` on all cargo commands
- Strict Clippy `-D warnings`
- Fixed `unused_assignments` warning on macOS via `#[cfg(target_os = "linux")]`
- Fixed `io_other_error` Clippy lint (Rust 1.93)
- Added `#[allow(dead_code)]` for non-Linux Landlock stubs
This release delivers "State-of-the-Art" infrastructure hardening, specifically targeting ARM/Self-Hosted stability and CI reliability. It eliminates supply chain risks and ensures deterministic builds across all platforms.
- Robust ARM Infrastructure: Implemented a "GoFoss -> Ubuntu Ports" failover loop for all ARM runners. This eliminates flaky `404` errors caused by the unstable `ports.ubuntu.com` mirror.
  - Generic Logic: The failover script aggressively rewrites any `ubuntu-ports` source, scrubbing legacy/broken mirrors (e.g. `edge.kernel.org`) from self-hosted runners.
  - Optimization: Automatically skips logic on AMD64 runners (`ubuntu-latest`) to preserve "Fast Path" performance.
- Intelligent Gating:
  - Fork Safety: Self-hosted runners are now strictly gated (`if: fork == false`) to prevent malicious code execution from PR forks.
  - Split Smoke: `ebpf-smoke` is split into `-ubuntu` (for signal) and `-self-hosted` (for depth), ensuring forks still get CI feedback.
- Performance "Fast Path":
  - Install-First: All apt jobs now attempt `install` before `update`, leveraging fresh runner caches for significant speedups.
  - Hardened Flags: Ubiquitous use of `DEBIAN_FRONTEND=noninteractive` and `--no-install-recommends`.
- Artifact Sequencing: Fixed a race condition in `kernel-matrix.yml` (`matrix-test`) where install scripts ran before artifact download.
- Supply Chain: Enforced `--locked` / pinned versions for all `bpf-linker` installations.
- Cleanup: Removed legacy `actions/cache` usage for apt-lists (native disk caching is superior on self-hosted).
Critical release hardening the BPF-LSM implementation for production readiness.
- Verifier Fix: Resolved BPF verifier rejection (exit code 40) by optimizing `emit_event` (removed zeroing loop).
- RingBuf Safety: Implemented secure, full-buffer copy to prevent uninitialized memory leakage to userspace.
- Explicit Deny: Validated E2E `action: "deny"` enforcement (EPERM blocking).
- CI Gate: Hardened `verify_lsm_docker.sh` to enforce hard failures on blocking misses.
This major release delivers the State-of-the-Art (SOTA) architecture for robust runtime security, transitioning from "Best Effort" to "Forensically Sound" monitoring.
- Cgroup-First Architecture: `assay-monitor` and `assay-ebpf` now prioritize cgroup membership over PID tracking, using `bpf_get_current_ancestor_cgroup_id` to prevent nested cgroup escapes. This ensures 100% coverage of short-lived processes.
- Forensic Incident Bundles:
  - Secure Atomic Writes: Implementation of `IncidentBuilder` using `openat`, `O_NOFOLLOW`, `O_EXCL`, and `renameat` to prevent TOCTOU vulnerabilities.
  - Unique Identity: Incident files now use UUID v4 suffixes to guarantee uniqueness.
  - Detailed Metadata: Includes kernel version, session UUID, and process tree context.
- eBPF Hardening:
  - Dynamic Offsets: Removed all hardcoded kernel offsets in favor of runtime resolution via `/sys/kernel/tracing/events/.../format`.
  - Extended Coverage: Added `sys_enter_openat2` probe for modern kernels (Linux 5.6+).
  - Safety: Uses `read_user_str_bytes` with explicit bounds checking and safe slices.
- CI Reliability: Complete overhaul of CI pipelines using `sccache` (local backend), `mold` linker (Linux), and single-pass testing. Zero 400 errors from GH Actions Cache.
- Windows Support: Fixed compilation issues in `assay-cli` by guarding Unix-specific cgroup logic.
- Golden Tests: Resolved output mismatches for strict reproducibility.
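The secure-write pattern named under Forensic Incident Bundles can be sketched roughly as follows. This is a Python illustration of the general technique, not the `IncidentBuilder` code itself; the function name `write_incident_atomically`, the `incident-<uuid>.json` naming, and the `.tmp` suffix are assumptions made for the example.

```python
import os
import uuid


def write_incident_atomically(dir_path: str, payload: bytes) -> str:
    """Write payload into dir_path without following symlinks or clobbering files."""
    # Keep a descriptor to the directory so later steps resolve names relative
    # to it, not to a path that could be swapped between check and use.
    dir_fd = os.open(dir_path, os.O_RDONLY | os.O_DIRECTORY)
    try:
        final_name = f"incident-{uuid.uuid4()}.json"  # hypothetical naming scheme
        tmp_name = final_name + ".tmp"
        # O_EXCL fails if the name already exists; O_NOFOLLOW refuses a symlink.
        flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL | os.O_NOFOLLOW
        fd = os.open(tmp_name, flags, 0o600, dir_fd=dir_fd)  # openat(2) underneath
        with os.fdopen(fd, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())
        # renameat(2) underneath: readers see either no file or the complete file.
        os.rename(tmp_name, final_name, src_dir_fd=dir_fd, dst_dir_fd=dir_fd)
        return final_name
    finally:
        os.close(dir_fd)
```

The sketch relies on POSIX-only features (`dir_fd`, `O_NOFOLLOW`), so it is not expected to run on Windows.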
This release transforms Assay from a static analyzer into a complete Runtime Security Platform. It introduces the "System 2" capabilities: detecting and stopping dangerous behavior as it happens.
- Runtime Monitor (`assay monitor`) (Linux Only):
  - Uses eBPF (extended Berkeley Packet Filter) to trace process behavior safely in kernel space.
  - Detects file access (`openat`) and network connections (`connect`) in real-time.
  - Zero-Overhead: Highly optimized "Read-First" ring buffer implementation.
- Discovery (`assay discover`):
  - Automatically inventories running MCP servers and local configurations.
  - Detects unmanaged servers and security gaps.
- Kill Switch (`assay kill`):
  - Emergency termination of rogue agent processes.
  - Supports graceful shutdown (SIGTERM) and immediate kill (SIGKILL).
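The graceful-then-immediate escalation described for the kill switch is a common pattern. A minimal Python sketch of it is shown here; the helper name `stop_process` and the grace period are assumptions for illustration and say nothing about how `assay kill` is implemented.

```python
import subprocess
import sys


def stop_process(proc: subprocess.Popen, grace_seconds: float = 5.0) -> str:
    """Ask a process to exit with SIGTERM, then escalate to SIGKILL if it lingers."""
    proc.terminate()  # sends SIGTERM on POSIX
    try:
        proc.wait(timeout=grace_seconds)
        return "terminated"
    except subprocess.TimeoutExpired:
        proc.kill()  # sends SIGKILL on POSIX; cannot be caught or ignored
        proc.wait()  # reap the child so it does not linger as a zombie
        return "killed"
```

A caller would pass any `subprocess.Popen` handle, for example one created with `subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])`.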
- Native eBPF Builds: CI now builds eBPF artifacts natively (no Docker required), ensuring determinism and stability.
- Host Build Protection: The `assay-ebpf` crate is feature-gated to prevent accidental linking on non-Linux hosts.
- Strict Dependencies: All upstream dependencies are strictly pinned for reproducibility.
- Unified Reference: Consolidated runtime documentation into `docs/runtime-monitor.md`.
- Handoff: Comprehensive architecture & maintenance guide available for contributors.
- Refined Deprecations: Formal deprecation of v1.x constraints syntax.
- Strict Mode: New `--deny-deprecations` flag (and `ASSAY_STRICT_DEPRECATIONS=1` env var) to enforce strict compliance in CI.
- Migration Guide: New detailed v1-to-v2 Migration Guide.
- Startup Warnings: Server/Proxy now emit clear warnings when loading legacy policies.
- CLI: `assay policy validate --deny-deprecations` (and for `run`/`wrap` modes).
- Docs: Comprehensive `docs/migration/v1-to-v2.md`.
- Policy v2.0 (JSON Schema): Official support for JSON Schema constraints (`schemas:`) replacing regex loops.
- Unified Policy Engine: `assay-core`, `assay-cli`, and `assay-mcp-server` now share the exact same evaluation logic (`McpPolicy::evaluate`).
- New Commands: `assay policy validate`, `migrate`, and `fmt`.
- Enforcement Modes: `enforcement.unconstrained_tools: warn|deny|allow` for finer control over headless/legacy tools.
- Scoped Refs: `$ref` support within single policy documents (`#/schemas/$defs/...`).
- Runtime Consistency: `assay mcp wrap` (proxy) and `assay-mcp-server` enforce the exact same rules as `assay coverage`.
- Auto-Migration: Legacy v1 policies (`constraints:`) are auto-migrated in-memory with deprecation warnings.
- v1 Constraints: The `constraints:` syntax is deprecated and will be removed in Assay v2.0.0. Use `assay policy migrate` to upgrade.
- JSON Casing: Stabilized `structuredContent` vs `structured_content` in error contracts.
- Symlink Resolution: Fixed policy resolution issues on macOS `/tmp`.
A major productivity release introducing automated self-repair (assay fix) and instant policy scaffolding (assay init --pack).
- `assay fix`: Interactively repair configuration issues.
  - Automated Patches: Fixes config errors, schema violations, and missing policies based on diagnostics.
  - Dry Run: Preview changes before applying them.
  - Atomic Writes: Cross-platform safe file updates (Windows/Linux/macOS).
- Policy Packs (`assay init --pack`):
  - `default`: Balanced security (blocks RCE, audits sensitive ops).
  - `hardened`: Maximum security (allowlist-only, strict args).
  - `dev`: Permissive for rapid prototyping (logs warnings).
- Patch Engine: Strict traversal prevents partial mutations during `remove`/`replace` operations.
- Module Cleanup: Extracted shared logic to `assay-cli::util` for better maintainability.
- Windows Support: Robust atomic file replacement strategy.
Post-release hardening for Agentic Contract and SARIF compliance.
- Contract Consistency: Internal severity normalization (`warning` -> `warn`) now applied strictly to exit code logic and CLI text output logic.
- SARIF: `invocations.exitCode` now accurately reflects the CLI exit code (0/1/2).
- Contract: Text output summary counts now strictly match JSON output counts.
The "CI Gate" release. This major update transforms Assay into a comprehensive CI/CD guardrail for Agentic systems.
- `assay init`: Interactive wizard that auto-detects your project type (Python/Node/MCP) and generates secure policy + CI config in < 5s.
- `assay validate`: Dedicated CI command with strict exit codes (0=Pass, 1=Fail, 2=Error) and zero overhead.
- Agentic Contract: `--format json` output is now strictly typed, stable, and designed for AI self-correction loops.
- GitHub Advanced Security: `--format sarif` support for direct integration with GitHub Code Scanning.
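A CI wrapper that consumes the 0/1/2 exit code contract might look like the sketch below. Only the three codes and their meanings come from this changelog; the `ExitCode` enum, the `gate_decision` helper, and the message wording are hypothetical.

```python
from enum import IntEnum


class ExitCode(IntEnum):
    PASS = 0   # all checks passed
    FAIL = 1   # at least one check failed
    ERROR = 2  # the run itself could not complete (e.g. bad config)


def gate_decision(returncode: int) -> str:
    """Translate a CLI exit code into a CI-facing decision string."""
    try:
        code = ExitCode(returncode)
    except ValueError:
        # Anything outside the documented contract is surfaced as an error.
        return f"error: unexpected exit code {returncode}"
    if code is ExitCode.PASS:
        return "pass"
    if code is ExitCode.FAIL:
        return "fail: checks did not pass"
    return "error: configuration or runtime problem"
```

Keeping "fail" and "error" distinct lets a pipeline retry or alert on infrastructure problems without treating them as policy violations.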
- Overhaul: Complete rewrite of `Quickstart`, `CLI Reference`, and `Architecture` guides.
- GetAssay.dev: One-line install script and landing page sync.
Simplified 1-step setup for Claude Desktop, Cursor, and other MCP clients.
- Auto-detection: Automatically finds config files on macOS, Windows, and Linux.
- Generation: Generates secure JSON snippets for your `mcpServers` config.
- Security: Enforces policy file usage by default.
- Fail-Secure: CLI now fatal-errors if specified policy file is missing (no insecure fallbacks).
- Policy: clarifications on rate limit fields.
- Proxy: Improved logging for unknown tool calls.
- Python Wheels: Fixed extensive artifact corruption issue in Release workflow (`release.yml`).
- Linting: Strict `clippy` and `rustfmt` compliance across the board.
- README: Fixed broken CI status badge (pointed to non-existent `assay.yml`).
- Index: Aligned landing page with new "Vibecoder + Senior" positioning.
- User Guide: Rewritten to focus on CI/CD, Doctor, and Python workflows (removed legacy RAG metrics noise).
- Consistency: Unified messaging across README and documentation site.
- README: Overhauled for "Vibecoder + Senior" audience.
- Guides: Updated Python Quickstart and Identity docs.
- Consistency: `assay-it` is now the canonical package name in docs.
- Removed redundant directories (`test-*/`, `assay-doctor-*`).
- Refactored `doctor` module to remove verbose comments.
- Zero fluff policy applied.
- Python Docs: Added comprehensive Google-style docstrings to `assay.Coverage`, `assay.validate`, and `AssayClient` wrappers. IDEs will now show rich tooltips.
- Stability: Added CLI verification tests for `assay init-ci`.
Patch release to verify `cargo fmt` compliance after v1.2.6 refactoring.
Patch release to fix a stable-clippy lint `regex_creation_in_loops`.
- Performance: Regex is now compiled once per doctor suite, not per policy.
Updated `pyproject.toml` to explicitly use `assay-it` as the package name, ensuring `maturin` builds the correct wheel metadata for PyPI.
- Distribution Name: `assay-it` (Final Fix)
Renamed the Python SDK distribution package to `assay-it` to match the PyPI project name.
- Distribution Name: `assay-it` (PyPI)
- Import Name: `import assay` (Unchanged)
Patch release to resolve build pipeline issues.
- Fix: Resolved artifact corruption in wheel generation (PyPI Release).
- Fix: Corrected formatting in `doctor/mod.rs` to pass strict CI linting.
Strictness doesn't have to be unfriendly. This release polishes the "Strict Schema" experience.
- Friendly Hints: When unknown fields are detected (e.g. `require_args`), Doctor now suggests the closest valid field ("Did you mean `require_args`?").
- Output: `assay doctor` now correctly displays diagnostic messages in human-readable output (previously they were counted but hidden).
- Release Fix: Removed legacy workflows to ensure smooth PyPI publishing.
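A "closest valid field" hint of this kind can be approximated with fuzzy string matching. The sketch below uses Python's standard `difflib`; the field list, the misspelling `requre_args`, and the helper name `suggest_field` are made up for illustration and may differ from what Doctor actually does.

```python
import difflib
from typing import List, Optional

# Hypothetical set of valid field names for the example.
KNOWN_FIELDS = ["require_args", "allow", "deny"]


def suggest_field(unknown: str, known: Optional[List[str]] = None) -> Optional[str]:
    """Return the closest known field name, or None if nothing is similar enough."""
    candidates = KNOWN_FIELDS if known is None else known
    matches = difflib.get_close_matches(unknown, candidates, n=1, cutoff=0.6)
    return matches[0] if matches else None


def format_hint(unknown: str) -> str:
    """Build a diagnostic message, adding a hint only when one is available."""
    suggestion = suggest_field(unknown)
    message = f"unknown field `{unknown}`"
    if suggestion is not None:
        message += f" (Did you mean `{suggestion}`?)"
    return message
```

The `cutoff` value controls how aggressive the suggestions are; a stricter cutoff avoids misleading hints for unrelated names.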
Transformed assay doctor into a "System 2" diagnostic engine for Agentic workflows.
- Analyzers:
  - Trace Drift: Detects legacy `function_call` usage (recommends `tool_calls`).
  - Integrity: Validates existence of all referenced policy/config files.
  - Logic: Detects alias shadowing (e.g. `Search` alias hiding `Search` tool).
- Agentic Contract:
  - Output via `--format json` is strict, machine-readable, and deterministic.
  - Includes `fix_steps` for automated self-repair.
  - Robust JSON Errors: Even config parsing failures return valid JSON envelopes (when requested), ensuring Agents never crash on plain text errors.
To prevent "Silent Failures" (phantom configs), we now enforce Strict Schema Validation:
- Unknown fields in `assay.yaml` or `policy.yaml` now cause a HARD ERROR.
- Previously, typos or incorrect nesting (e.g. `tools: ToolName:`) were silently ignored. Now you will see `E_CFG_PARSE` with "unknown field".
- Why: Required for reliable Agentic generation and debugging.
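The difference between silently ignoring and hard-failing on unknown fields can be illustrated with a small strict validator. This is a sketch under assumptions: the `SCHEMA` layout and the `validate_strict` helper are hypothetical, and only the `E_CFG_PARSE` code and the "unknown field" wording come from this changelog.

```python
class ConfigParseError(ValueError):
    """Raised when a config document contains a field the schema does not know."""
    code = "E_CFG_PARSE"


# Hypothetical schema: a dict value means "nested mapping with these keys",
# None means "leaf value, not inspected further".
SCHEMA = {"version": None, "tools": {"allow": None, "deny": None}}


def validate_strict(doc: dict, schema: dict, path: str = "") -> None:
    """Walk the document and reject the first key that the schema does not list."""
    for key, value in doc.items():
        where = f"{path}.{key}" if path else key
        if key not in schema:
            raise ConfigParseError(f"{ConfigParseError.code}: unknown field `{where}`")
        nested = schema[key]
        if isinstance(nested, dict) and isinstance(value, dict):
            validate_strict(value, nested, where)
```

With this shape, a mis-nested document such as `{"tools": {"ToolName": {}}}` fails loudly at `tools.ToolName` instead of producing a config that looks valid but enforces nothing.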
- Demo: `assay demo` now generates canonical, schema-compliant policies.
- DX: Restored `request_id` uniqueness check in trace client.
Native Python bindings for seamless integration into Pytest and other Python workflows.
- `AssayClient`: Record traces directly from Python code using `client.record_trace(obj)`.
- `Coverage`: Analyze trace coverage with `assay.Coverage(policy_path).analyze(traces)`.
- `Explainer`: Generate human-readable explanations of tool usage vs policy.
- Performance: Built on `PyO3` + `maturin` for high-performance Rust bindings.
New assay coverage command to enforce quality gates in CI.
- Min Coverage: Fail build if coverage drops below threshold (`--min-coverage 80`).
- Baseline Regressions: Compare against a baseline and fail on regression (`--baseline base.json`).
- High Risk Gaps: Detect and fail if critical `deny`-listed tools are never exercised.
- Export: Save baselines with `--export-baseline`.
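The two numeric gates above (minimum threshold and baseline regression) reduce to simple comparisons. The following Python sketch shows one way to combine them; `gate_failures` and its message format are illustrative assumptions, not the command's actual logic or output.

```python
from typing import List, Optional


def gate_failures(
    coverage: float,
    min_coverage: Optional[float] = None,
    baseline: Optional[float] = None,
) -> List[str]:
    """Return human-readable reasons the gate should fail (empty list means pass)."""
    failures = []
    # Absolute gate: coverage must reach the configured floor.
    if min_coverage is not None and coverage < min_coverage:
        failures.append(f"coverage {coverage:.1f}% is below minimum {min_coverage:.1f}%")
    # Relative gate: coverage must not drop compared to the stored baseline.
    if baseline is not None and coverage < baseline:
        failures.append(f"coverage {coverage:.1f}% regressed from baseline {baseline:.1f}%")
    return failures
```

A CI job would typically exit non-zero when the returned list is non-empty and print each reason for the reviewer.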
Manage and track baselines to detect behavioral shifts.
- `assay baseline record`: Capture current run metrics.
- `assay baseline check`: Diff current run against stored baseline.
- Determinism: Guaranteed deterministic output for reliable regression testing.
- `assay-python-sdk` package on PyPI (upcoming).
- `TraceExplainer` logic exposed to Python.
New sequence operators for complex agent workflow validation:
- `max_calls` - Rate limiting per tool

      sequences:
        - type: max_calls
          tool: FetchURL
          max: 10  # Deny on 11th call

- `after` - Post-condition enforcement

      sequences:
        - type: after
          trigger: ModifyData
          then: AuditLog
          within: 3  # AuditLog must appear within 3 calls after ModifyData

- `never_after` - Forbidden sequences

      sequences:
        - type: never_after
          trigger: Logout
          forbidden: AccessData  # Once logged out, cannot access data

- `sequence` - Exact ordering with strict mode

      sequences:
        - type: sequence
          tools: [Authenticate, Authorize, Execute]
          strict: true  # Must be consecutive, no intervening calls
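To make the semantics of the four operators concrete, here is a minimal Python sketch that checks them against a list of tool names in call order. It follows the comments in the policy snippets; the function names are hypothetical, and the non-strict `sequence` behavior (ordered, with gaps allowed) is an assumption rather than something this changelog states.

```python
from typing import List


def check_max_calls(calls: List[str], tool: str, max_calls: int) -> bool:
    """True while the tool has been called at most max_calls times."""
    return calls.count(tool) <= max_calls


def check_after(calls: List[str], trigger: str, then: str, within: int) -> bool:
    """Every trigger must be followed by `then` within the next `within` calls."""
    for i, name in enumerate(calls):
        if name == trigger and then not in calls[i + 1 : i + 1 + within]:
            return False
    return True


def check_never_after(calls: List[str], trigger: str, forbidden: str) -> bool:
    """Once the trigger has occurred, the forbidden tool must never appear."""
    if trigger not in calls:
        return True
    return forbidden not in calls[calls.index(trigger) + 1 :]


def check_sequence(calls: List[str], tools: List[str], strict: bool) -> bool:
    """strict: tools appear consecutively; otherwise in order with gaps allowed."""
    if strict:
        n = len(tools)
        return any(calls[i : i + n] == tools for i in range(len(calls) - n + 1))
    remaining = iter(calls)  # consumed left to right, so order is enforced
    return all(tool in remaining for tool in tools)
```

For example, `check_sequence(["Authenticate", "Log", "Authorize", "Execute"], ["Authenticate", "Authorize", "Execute"], strict=True)` is false because of the intervening `Log` call, while the same check with `strict=False` passes under the assumption above.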
Define tool groups for cleaner policies:

    aliases:
      Search:
        - SearchKnowledgeBase
        - SearchWeb
        - SearchDatabase
    sequences:
      - type: eventually
        tool: Search  # Matches any alias member
        within: 5

New `assay coverage` command for CI/CD integration:

    # Check tool and rule coverage
    assay coverage --policy policy.yaml --traces traces.jsonl --min-coverage 80

    # Output formats: summary, json, markdown, github
    assay coverage --policy policy.yaml --traces traces.jsonl --format github

Features:
- Tool coverage: which policy tools were exercised
- Rule coverage: which rules were triggered
- High-risk gaps: blocklisted tools never tested
- Unexpected tools: tools in traces but not in policy
- Exit codes: 0 (pass), 1 (fail), 2 (error)
- GitHub Actions annotations for PR feedback
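The tool-level parts of such a report are essentially set arithmetic over policy tools and traced tools. The sketch below assumes a policy is reduced to an allowed list and a blocklisted list; `coverage_report` and the keys of the returned dictionary are hypothetical and do not mirror the real JSON output.

```python
from typing import Dict, Iterable, List, Union


def coverage_report(
    allowed: Iterable[str],
    blocklisted: Iterable[str],
    traced: Iterable[str],
) -> Dict[str, Union[float, List[str]]]:
    """Compare the tools a policy mentions with the tools seen in traces."""
    blocked_set = set(blocklisted)
    policy_tools = set(allowed) | blocked_set
    traced_set = set(traced)
    covered = policy_tools & traced_set
    percent = 100.0 * len(covered) / len(policy_tools) if policy_tools else 100.0
    return {
        "tool_coverage": percent,                             # share of policy tools exercised
        "covered": sorted(covered),
        "high_risk_gaps": sorted(blocked_set - traced_set),   # blocklisted, never tested
        "unexpected": sorted(traced_set - policy_tools),      # in traces, not in policy
    }
```

Sorting the lists keeps the output deterministic, which matters when the report is diffed against a baseline.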
GitHub Actions step:

    - uses: assay-dev/assay-action@v1
      with:
        policy: policies/agent.yaml
        traces: traces/
        min-coverage: 80

Install script:

    curl -sSL https://assay.dev/install.sh | sh

- Policy version bumped to `1.1`
- Better alias resolution performance
The following features are available but not yet stable:
- `assay explain` - Trace debugging and visualization (use `--experimental` flag)
v1.1 is fully backwards compatible with v1.0 policies. To use new features:
- Update `version: "1.0"` to `version: "1.1"` in your policy files
- Add `aliases` section if using tool groups
- Add new sequence rules as needed
Existing v1.0 policies will continue to work without modification.
- Structured Logging: `assay-core` now uses `tracing` for fail-safe events (`assay.failsafe.triggered`), enabling direct Datadog/OTLP integration.
- Protocol Feedback: `assay-mcp-server` now includes a `warning` field in the response when `on_error: allow` is active and an error occurs, allowing clients to adapt logic.
- Documentation: Added "Look-behind Workarounds" to `docs/guides/migration-regex.md`.
Rapid-response release addressing critical Design Partner feedback regarding MCP protocol compliance and operational visibility.
- Structured Fail-Safe Logging: Introduced `assay.failsafe.triggered` JSON event when `on_error: allow` is active, enabling machine-readable audit trails.
- Fail-Safe UX: Logging now occurs via standard `stderr` to avoid polluting piped output.
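The fail-safe behavior can be pictured as a wrapper around the policy check that decides what happens when the check itself breaks. In the Python sketch below, only the event name `assay.failsafe.triggered`, the `on_error` values, and the use of `stderr` come from this changelog; the helper name and the other event fields (`reason`, `action`) are assumptions.

```python
import json
import sys
from typing import Callable, Optional, TextIO


def evaluate_with_failsafe(
    check: Callable[[], bool],
    on_error: str = "block",
    stream: Optional[TextIO] = None,
) -> bool:
    """Run a policy check; on an internal error, fall back according to on_error."""
    try:
        return check()
    except Exception as exc:
        if on_error == "allow":
            # Emit one JSON line on stderr so stdout stays clean for piping.
            event = {
                "event": "assay.failsafe.triggered",
                "reason": str(exc),   # hypothetical field
                "action": "allow",    # hypothetical field
            }
            print(json.dumps(event), file=stream if stream is not None else sys.stderr)
            return True
        return False  # on_error: block fails closed
```

Writing the event as a single JSON line keeps it easy to collect with log shippers while leaving the tool's regular output untouched.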
- MCP Compliance: `assay-mcp-server` tool results are now wrapped in standard `CallToolResult` structure (`{ content: [...], isError: bool }`), enabling clients to parse error details and agents to self-correct.
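The envelope shape mentioned above can be produced with a few lines of code. This Python sketch builds a `CallToolResult`-style dictionary with a single text content item; the helper name `to_call_tool_result` and the choice to JSON-encode non-string payloads are assumptions for illustration.

```python
import json
from typing import Any, Dict


def to_call_tool_result(payload: Any, is_error: bool = False) -> Dict[str, Any]:
    """Wrap a tool payload in a { content: [...], isError: bool } envelope."""
    # Text content carries a string, so structured payloads are serialized first.
    text = payload if isinstance(payload, str) else json.dumps(payload)
    return {
        "content": [{"type": "text", "text": text}],
        "isError": is_error,
    }
```

A policy denial would then be returned with `is_error=True`, so a client can read the reason from `content` instead of receiving a bare failure.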
First Release Candidate for Assay v1.0.0, introducing the "One Engine, Two Modes" guarantee and unified policy enforcement.
- Unified Policy Engine: Centralized validation logic (`assay-core::policy_engine`) shared between CLI, SDK, and MCP Server.
- Fail-Safe Configuration: New `on_error: block | allow` settings for graceful degradation.
- Parity Test Suite: New `tests/parity_batch_streaming.rs` ensuring identical behavior between batch and streaming modes.
- False Positive Suite: `tests/fp_suite.yaml` validation for legitimate business flows.
- Latency Benchmarks: Confirmed core decision latency <0.1ms (p95).
- Resolved schema validation discrepancies between local CLI and MCP calls.
- Fixed `sequence_valid` assertions to support regex-based policy matching.
This release marks the transition to a hardened, production-grade CLI. It introduces strict contract guarantees, robust migration checks, and full CI support.
- Official CI Template: `.github/workflows/assay.yml` for drop-in GitHub Actions support.
- Assay Check: New `assay migrate --check` command to guard against unmigrated configs in CI.
- CLI Contract: Formalized exit codes:
  - `0`: Success / Clean
  - `1`: Test Failure
  - `2`: Configuration / Migration Error
- Soak Tested: Validated with >50 consecutive runs for 0-flake guarantee.
- Strict Mode Config: `configVersion: 1` removes top-level `policies` in favor of inline declarations.
- Configuration: Top-level `policies` field is no longer supported in `configVersion: 1`. You must run `assay migrate` to update your config.
- Fail-Fast: `assay migrate` and `validate` now fail hard (Exit 2) on unknown standard fields.
- Fixed "Silent Drop" issue where unknown YAML fields were ignored during parsing.
- Resolved argument expansion bug in test scripts on generic shells.
- Soak test hardening for legacy configs
- Unit tests for backward compatibility
- `EvalConfig::validate()` method
- Prepared `configVersion: 1` logic (opt-in)