Git-style verification for LLM outputs.
AELITIUM is a library/CLI for producing and verifying tamper-evident, offline-verifiable evidence bundles for recorded LLM interactions under deterministic canonicalization.
LLM outputs can change silently. AELITIUM enforces fail-closed verification semantics on its validated surface and detects whether recorded evidence was modified after packing.
Find uncaptured LLM call sites:

```
aelitium scan .
```

Capture evidence:

```python
from aelitium import enable_litellm
enable_litellm()
```

Verify a bundle offline:

```
aelitium verify-bundle ./bundle
```

What AELITIUM proves:
- Recorded request and recorded response artifacts can be cryptographically bound
- Post-hoc modification of canonicalized recorded artifacts is detectable
- Verification can be performed offline on the validated surface
What AELITIUM does not prove:
- That the model actually executed
- That the provider was honest
- That the response is correct or truthful
- That capture was complete
- That semantic equivalence implies hash equivalence
You run the same prompt in production. One week later, the output is different.
The recorded response changed — but your logs just show two JSON blobs. It is hard to verify whether the recorded evidence was modified after packing.
```
git clone https://github.com/aelitium-dev/aelitium-v3
cd aelitium-v3 && pip install -e .
bash examples/drift_demo/run_demo.sh  # no API key required
```

Same request hash. Different recorded response hash. That means the recorded response changed for the compared bundles.
```
# Scan your codebase for unprotected LLM calls:
aelitium scan ./src
# LLM call sites detected: 4
# Missing evidence capture:
#   ⚠ openai — worker.py:42
#   ⚠ anthropic — agent.py:17
# Coverage: 2/4 (50%)
# STATUS=INCOMPLETE rc=2
```

All commands accept `--json` for structured output.
```
API call (OpenAI / Anthropic / LiteLLM)
        ↓
capture adapter        ← records request_hash + response_hash in-process
        ↓
evidence bundle        ← canonical JSON + ai_manifest.json + binding_hash
        ↓
aelitium verify-bundle ← STATUS=VALID / STATUS=INVALID
aelitium compare       ← UNCHANGED / CHANGED / NOT_COMPARABLE
```
Each bundle contains a deterministic SHA-256 hash of the payload, a manifest with timestamp and schema, and a cryptographic binding_hash linking the recorded request and recorded response artifacts. Anyone with the bundle can verify it — no network required.
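Offline verification of the payload hash can be sketched in a few lines. This is a sketch under assumptions: the file name `payload.json` and the manifest key `ai_hash_sha256` are hypothetical stand-ins (only `ai_manifest.json` appears in the pipeline above); see the engine contract for the real bundle schema.

```python
import hashlib
import json
from pathlib import Path

def verify_payload(bundle_dir: str) -> bool:
    """Recompute the payload hash and compare it to the manifest.

    Assumed layout (hypothetical file/key names for illustration):
      payload.json      — canonicalized payload bytes
      ai_manifest.json  — manifest with an "ai_hash_sha256" field
    """
    bundle = Path(bundle_dir)
    payload = (bundle / "payload.json").read_bytes()
    manifest = json.loads((bundle / "ai_manifest.json").read_text())
    # No network access: the bundle alone is enough to check integrity.
    return hashlib.sha256(payload).hexdigest() == manifest["ai_hash_sha256"]
```

Any change to the payload bytes after packing makes the recomputed hash diverge from the manifest, so the check fails.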
Current binding construction:

```
binding_hash = SHA256(
  canonical({
    "request_hash": request_hash,
    "response_hash": response_hash
  })
)
```
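A minimal Python sketch of this construction, assuming `canonical` means JCS-style canonical JSON (sorted keys, no insignificant whitespace, UTF-8); the real canonicalization pipeline is stricter:

```python
import hashlib
import json

def canonical(obj: dict) -> bytes:
    # Stand-in for AELITIUM's canonicalization: deterministic key order,
    # compact separators, UTF-8 bytes. Illustrative only.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")

def binding_hash(request_hash: str, response_hash: str) -> str:
    # SHA-256 over the canonical pairing of the two artifact hashes.
    payload = canonical({
        "request_hash": request_hash,
        "response_hash": response_hash,
    })
    return hashlib.sha256(payload).hexdigest()
```

Because canonicalization fixes key order and whitespace, the same pair of hashes always yields the same binding hash, regardless of how the input dict was built.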
No manual JSON. The capture adapter intercepts the API call and writes the bundle automatically.
```python
from openai import OpenAI
from aelitium import capture_openai

client = OpenAI()
result = capture_openai(
    client, "gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)  # deterministic hash for this recorded request/response pair
```

```
aelitium verify-bundle ./evidence
# STATUS=VALID rc=0
# AI_HASH_SHA256=...
# BINDING_HASH=...   ← cryptographic link between request and response
```

LiteLLM routes to any provider — one adapter covers all:
```python
from aelitium import capture_litellm

result = capture_litellm(
    model="openai/gpt-4o",  # or "anthropic/...", "bedrock/...", etc.
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)
```

See Capture layer for Anthropic, LiteLLM, streaming, and signing.
Add one line. Keep using LiteLLM normally.
```python
from aelitium import enable_litellm
import litellm

enable_litellm(out_dir="./aelitium/bundles", verbose=True)

response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)
print(response.choices[0].message.content)
# AELITIUM: bundle → ./aelitium/bundles/<binding_hash> binding_hash=<hash>
```

Every call writes a bundle automatically. The LLM response is unchanged.
What you get:
- `request_hash` — deterministic hash of the recorded request payload
- `response_hash` — hash of the recorded response content
- `binding_hash` — cryptographic link between the two
Failure modes:
| Mode | Capture fails | Streaming |
|---|---|---|
| `strict=False` (default) | warning, response returned | pass-through |
| `strict=True` | raises | raises |

```python
enable_litellm(strict=True)  # capture failure raises instead of warning
```

Notes:
- Streaming calls (`stream=True`) are not captured — they pass through unchanged
See examples/litellm_enable.py for a runnable example.
```
aelitium compare ./bundle_last_week ./bundle_today
# STATUS=CHANGED rc=2
# REQUEST_HASH=SAME a=3f4a8c1d... b=3f4a8c1d...
# RESPONSE_HASH=DIFFERENT a=9b2e7f1a... b=c41d8e3b...
# INTERPRETATION=Same request_hash with different response_hash observed
```

If REQUEST_HASH=SAME and RESPONSE_HASH=DIFFERENT, the compared bundles contain different recorded responses for the same hashed request. AELITIUM does not attribute the cause.
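The decision rule behind comparison can be sketched as follows. `compare_bundles` is a hypothetical helper operating on the hashes recorded in each bundle's manifest, not the library API:

```python
def compare_bundles(a: dict, b: dict) -> str:
    """Return the compare verdict for two bundles' recorded hashes.

    Illustrative decision rule only; a and b are dicts holding each
    bundle's request_hash and response_hash.
    """
    if "request_hash" not in a or "request_hash" not in b:
        return "NOT_COMPARABLE"
    if a["request_hash"] != b["request_hash"]:
        # Different requests were recorded: there is nothing to compare.
        return "NOT_COMPARABLE"
    if a["response_hash"] == b["response_hash"]:
        return "UNCHANGED"
    # Same hashed request, different recorded response.
    return "CHANGED"
```

Note the rule says nothing about *why* the response differs (model drift, provider change, tampering); it only reports that the recorded bytes diverged.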
Run offline (no API key):

```
bash examples/drift_demo/run_demo.sh
```

Or with a real OpenAI key:

```
python examples/model_drift_detector.py
```

Find every LLM call in your codebase that isn't wrapped in a capture adapter:
```
aelitium scan ./src
# LLM call sites detected: 12
# Instrumented with capture adapter: 9
#   ✓ openai — api/worker.py:14
#   ✓ openai — api/worker.py:38
# Missing evidence capture: 3
#   ⚠ openai — jobs/batch.py:22
#   ⚠ anthropic — agents/classifier.py:11
#   ⚠ litellm — utils/fallback.py:7
# Coverage: 9/12 (75%)
# STATUS=INCOMPLETE rc=2
```

Add to CI/CD to enforce evidence coverage:

```yaml
- name: Check LLM evidence coverage
  run: aelitium scan ./src
```

For CI-friendly key=value output:
```
aelitium scan ./src --ci
# AELITIUM_SCAN_STATUS=INCOMPLETE
# AELITIUM_SCAN_TOTAL=12
# AELITIUM_SCAN_INSTRUMENTED=9
# AELITIUM_SCAN_MISSING=3
# AELITIUM_SCAN_COVERAGE=75
```

The same input produces the same hash in validated configurations:
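The key=value format is straightforward to gate on in CI. A sketch, assuming you capture the command's stdout yourself; `parse_ci_output` and `coverage_ok` are illustrative helpers, not part of AELITIUM:

```python
def parse_ci_output(text: str) -> dict:
    # Collect AELITIUM_SCAN_* key=value lines from captured stdout.
    out = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("AELITIUM_SCAN_") and "=" in line:
            key, _, value = line.partition("=")
            out[key] = value
    return out

def coverage_ok(report: dict, threshold: int = 100) -> bool:
    # Gate: fail the build when scan coverage drops below the threshold.
    return int(report.get("AELITIUM_SCAN_COVERAGE", 0)) >= threshold

sample = """AELITIUM_SCAN_STATUS=INCOMPLETE
AELITIUM_SCAN_TOTAL=12
AELITIUM_SCAN_INSTRUMENTED=9
AELITIUM_SCAN_MISSING=3
AELITIUM_SCAN_COVERAGE=75"""
```

In practice the exit code alone (`rc=2` on incomplete coverage) is enough for a hard gate; parsing the keys is useful when you want a softer threshold or a coverage trend report.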
```
bash scripts/verify_repro.sh
# === RESULT: PASS ===
# AI_HASH_SHA256=8b647717...
```

Validated on two independent machines (A + B) with identical hashes.
Tools like Langfuse or Helicone help you debug LLM calls.
AELITIUM helps you verify that recorded evidence was not altered after packing.
Logs can be edited. Evidence-bundle tampering is detectable.
| Tool | What it does |
|---|---|
| Langfuse, Helicone, LangSmith | observability — traces, metrics, dashboards |
| AELITIUM | verification — cryptographic proof the record wasn't altered |
These are complementary, not competing. AELITIUM adds a tamper-evident layer on top of any existing pipeline.
- Detect when recorded responses differ between runs for the same request hash
- Prove recorded evidence wasn't modified after the fact
- Investigate incidents involving AI agents ("what recorded evidence is available for this interaction?")
- Produce verifiable records for compliance or audits (EU AI Act Art.12, SOC 2)
- Enforce evidence coverage in CI/CD (`aelitium scan` exits 2 if LLM calls are uninstrumented)
| Command | Description |
|---|---|
| `scan <path>` | Scan Python files for uninstrumented LLM call sites |
| `compare <bundle_a> <bundle_b>` | Compare two bundles — detect changed recorded responses |
| `verify-bundle <dir>` | Verify bundle: hash + signature + binding hash |
| `pack --input <file> --out <dir>` | Generate canonical JSON + manifest |
| `verify <dir>` | Verify integrity of a pack output dir |
| `validate --input <file>` | Validate against ai_output_v1 schema |
| `canonicalize --input <file>` | Print deterministic hash |
| `verify-receipt --receipt <file> --pubkey <file>` | Verify Ed25519 authority receipt offline |
| `export --bundle <dir>` | Export bundle in compliance format (EU AI Act Art. 12) |
Exit codes: 0 = success, 2 = failure. Designed for CI/CD pipelines.
See docs/policy/AELITIUM_TRUST_BOUNDARY_SPEC.md for the canonical trust-boundary language policy.
- Why AELITIUM — problem statement, positioning, and what this is for
- Architecture — canonicalization pipeline, evidence bundle, module map
- Security model — threats addressed, guarantees, limitations
- Trust boundary — what AELITIUM proves and what it does not
- 5-minute demo — full walkthrough with expected output
- Python integration — drop-in helper + FastAPI example
- Capture layer — OpenAI adapter, auto-packing, and same-process boundary guidance
- Engine contract — bundle schema and guarantees
- Evidence Bundle Spec — open draft standard for verifiable AI output bundles; AELITIUM is the reference implementation
- Evidence Model — conceptual model, emergent properties, and cross-layer positioning
- AAR evidenceRef mapping — interoperability note: referencing AELITIUM bundles from Agent Action Receipts
- Deterministic — same input produces the same hash in validated configurations
- Offline-first — verification never requires network access
- Fail-closed — any verification error returns `rc=2`; no silent failures
- Auditable — every pack includes a manifest with schema, timestamp, and hash
- Pipeline-friendly — all output parseable (`STATUS=`, `AI_HASH_SHA256=`, `--json`)
AELITIUM provides tamper-evidence, not truth guarantees.
What AELITIUM proves:
- the bundle contents have not changed since packing
- the canonicalized payload matches the recorded hash
- (with capture adapter) the request hash matches the payload recorded by the capture path
What AELITIUM does not prove:
- that the model output was correct or safe
- that the system that packed the bundle was trustworthy
- that the model actually produced the output (without capture adapter)
Integrity ≠ completeness. AELITIUM proves that captured events were not altered. It does not guarantee that all events were captured. Capture completeness depends on the integration layer — SDK wrapper, proxy, or observer. If the agent controls its own logging, an observer-based capture pattern provides stronger guarantees. See TRUST_BOUNDARY.md for the full analysis.
Stronger provenance — signing authorities, hardware-backed keys — is the direction of P3.
AELITIUM provides tamper-evident evidence bundles that support the following regulatory and audit requirements:
| Framework | Requirement | How AELITIUM helps |
|---|---|---|
| EU AI Act — Article 12 | Logging and traceability of high-risk AI system outputs | Evidence bundles provide immutable, verifiable records of AI outputs with deterministic hashes |
| SOC 2 — CC7 | System monitoring and integrity controls | Independent offline verification confirms records have not been altered after creation |
| ISO 42001 | AI management system auditability | Canonical bundles with schema versioning support third-party audits without infrastructure access |
| NIST AI RMF — MG 2.2 | Traceability of AI decisions and outputs | Each bundle contains a complete, reproducible record: payload, hash, timestamp, and optional signature |
AELITIUM does not replace logging infrastructure. It adds cryptographic integrity on top of any existing pipeline — offline, without a server, without a blockchain.
Apache-2.0. See LICENSE.