
AELITIUM

Git-style verification for LLM outputs.


AELITIUM is a library and CLI for producing and verifying tamper-evident, offline-verifiable evidence bundles for recorded LLM interactions, built on deterministic canonicalization.

LLM outputs can change silently. AELITIUM enforces fail-closed verification semantics on the validated surface and detects whether recorded evidence was modified after packing.

Quickstart

Find uncaptured LLM call sites:

aelitium scan .

Capture evidence:

from aelitium import enable_litellm
enable_litellm()

Verify a bundle offline:

aelitium verify-bundle ./bundle

What it proves

  • Recorded request and recorded response artifacts can be cryptographically bound
  • Post-hoc modification of canonicalized recorded artifacts is detectable
  • Verification can be performed offline on the validated surface

What it does not prove

  • That the model actually executed
  • That the provider was honest
  • That the response is correct or truthful
  • That capture was complete
  • That semantically equivalent outputs produce equal hashes (different bytes always produce different hashes)
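The last point deserves a concrete illustration: hashing operates on bytes, not meaning, so two responses a human would call equivalent still hash differently. A minimal sketch:

```python
import hashlib

# Two semantically equivalent answers whose bytes differ.
a = b'{"answer": "Paris"}'
b = b'{"answer": "Paris."}'  # trailing period only

# Equal meaning does not imply equal hashes.
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
```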

The problem

You run the same prompt in production. One week later, the output is different.

The recorded response changed — but your logs just show two JSON blobs. It is hard to verify whether the recorded evidence was modified after packing.


Try it offline

git clone https://github.com/aelitium-dev/aelitium-v3
cd aelitium-v3 && pip install -e .
bash examples/drift_demo/run_demo.sh  # no API key required

Same request hash. Different recorded response hash. That means the recorded response changed for the compared bundles.

# Scan your codebase for unprotected LLM calls:
aelitium scan ./src
# LLM call sites detected: 4
# Missing evidence capture:
#   ⚠ openai — worker.py:42
#   ⚠ anthropic — agent.py:17
# Coverage: 2/4 (50%)
# STATUS=INCOMPLETE rc=2

All commands accept --json for structured output.


How it works

API call (OpenAI / Anthropic / LiteLLM)
      ↓
capture adapter   ← records request_hash + response_hash in-process
      ↓
evidence bundle   ← canonical JSON + ai_manifest.json + binding_hash
      ↓
aelitium verify-bundle   ← STATUS=VALID / STATUS=INVALID
aelitium compare         ← UNCHANGED / CHANGED / NOT_COMPARABLE

Each bundle contains a deterministic SHA-256 hash of the payload, a manifest with timestamp and schema, and a cryptographic binding_hash linking the recorded request and recorded response artifacts. Anyone with the bundle can verify it — no network required.

Current binding construction:

binding_hash = SHA256(
  canonical({
    "request_hash": request_hash,
    "response_hash": response_hash
  })
)
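A self-contained sketch of this construction. Note the assumption: `canonical()` is modeled here as compact JSON with sorted keys, which stands in for AELITIUM's actual canonicalization and may differ in detail.

```python
import hashlib
import json

def canonical(obj) -> bytes:
    # Assumption: compact JSON with sorted keys as a stand-in for
    # AELITIUM's canonical form.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

request_hash = sha256_hex(canonical(
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
))
response_hash = sha256_hex(canonical({"content": "Hi there!"}))

# The binding hash commits to both hashes at once: changing either
# recorded artifact changes the binding.
binding_hash = sha256_hex(canonical(
    {"request_hash": request_hash, "response_hash": response_hash}
))
print(len(binding_hash))  # 64 hex characters
```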

Capture adapter (OpenAI / Anthropic / LiteLLM)

No manual JSON. The capture adapter intercepts the API call and writes the bundle automatically.

from openai import OpenAI
from aelitium import capture_openai

client = OpenAI()
result = capture_openai(
    client, "gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)  # deterministic hash for this recorded request/response pair
aelitium verify-bundle ./evidence
# STATUS=VALID rc=0
# AI_HASH_SHA256=...
# BINDING_HASH=...   ← cryptographic link between request and response

LiteLLM routes to any provider — one adapter covers all:

from aelitium import capture_litellm

result = capture_litellm(
    model="openai/gpt-4o",           # or "anthropic/...", "bedrock/...", etc.
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)

See Capture layer for Anthropic, LiteLLM, streaming, and signing.


Zero-config with LiteLLM

Add one line. Keep using LiteLLM normally.

from aelitium import enable_litellm
import litellm

enable_litellm(out_dir="./aelitium/bundles", verbose=True)

response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
# AELITIUM: bundle → ./aelitium/bundles/<binding_hash>  binding_hash=<hash>

Every call writes a bundle automatically. The LLM response is unchanged.

What you get:

  • request_hash — deterministic hash of the recorded request payload
  • response_hash — hash of the recorded response content
  • binding_hash — cryptographic link between the two

Failure modes:

| Mode | Capture fails | Streaming |
| --- | --- | --- |
| `strict=False` (default) | warning, response returned | pass-through |
| `strict=True` | raises | raises |

enable_litellm(strict=True)  # capture failure raises instead of warning

Notes:

  • Streaming calls (stream=True) are not captured — they pass through unchanged

See examples/litellm_enable.py for a runnable example.


Detect when the recorded response changed

aelitium compare ./bundle_last_week ./bundle_today
# STATUS=CHANGED rc=2
# REQUEST_HASH=SAME    a=3f4a8c1d... b=3f4a8c1d...
# RESPONSE_HASH=DIFFERENT  a=9b2e7f1a... b=c41d8e3b...
# INTERPRETATION=Same request_hash with different response_hash observed

If REQUEST_HASH=SAME and RESPONSE_HASH=DIFFERENT, the compared bundles contain different recorded responses for the same hashed request. AELITIUM does not attribute the cause.
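When scripting around compare, the STATUS / REQUEST_HASH / RESPONSE_HASH lines can be read as key=value pairs. A hedged sketch: `parse_compare` is a hypothetical helper, not part of AELITIUM, and `--json` output is likely the more robust option.

```python
def parse_compare(output: str) -> dict:
    # Take the first token after '=' on each key=value line,
    # e.g. "REQUEST_HASH=SAME  a=3f4a... b=3f4a..." -> {"REQUEST_HASH": "SAME"}.
    fields = {}
    for line in output.splitlines():
        line = line.strip().lstrip("# ")
        if "=" in line:
            key, _, rest = line.partition("=")
            fields[key.strip()] = rest.split()[0]
    return fields

sample = """STATUS=CHANGED rc=2
REQUEST_HASH=SAME    a=3f4a8c1d b=3f4a8c1d
RESPONSE_HASH=DIFFERENT  a=9b2e7f1a b=c41d8e3b"""

fields = parse_compare(sample)
drifted = fields["REQUEST_HASH"] == "SAME" and fields["RESPONSE_HASH"] == "DIFFERENT"
print(drifted)  # True
```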

Run offline (no API key):

bash examples/drift_demo/run_demo.sh

Or with a real OpenAI key:

python examples/model_drift_detector.py

Scan for unprotected LLM calls

Find every LLM call in your codebase that isn't wrapped in a capture adapter:

aelitium scan ./src

# LLM call sites detected: 12
# Instrumented with capture adapter: 9
#   ✓ openai — api/worker.py:14
#   ✓ openai — api/worker.py:38
# Missing evidence capture: 3
#   ⚠ openai — jobs/batch.py:22
#   ⚠ anthropic — agents/classifier.py:11
#   ⚠ litellm — utils/fallback.py:7
# Coverage: 9/12 (75%)
# STATUS=INCOMPLETE rc=2

Add to CI/CD to enforce evidence coverage:

- name: Check LLM evidence coverage
  run: aelitium scan ./src

For CI-friendly key=value output:

aelitium scan ./src --ci
# AELITIUM_SCAN_STATUS=INCOMPLETE
# AELITIUM_SCAN_TOTAL=12
# AELITIUM_SCAN_INSTRUMENTED=9
# AELITIUM_SCAN_MISSING=3
# AELITIUM_SCAN_COVERAGE=75
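The key=value lines lend themselves to a simple coverage gate in CI. A sketch: `require_coverage` is a hypothetical helper, and the field name follows the sample output above.

```python
def require_coverage(ci_output: str, minimum: int) -> bool:
    # Read AELITIUM_SCAN_COVERAGE from `aelitium scan --ci` output and
    # compare it against a minimum percentage.
    for line in ci_output.splitlines():
        line = line.strip().lstrip("# ")
        if line.startswith("AELITIUM_SCAN_COVERAGE="):
            return int(line.split("=", 1)[1]) >= minimum
    return False  # fail closed if the field is missing

sample = "AELITIUM_SCAN_STATUS=INCOMPLETE\nAELITIUM_SCAN_COVERAGE=75\n"
print(require_coverage(sample, minimum=80))  # False
```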

Reproducibility

The same input produces the same hash in validated configurations:

bash scripts/verify_repro.sh
# === RESULT: PASS ===
# AI_HASH_SHA256=8b647717...

Validated on two independent machines (A + B) with identical hashes.


Why logs are not enough

Tools like Langfuse or Helicone help you debug LLM calls.

AELITIUM helps you verify that recorded evidence was not altered after packing.

Logs can be edited. Evidence-bundle tampering is detectable.
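The difference is easy to demonstrate: flip one value in a recorded payload and its hash no longer matches the one stored at packing time. A minimal illustration:

```python
import hashlib

payload = b'{"model": "gpt-4o", "content": "Paris"}'
recorded_hash = hashlib.sha256(payload).hexdigest()  # stored at packing time

tampered = payload.replace(b"Paris", b"Lyon")  # post-hoc edit
print(hashlib.sha256(tampered).hexdigest() == recorded_hash)  # False
```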

| Tool | What it does |
| --- | --- |
| Langfuse, Helicone, LangSmith | observability: traces, metrics, dashboards |
| AELITIUM | verification: cryptographic proof the record wasn't altered |

These are complementary, not competing. AELITIUM adds a tamper-evident layer on top of any existing pipeline.


When teams use AELITIUM

  • Detect when recorded responses differ between runs for the same request hash
  • Prove recorded evidence wasn't modified after the fact
  • Investigate incidents involving AI agents ("what recorded evidence is available for this interaction?")
  • Produce verifiable records for compliance or audits (EU AI Act Art.12, SOC 2)
  • Enforce evidence coverage in CI/CD (aelitium scan exits 2 if LLM calls are uninstrumented)

CLI reference

aelitium

| Command | Description |
| --- | --- |
| `scan <path>` | Scan Python files for uninstrumented LLM call sites |
| `compare <bundle_a> <bundle_b>` | Compare two bundles; detect changed recorded responses |
| `verify-bundle <dir>` | Verify bundle: hash + signature + binding hash |
| `pack --input <file> --out <dir>` | Generate canonical JSON + manifest |
| `verify <dir>` | Verify integrity of a pack output dir |
| `validate --input <file>` | Validate against the ai_output_v1 schema |
| `canonicalize --input <file>` | Print the deterministic hash |
| `verify-receipt --receipt <file> --pubkey <file>` | Verify an Ed25519 authority receipt offline |
| `export --bundle <dir>` | Export a bundle in compliance format (EU AI Act Art. 12) |

Exit codes: 0 = success, 2 = failure. Designed for CI/CD pipelines.


Policy

See docs/policy/AELITIUM_TRUST_BOUNDARY_SPEC.md for the canonical trust-boundary language policy.

Documentation


Design principles

  • Deterministic — same input produces the same hash in validated configurations
  • Offline-first — verification never requires network access
  • Fail-closed — any verification error returns rc=2; no silent failures
  • Auditable — every pack includes a manifest with schema, timestamp, and hash
  • Pipeline-friendly — all output parseable (STATUS=, AI_HASH_SHA256=, --json)
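The determinism principle can be checked with a toy canonicalization. Sorted-key compact JSON is an assumption here, standing in for AELITIUM's actual scheme: key order in the input must not change the hash.

```python
import hashlib
import json

def canonical_hash(obj) -> str:
    # Assumption: compact JSON with sorted keys as a stand-in canonical form.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

a = {"model": "gpt-4o", "temperature": 0}
b = {"temperature": 0, "model": "gpt-4o"}  # same content, different key order
print(canonical_hash(a) == canonical_hash(b))  # True
```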

Trust boundary

AELITIUM provides tamper-evidence, not truth guarantees.

What AELITIUM proves:

  • the bundle contents have not changed since packing
  • the canonicalized payload matches the recorded hash
  • (with capture adapter) the request hash matches the payload recorded by the capture path

What AELITIUM does not prove:

  • that the model output was correct or safe
  • that the system that packed the bundle was trustworthy
  • that the model actually produced the output (without capture adapter)

Integrity ≠ completeness. AELITIUM proves that captured events were not altered. It does not guarantee that all events were captured. Capture completeness depends on the integration layer — SDK wrapper, proxy, or observer. If the agent controls its own logging, an observer-based capture pattern provides stronger guarantees. See TRUST_BOUNDARY.md for the full analysis.

Stronger provenance — signing authorities, hardware-backed keys — is the direction of P3.


Compliance alignment

AELITIUM provides tamper-evident evidence bundles that support the following regulatory and audit requirements:

| Framework | Requirement | How AELITIUM helps |
| --- | --- | --- |
| EU AI Act, Article 12 | Logging and traceability of high-risk AI system outputs | Evidence bundles provide tamper-evident, verifiable records of AI outputs with deterministic hashes |
| SOC 2, CC7 | System monitoring and integrity controls | Independent offline verification confirms records have not been altered after creation |
| ISO 42001 | AI management system auditability | Canonical bundles with schema versioning support third-party audits without infrastructure access |
| NIST AI RMF, MG 2.2 | Traceability of AI decisions and outputs | Each bundle contains a complete, reproducible record: payload, hash, timestamp, and optional signature |

AELITIUM does not replace logging infrastructure. It adds cryptographic integrity on top of any existing pipeline — offline, without a server, without a blockchain.


License

Apache-2.0. See LICENSE.
