
AELITIUM

Git-style verification for LLM outputs.


AELITIUM is a library and CLI for producing and verifying tamper-evident, offline-verifiable evidence bundles for recorded LLM interactions, built on deterministic canonicalization.

LLM outputs can change silently. AELITIUM enforces fail-closed verification semantics on the validated surface and detects whether recorded evidence was modified after packing.

Quickstart

Find uncaptured LLM call sites:

aelitium scan .

Capture evidence:

from aelitium import enable_litellm
enable_litellm()

Verify a bundle offline:

aelitium verify-bundle ./bundle

What it proves

  • Recorded request and recorded response artifacts can be cryptographically bound
  • Post-hoc modification of canonicalized recorded artifacts is detectable
  • Verification can be performed offline on the validated surface

What it does not prove

  • That the model actually executed
  • That the provider was honest
  • That the response is correct or truthful
  • That capture was complete
  • That semantically equivalent outputs produce equal hashes (different bytes always produce different hashes)
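The last point deserves a concrete illustration: hashing operates on bytes, not meaning, so two responses a human would call equivalent still hash differently. A minimal sketch:

```python
import hashlib

# Two semantically equivalent answers whose bytes differ.
a = b'{"answer": "Paris"}'
b = b'{"answer": "Paris."}'  # trailing period only

# Equal meaning does not imply equal hashes.
print(hashlib.sha256(a).hexdigest() == hashlib.sha256(b).hexdigest())  # False
```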

The problem

You run the same prompt in production. One week later, the output is different.

The recorded response changed — but your logs just show two JSON blobs. It is hard to verify whether the recorded evidence was modified after packing.


Try it offline

git clone https://github.com/aelitium-dev/aelitium-v3
cd aelitium-v3 && pip install -e .
bash examples/drift_demo/run_demo.sh  # no API key required

Same request hash. Different recorded response hash. That means the recorded response changed for the compared bundles.

# Scan your codebase for unprotected LLM calls:
aelitium scan ./src
# LLM call sites detected: 4
# Missing evidence capture:
#   ⚠ openai — worker.py:42
#   ⚠ anthropic — agent.py:17
# Coverage: 2/4 (50%)
# STATUS=INCOMPLETE rc=2

All commands accept --json for structured output.


How it works

API call (OpenAI / Anthropic / LiteLLM)
      ↓
capture adapter   ← records request_hash + response_hash in-process
      ↓
evidence bundle   ← canonical JSON + ai_manifest.json + binding_hash
      ↓
aelitium verify-bundle   ← STATUS=VALID / STATUS=INVALID
aelitium compare         ← UNCHANGED / CHANGED / NOT_COMPARABLE

Each bundle contains a deterministic SHA-256 hash of the payload, a manifest with timestamp and schema, and a cryptographic binding_hash linking the recorded request and recorded response artifacts. Anyone with the bundle can verify it — no network required.

Current binding construction:

binding_hash = SHA256(
  canonical({
    "request_hash": request_hash,
    "response_hash": response_hash
  })
)
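A self-contained sketch of this construction. Note the assumption: `canonical()` is modeled here as compact JSON with sorted keys, which stands in for AELITIUM's actual canonicalization and may differ in detail.

```python
import hashlib
import json

def canonical(obj) -> bytes:
    # Assumption: compact JSON with sorted keys as a stand-in for
    # AELITIUM's canonical form.
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()

def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

request_hash = sha256_hex(canonical(
    {"model": "gpt-4o", "messages": [{"role": "user", "content": "Hello"}]}
))
response_hash = sha256_hex(canonical({"content": "Hi there!"}))

# The binding hash commits to both hashes at once: changing either
# recorded artifact changes the binding.
binding_hash = sha256_hex(canonical(
    {"request_hash": request_hash, "response_hash": response_hash}
))
print(len(binding_hash))  # 64 hex characters
```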

Capture adapter (OpenAI / Anthropic / LiteLLM)

No manual JSON. The capture adapter intercepts the API call and writes the bundle automatically.

from openai import OpenAI
from aelitium import capture_openai

client = OpenAI()
result = capture_openai(
    client, "gpt-4o",
    [{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)  # deterministic hash for this recorded request/response pair
aelitium verify-bundle ./evidence
# STATUS=VALID rc=0
# AI_HASH_SHA256=...
# BINDING_HASH=...   ← cryptographic link between request and response

LiteLLM routes to any provider — one adapter covers all:

from aelitium import capture_litellm

result = capture_litellm(
    model="openai/gpt-4o",           # or "anthropic/...", "bedrock/...", etc.
    messages=[{"role": "user", "content": "What is the capital of France?"}],
    out_dir="./evidence",
)
print(result.ai_hash_sha256)

See Capture layer for Anthropic, LiteLLM, streaming, and signing.


Zero-config with LiteLLM

Add one line. Keep using LiteLLM normally.

from aelitium import enable_litellm
import litellm

enable_litellm(out_dir="./aelitium/bundles", verbose=True)

response = litellm.completion(
    model="openai/gpt-4o",
    messages=[{"role": "user", "content": "Hello"}],
)

print(response.choices[0].message.content)
# AELITIUM: bundle → ./aelitium/bundles/<binding_hash>  binding_hash=<hash>

Every call writes a bundle automatically. The LLM response is unchanged.

What you get:

  • request_hash — deterministic hash of the recorded request payload
  • response_hash — hash of the recorded response content
  • binding_hash — cryptographic link between the two

Failure modes:

| Mode | Capture fails | Streaming |
| --- | --- | --- |
| `strict=False` (default) | warning, response returned | pass-through |
| `strict=True` | raises | raises |

enable_litellm(strict=True)  # capture failure raises instead of warning

Notes:

  • Streaming calls (stream=True) are not captured — they pass through unchanged

See examples/litellm_enable.py for a runnable example.


Detect when the recorded response changed

aelitium compare ./bundle_last_week ./bundle_today
# STATUS=CHANGED rc=2
# REQUEST_HASH=SAME    a=3f4a8c1d... b=3f4a8c1d...
# RESPONSE_HASH=DIFFERENT  a=9b2e7f1a... b=c41d8e3b...
# INTERPRETATION=Same request_hash with different response_hash observed

If REQUEST_HASH=SAME and RESPONSE_HASH=DIFFERENT, the compared bundles contain different recorded responses for the same hashed request. AELITIUM does not attribute the cause.
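When scripting around compare, the STATUS / REQUEST_HASH / RESPONSE_HASH lines can be read as key=value pairs. A hedged sketch: `parse_compare` is a hypothetical helper, not part of AELITIUM, and `--json` output is likely the more robust option.

```python
def parse_compare(output: str) -> dict:
    # Take the first token after '=' on each key=value line,
    # e.g. "REQUEST_HASH=SAME  a=3f4a... b=3f4a..." -> {"REQUEST_HASH": "SAME"}.
    fields = {}
    for line in output.splitlines():
        line = line.strip().lstrip("# ")
        if "=" in line:
            key, _, rest = line.partition("=")
            fields[key.strip()] = rest.split()[0]
    return fields

sample = """STATUS=CHANGED rc=2
REQUEST_HASH=SAME    a=3f4a8c1d b=3f4a8c1d
RESPONSE_HASH=DIFFERENT  a=9b2e7f1a b=c41d8e3b"""

fields = parse_compare(sample)
drifted = fields["REQUEST_HASH"] == "SAME" and fields["RESPONSE_HASH"] == "DIFFERENT"
print(drifted)  # True
```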

Run offline (no API key):

bash examples/drift_demo/run_demo.sh

Or with a real OpenAI key:

python examples/model_drift_detector.py

Scan for unprotected LLM calls

Find every LLM call in your codebase that isn't wrapped in a capture adapter:

aelitium scan ./src

# LLM call sites detected: 12
# Instrumented with capture adapter: 9
#   ✓ openai — api/worker.py:14
#   ✓ openai — api/worker.py:38
# Missing evidence capture: 3
#   ⚠ openai — jobs/batch.py:22
#   ⚠ anthropic — agents/classifier.py:11
#   ⚠ litellm — utils/fallback.py:7
# Coverage: 9/12 (75%)
# STATUS=INCOMPLETE rc=2

Add to CI/CD to enforce evidence coverage:

- name: Check LLM evidence coverage
  run: aelitium scan ./src

For CI-friendly key=value output:

aelitium scan ./src --ci
# AELITIUM_SCAN_STATUS=INCOMPLETE
# AELITIUM_SCAN_TOTAL=12
# AELITIUM_SCAN_INSTRUMENTED=9
# AELITIUM_SCAN_MISSING=3
# AELITIUM_SCAN_COVERAGE=75
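The key=value lines lend themselves to a simple coverage gate in CI. A sketch: `require_coverage` is a hypothetical helper, and the field name follows the sample output above.

```python
def require_coverage(ci_output: str, minimum: int) -> bool:
    # Read AELITIUM_SCAN_COVERAGE from `aelitium scan --ci` output and
    # compare it against a minimum percentage.
    for line in ci_output.splitlines():
        line = line.strip().lstrip("# ")
        if line.startswith("AELITIUM_SCAN_COVERAGE="):
            return int(line.split("=", 1)[1]) >= minimum
    return False  # fail closed if the field is missing

sample = "AELITIUM_SCAN_STATUS=INCOMPLETE\nAELITIUM_SCAN_COVERAGE=75\n"
print(require_coverage(sample, minimum=80))  # False
```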

Reproducibility

The same input produces the same hash in validated configurations:

bash scripts/verify_repro.sh
# === RESULT: PASS ===
# AI_HASH_SHA256=8b647717...

Validated on two independent machines (A + B) with identical hashes.


Why logs are not enough

Tools like Langfuse or Helicone help you debug LLM calls.

AELITIUM helps you verify that recorded evidence was not altered after packing.

Logs can be edited. Evidence-bundle tampering is detectable.
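The difference is easy to demonstrate: flip one value in a recorded payload and its hash no longer matches the one stored at packing time. A minimal illustration:

```python
import hashlib

payload = b'{"model": "gpt-4o", "content": "Paris"}'
recorded_hash = hashlib.sha256(payload).hexdigest()  # stored at packing time

tampered = payload.replace(b"Paris", b"Lyon")  # post-hoc edit
print(hashlib.sha256(tampered).hexdigest() == recorded_hash)  # False
```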

| Tool | What it does |
| --- | --- |
| Langfuse, Helicone, LangSmith | observability: traces, metrics, dashboards |
| AELITIUM | verification: cryptographic proof the record wasn't altered |

These are complementary, not competing. AELITIUM adds a tamper-evident layer on top of any existing pipeline.


When teams use AELITIUM

  • Detect when recorded responses differ between runs for the same request hash
  • Prove recorded evidence wasn't modified after the fact
  • Investigate incidents involving AI agents ("what recorded evidence is available for this interaction?")
  • Produce verifiable records for compliance or audits (EU AI Act Art.12, SOC 2)
  • Enforce evidence coverage in CI/CD (aelitium scan exits 2 if LLM calls are uninstrumented)

CLI reference

aelitium

| Command | Description |
| --- | --- |
| `scan <path>` | Scan Python files for uninstrumented LLM call sites |
| `compare <bundle_a> <bundle_b>` | Compare two bundles; detect changed recorded responses |
| `verify-bundle <dir>` | Verify bundle: hash + signature + binding hash |
| `pack --input <file> --out <dir>` | Generate canonical JSON + manifest |
| `verify <dir>` | Verify integrity of a pack output dir |
| `validate --input <file>` | Validate against the ai_output_v1 schema |
| `canonicalize --input <file>` | Print the deterministic hash |
| `verify-receipt --receipt <file> --pubkey <file>` | Verify an Ed25519 authority receipt offline |
| `export --bundle <dir>` | Export a bundle in compliance format (EU AI Act Art. 12) |

Exit codes: 0 = success, 2 = failure. Designed for CI/CD pipelines.


Policy

See docs/policy/AELITIUM_TRUST_BOUNDARY_SPEC.md for the canonical trust-boundary language policy.

Documentation


Design principles

  • Deterministic — same input produces the same hash in validated configurations
  • Offline-first — verification never requires network access
  • Fail-closed — any verification error returns rc=2; no silent failures
  • Auditable — every pack includes a manifest with schema, timestamp, and hash
  • Pipeline-friendly — all output parseable (STATUS=, AI_HASH_SHA256=, --json)
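The determinism principle can be checked with a toy canonicalization. Sorted-key compact JSON is an assumption here, standing in for AELITIUM's actual scheme: key order in the input must not change the hash.

```python
import hashlib
import json

def canonical_hash(obj) -> str:
    # Assumption: compact JSON with sorted keys as a stand-in canonical form.
    blob = json.dumps(obj, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

a = {"model": "gpt-4o", "temperature": 0}
b = {"temperature": 0, "model": "gpt-4o"}  # same content, different key order
print(canonical_hash(a) == canonical_hash(b))  # True
```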

Trust boundary

AELITIUM provides tamper-evidence, not truth guarantees.

What AELITIUM proves:

  • the bundle contents have not changed since packing
  • the canonicalized payload matches the recorded hash
  • (with capture adapter) the request hash matches the payload recorded by the capture path

What AELITIUM does not prove:

  • that the model output was correct or safe
  • that the system that packed the bundle was trustworthy
  • that the model actually produced the output (without capture adapter)

Integrity ≠ completeness. AELITIUM proves that captured events were not altered. It does not guarantee that all events were captured. Capture completeness depends on the integration layer — SDK wrapper, proxy, or observer. If the agent controls its own logging, an observer-based capture pattern provides stronger guarantees. See TRUST_BOUNDARY.md for the full analysis.

Stronger provenance — signing authorities, hardware-backed keys — is the direction of P3.


Compliance alignment

AELITIUM provides tamper-evident evidence bundles that support the following regulatory and audit requirements:

| Framework | Requirement | How AELITIUM helps |
| --- | --- | --- |
| EU AI Act, Article 12 | Logging and traceability of high-risk AI system outputs | Evidence bundles provide tamper-evident, verifiable records of AI outputs with deterministic hashes |
| SOC 2, CC7 | System monitoring and integrity controls | Independent offline verification confirms records have not been altered after creation |
| ISO 42001 | AI management system auditability | Canonical bundles with schema versioning support third-party audits without infrastructure access |
| NIST AI RMF, MG 2.2 | Traceability of AI decisions and outputs | Each bundle contains a complete, reproducible record: payload, hash, timestamp, and optional signature |

AELITIUM does not replace logging infrastructure. It adds cryptographic integrity on top of any existing pipeline — offline, without a server, without a blockchain.


License

Apache-2.0. See LICENSE.
