Status: Draft
Author: Anna Popivanova
Created: 2025-10-01
Tags: attestation, runtime, environment, verification, trust, handshake, identity
Any protocol that claims to secure AI identity must clearly define what it secures against. Without a threat model, airlock would be a handshake in the dark — elegant, but untested. This RFC outlines the adversarial scenarios airlock is designed to detect, prevent, or mitigate, and clarifies its boundaries and assumptions.
- Ensure cryptographic identity of AI agents at runtime
- Detect unauthorized forks or tampered models
- Prevent impersonation and replay attacks
- Attest to runtime environment integrity
- Enable auditability and trust propagation (a sketch of the handshake claim these goals imply follows this list)
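To ground these goals, here is a minimal sketch of the signed handshake claim they imply. It is illustrative only: the field names echo the identifiers used later in this RFC (agent_id, fingerprint, environment_hash), but the structure, the JSON canonicalization, and the choice of Ed25519 are assumptions, not the normative airlock wire format.

```python
# Illustrative only: field names and structure are assumptions,
# not the normative airlock wire format.
import json
import os
import time

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Hypothetical agent key pair; in practice the private key is
# provisioned to the agent ahead of time (see Assumptions).
private_key = Ed25519PrivateKey.generate()

claim = {
    "agent_id": "agent:example-model:v1",  # registry lookup key
    "fingerprint": "sha256:...",           # model fingerprint
    "environment_hash": "sha256:...",      # runtime attestation digest
    "nonce": os.urandom(16).hex(),         # freshness against replay
    "timestamp": int(time.time()),
}

# The agent signs the canonicalized claim; the verifier later checks this
# signature against the public key registered for agent_id.
payload = json.dumps(claim, sort_keys=True).encode()
signature = private_key.sign(payload)
```

Sorting keys before signing stands in for whatever canonicalization the protocol ultimately specifies; if agent and verifier serialize differently, verification fails even for honest agents.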
| Attack Type | Description | airlock Defense |
|---|---|---|
| Model Spoofing | An attacker mimics a known model’s behavior or branding | Fingerprint + signature verification |
| Fork Drift | A model is fine-tuned or altered post-deployment | Fingerprint mismatch detection |
| Replay Attack | Reuse of a previously valid handshake | Nonce freshness enforcement |
| Environment Tampering | Model runs in altered or untrusted runtime | Environment hash attestation |
| Audit Forgery | Fake audit tokens simulate trust history | Signed audit token lineage |
| Registry Poisoning | Malicious agents inserted into the registry | Governance + multi-sig endorsement (future RFC 0007) |
| Man-in-the-Middle (MitM) | Interception or alteration of handshake | TLS + signature validation |
| Drift During Session | Model changes behavior mid-interaction | Stream Verification (RFC 0006) |
| Behavioral Mimicry | An agent imitates the behavioral and affective patterns of a specific registered agent in order to pass as that identity | Emoprint baseline comparison against the registered agent's affective signature |
| Affective Baseline Evasion | An agent is prompted, fine-tuned, or adversarially guided to suppress its characteristic AI affective signature — flattening sentiment entropy, neutralizing linguistic style markers, or mimicking human affective patterns — specifically to evade emoprint detection. Unlike behavioral mimicry of a specific known agent, this attack targets the baseline itself rather than a registered identity. | Emoprint baselines are versioned, rotated on a defined schedule, and where operationally feasible kept non-public. Deviation detection operates on multiple independent affective dimensions simultaneously, raising the cost of full-spectrum evasion. A successful evasion that passes all dimensions is itself a detectable anomaly: an agent with no measurable affective signature is flagged as suspect, not trusted. Absence of signal is treated as a threat signal. |
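The table's defenses compose into a single verifier-side check. The sketch below continues the claim structure above and is non-normative: TRUSTED_REGISTRY, seen_nonces, verify_handshake, and the skew bound are hypothetical names and values, not protocol definitions.

```python
# Verifier-side sketch of four defenses from the table: signature
# validation, fingerprint matching, environment attestation, and nonce
# freshness. All names here are illustrative.
import time

from cryptography.exceptions import InvalidSignature

TRUSTED_REGISTRY: dict[str, dict] = {}  # agent_id -> {public_key, fingerprint, environment_hash}
seen_nonces: set[str] = set()           # replay cache; bounded/TTL'd in practice
MAX_CLOCK_SKEW = 120                    # seconds; deployment-specific

def verify_handshake(claim: dict, payload: bytes, signature: bytes) -> bool:
    entry = TRUSTED_REGISTRY.get(claim["agent_id"])
    if entry is None:
        return False  # unknown agent; registry poisoning is handled upstream

    # Model spoofing / MitM: signature must verify under the registered key.
    try:
        entry["public_key"].verify(signature, payload)
    except InvalidSignature:
        return False

    # Fork drift: a fine-tuned or altered model yields a mismatched fingerprint.
    if claim["fingerprint"] != entry["fingerprint"]:
        return False

    # Environment tampering: runtime hash must match the attested baseline.
    if claim["environment_hash"] != entry["environment_hash"]:
        return False

    # Replay: reject reused nonces and stale timestamps.
    if claim["nonce"] in seen_nonces:
        return False
    if abs(time.time() - claim["timestamp"]) > MAX_CLOCK_SKEW:
        return False
    seen_nonces.add(claim["nonce"])

    return True
```

For the affective-baseline row, the following is a hedged sketch of what multi-dimensional deviation detection could look like. The dimension names, z-score approach, and thresholds are assumptions for illustration, not the emoprint specification.

```python
# Illustrative emoprint check: dimensions, thresholds, and the z-score
# method are assumptions, not the emoprint specification.
EMOPRINT_DIMENSIONS = ("sentiment_entropy", "style_marker_rate", "valence_variance")
DEVIATION_THRESHOLD = 3.0   # per-dimension z-score cutoff (illustrative)
NULL_SIGNAL_FLOOR = 0.05    # below this on every dimension, treat absence as a signal

def assess_emoprint(observed: dict, baseline: dict) -> str:
    """Compare observed affective measurements to a registered per-agent baseline."""
    # Per-dimension z-scores against the baseline (std assumed > 0).
    z_scores = {
        dim: abs(observed[dim] - baseline[dim]["mean"]) / baseline[dim]["std"]
        for dim in EMOPRINT_DIMENSIONS
    }
    # Full-spectrum evasion: no measurable signature on any dimension is
    # itself suspect, so absence of signal is treated as a threat signal.
    if all(observed[dim] < NULL_SIGNAL_FLOOR for dim in EMOPRINT_DIMENSIONS):
        return "suspect:null-signature"
    # Checking every dimension independently raises the cost of evasion:
    # the attacker must stay within bounds on all of them at once.
    if any(z > DEVIATION_THRESHOLD for z in z_scores.values()):
        return "suspect:baseline-deviation"
    return "pass"
```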
- Verifiers have access to a trusted registry of agent fingerprints
- Agents possess private keys for signing identity claims
- Handshake occurs over a secure transport (e.g., TLS)
- Registry governance is out of scope for this RFC (see future RFC 0007)
- Data poisoning during model training
- Adversarial prompt injection
- Hardware-level attacks (unless attested via enclave integration)
- Social engineering of registry maintainers
airlock is designed to verify the identity of AI agents — not humans. The protocol does not process, transmit, or store any Personally Identifiable Information (PII) or Sensitive PII (SPII).
All identifiers (e.g., agent_id, fingerprint, environment_hash) are cryptographic or system-level artifacts that do not correspond to individuals. Audit tokens and handshake payloads are machine-verifiable and privacy-neutral by design.
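As one illustration of why these identifiers carry no PII, a plausible (non-normative) derivation treats them as content hashes over machine artifacts; the actual scheme is defined by the core airlock RFCs, and the function names below are hypothetical.

```python
# Hypothetical derivations: identifiers are digests of machine artifacts,
# so nothing here maps to an individual.
import hashlib

def derive_fingerprint(model_bytes: bytes) -> str:
    """Content-addressed model fingerprint: a digest of weights, not a person."""
    return "sha256:" + hashlib.sha256(model_bytes).hexdigest()

def derive_environment_hash(manifest_bytes: bytes) -> str:
    """Digest over a runtime manifest (image ID, library versions, flags)."""
    return "sha256:" + hashlib.sha256(manifest_bytes).hexdigest()
```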
- Formal adversarial simulation suite
- Integration with secure enclaves (e.g., SGX, SEV)
- Trust graph propagation and multi-party endorsement
- Registry governance and revocation mechanisms