Vault
The vault stores everything Agent Life Space considers a secret: API keys, OAuth tokens, wallet mnemonics, third-party credentials, internal HMAC signing keys. It is the only file in the project that is both encrypted and required at boot.
This page is the canonical specification of the on-disk format, the cryptographic primitives, the migration story, and the failure modes. Code: agent/vault/secrets.py. Tests: tests/test_vault.py.
Format version: v2 (single file) as of v1.35.0. The legacy v1 format (raw Fernet token + sidecar
salt.bin) is auto-migrated on first open. There is no v0.
| Primitive | Choice | Source |
|---|---|---|
| Cipher | Fernet (AES-128-CBC + HMAC-SHA256) | cryptography.fernet.Fernet |
| KDF | PBKDF2-HMAC-SHA256, 480 000 iterations | cryptography.hazmat.primitives.kdf.pbkdf2.PBKDF2HMAC |
| Salt | 16 bytes from secrets.token_bytes(16), embedded in v2 header | stdlib secrets module |
| Master key | Operator-supplied via AGENT_VAULT_KEY env var (≥ 24 bytes recommended) | operator |
| Authenticated payload | The whole orjson-serialised secrets dict | always |
PBKDF2 iteration count tracks the current OWASP recommendation (480k as of 2026). It is a class constant in _derive_fernet; bumping it requires a one-shot vault rotation.
We deliberately don't roll our own AEAD. Fernet is a well-known audited construction with a clear failure mode (InvalidToken on tampered or wrong-key blobs).
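The KDF row in the table above can be illustrated with the standard library alone. This is a minimal sketch, not the project's `_derive_fernet`: the name `derive_fernet_key` is hypothetical, and it assumes the 32-byte PBKDF2 output is base64-url encoded, which is the key format `Fernet` accepts.

```python
import base64
import hashlib
import secrets

PBKDF2_ITERATIONS = 480_000  # iteration count from the primitives table
SALT_LEN = 16                # salt length from the primitives table


def derive_fernet_key(master_key: str, salt: bytes) -> bytes:
    """Stretch the master key into 32 bytes and base64-url encode them.

    Hypothetical helper for illustration; the real logic lives in _derive_fernet.
    """
    raw = hashlib.pbkdf2_hmac(
        "sha256", master_key.encode("utf-8"), salt, PBKDF2_ITERATIONS, dklen=32
    )
    return base64.urlsafe_b64encode(raw)


salt = secrets.token_bytes(SALT_LEN)
key = derive_fernet_key("example-master-key-not-a-real-one", salt)
print(len(key))  # 44 (base64 of 32 bytes)
```

The same master key with a different salt yields an unrelated key, which is why the salt has to travel with the encrypted blob.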
┌────────────────┬─────────────────────┬────────────────────────────────────────────┐
│ b"ALSv2\n"     │ 16 bytes salt       │ Fernet token (variable length)             │
│ 6 bytes magic  │ random per-vault    │ base64-url, includes IV + ciphertext +     │
│                │                     │ HMAC tag + version byte                    │
└────────────────┴─────────────────────┴────────────────────────────────────────────┘
        ↑                   ↑                   ↑
        │                   │                   │
        │                   │                   └── encrypts: orjson.dumps({"NAME": "value", ...})
        │                   │
        │                   └── input to PBKDF2 along with master_key
        │
        └── version magic. Any future format change picks a new magic and detects v2 by prefix.
The whole thing lives in one file: <AGENT_PROJECT_ROOT>/agent/vault/secrets.enc. There is no salt.bin sidecar in v2. There are no temp files, lock files, or journal files at rest.
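To make the layout concrete, here is a small sketch of packing and unpacking a v2 file. The helper names `pack_v2` and `unpack_v2` are hypothetical and the token is a placeholder; only the magic bytes and the 16-byte salt length come from the layout described above.

```python
V2_HEADER = b"ALSv2\n"  # 6-byte version magic
V2_SALT_LEN = 16        # per-vault random salt


def pack_v2(salt: bytes, token: bytes) -> bytes:
    """Concatenate magic, salt and Fernet token into one blob."""
    if len(salt) != V2_SALT_LEN:
        raise ValueError("salt must be exactly 16 bytes")
    return V2_HEADER + salt + token


def unpack_v2(raw: bytes) -> tuple[bytes, bytes]:
    """Split a v2 blob into (salt, token); reject anything without the magic."""
    if not raw.startswith(V2_HEADER):
        raise ValueError("not a v2 vault (missing ALSv2 magic)")
    body = raw[len(V2_HEADER):]
    if len(body) < V2_SALT_LEN:
        raise ValueError("truncated v2 vault: salt is incomplete")
    return body[:V2_SALT_LEN], body[V2_SALT_LEN:]


blob = pack_v2(bytes(range(16)), b"placeholder-fernet-token")
salt, token = unpack_v2(blob)
```

Because the salt sits at a fixed offset after the magic, a reader never needs a second file to derive the key.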
The previous v1→v2 migration used a separate salt.bin and a multi-step write sequence (write secrets.enc.tmp → write salt.bin → os.replace). A SIGKILL between the salt write and the swap could leave the operator with the new salt and the old encrypted blob — unrecoverable on next boot. Codex flagged this as a MED finding.
Single file means the salt and the blob are physically inseparable. There is no point at which the process can crash and leave a half-applied write: the salt and the blob are committed together or not at all.
Every vault write goes through SecretsManager._atomic_write:
def _atomic_write(self, target: Path, data: bytes) -> None:
    tmp = target.with_suffix(target.suffix + ".tmp")
    # Clean up any leftover temp from a prior crash.
    if tmp.exists():
        tmp.unlink()
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        # write all bytes
        written = 0
        view = memoryview(data)
        while written < len(view):
            n = os.write(fd, view[written:])
            written += n
        os.fsync(fd)  # contents are durable on disk
    finally:
        os.close(fd)
    os.replace(tmp, target)  # POSIX atomic rename
    # Best-effort fsync of the parent directory so the rename
    # itself is durable across power loss.
    dir_fd = os.open(target.parent, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)

The parent-directory fsync is the key step that's missing from a naive Path.write_bytes(...) + os.replace(...) pattern. Without it, the rename can be lost on power loss even though the file contents are on disk.
| Crash point | On-disk state |
|---|---|
| Before any write | unchanged |
| Mid os.write | secrets.enc.tmp exists with partial bytes; secrets.enc unchanged. The next boot deletes the leftover temp. |
| After os.write, before fsync(fd) | secrets.enc.tmp may have lost the tail; secrets.enc unchanged. |
| After fsync(fd), before os.replace | secrets.enc.tmp is durable but unused; secrets.enc unchanged. |
| After os.replace, before fsync(dir) | New secrets.enc is durable on most filesystems. The dir fsync is the belt-and-braces. |
| After fsync(dir) | Fully committed. |
The fundamental invariant: the agent never reads a partially written secrets.enc. Either it sees the previous good blob or the new good blob.
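The invariant can be exercised with a standalone sketch that mimics the "mid os.write" crash row. The free function `atomic_write` is an illustrative copy of the sequence, not the project's method, and it additionally tolerates platforms where a directory cannot be opened (an assumption on my part).

```python
import os
import tempfile
from pathlib import Path


def atomic_write(target: Path, data: bytes) -> None:
    """Temp file + fsync + rename + directory fsync, as a free function."""
    tmp = target.with_suffix(target.suffix + ".tmp")
    if tmp.exists():
        tmp.unlink()  # leftover from an interrupted earlier write
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        written = 0
        while written < len(view):
            written += os.write(fd, view[written:])
        os.fsync(fd)
    finally:
        os.close(fd)
    os.replace(tmp, target)
    try:
        dir_fd = os.open(target.parent, os.O_RDONLY)
    except OSError:
        return  # assumption: skip the directory fsync where it is unsupported
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


with tempfile.TemporaryDirectory() as d:
    target = Path(d) / "secrets.enc"
    atomic_write(target, b"old blob")

    # Simulate a crash mid-write: a partial temp file is left behind.
    (Path(d) / "secrets.enc.tmp").write_bytes(b"new bl")
    after_crash = target.read_bytes()  # still the previous good blob

    # The next write clears the stale temp and commits the new blob.
    atomic_write(target, b"new blob")
    after_commit = target.read_bytes()
    leftovers = sorted(p.name for p in Path(d).iterdir())
```

A reader only ever opens `secrets.enc`, so the partial temp file is never observed as vault content.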
A wrong master key — operator typo on first boot, partial .env rollback, leaked old key being tested — used to silently destroy the vault. The old _load() returned {} on InvalidToken. A subsequent set_secret would re-encrypt the empty dict with the wrong key and overwrite the legacy blob. The legacy data was unrecoverable.
The v1.35.0 fix introduces VaultDecryptionError:
def _load(self, *, allow_missing: bool = True) -> dict[str, str]:
    if not self._secrets_file.exists():
        if allow_missing:
            return {}
        raise VaultDecryptionError(...)
    try:
        raw = self._secrets_file.read_bytes()
        if raw.startswith(self._V2_HEADER):
            blob = raw[len(self._V2_HEADER) + self._V2_SALT_LEN:]
        else:
            blob = raw  # legacy v1
        return cast("dict[str, str]", orjson.loads(self._fernet.decrypt(blob)))
    except InvalidToken as e:
        raise VaultDecryptionError(
            "Vault decryption failed (wrong master key or corrupted "
            "secrets.enc). Refusing to proceed — a write in this "
            "state would silently overwrite the existing blob and "
            "destroy any secrets recoverable with the correct key."
        ) from e

Write paths (set_secret, delete_secret) call _load() directly. A wrong key raises VaultDecryptionError and the encrypted blob on disk is never touched.
Read paths (get_secret, list_secrets, has_secret) use the _safe_load_for_read() helper that catches VaultDecryptionError and returns an empty result. This lets the agent boot with the wrong key, log a clear warning, and let the operator fix .env without crashing.
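The split between fail-fast writes and tolerant reads can be sketched without any crypto dependency. The `ToyVault` class below is a hypothetical stand-in: it authenticates a JSON payload with an HMAC tag instead of encrypting it with Fernet, so it shows only the control flow, not the real storage format.

```python
import hashlib
import hmac
import json
from typing import Optional


class VaultDecryptionError(Exception):
    """Raised when the blob cannot be authenticated with the current key."""


class ToyVault:
    """HMAC-tagged JSON in memory. NOT encrypted; for control flow only."""

    def __init__(self, key: bytes, blob: Optional[bytes] = None) -> None:
        self._key = key
        self.blob = blob

    def _seal(self, data: dict) -> bytes:
        body = json.dumps(data, sort_keys=True).encode()
        return hmac.new(self._key, body, hashlib.sha256).digest() + body

    def _load(self) -> dict:
        if self.blob is None:
            return {}
        tag, body = self.blob[:32], self.blob[32:]
        expected = hmac.new(self._key, body, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            raise VaultDecryptionError("wrong key or corrupted blob")
        return json.loads(body)

    def set_secret(self, name: str, value: str) -> None:
        data = self._load()  # write path: raises before anything is replaced
        data[name] = value
        self.blob = self._seal(data)

    def get_secret(self, name: str) -> Optional[str]:
        try:
            return self._load().get(name)
        except VaultDecryptionError:
            return None  # read path: tolerate the wrong key


good = ToyVault(b"right-key")
good.set_secret("A", "1")

bad = ToyVault(b"wrong-key", good.blob)
read_result = bad.get_secret("A")  # None, no crash
try:
    bad.set_secret("B", "2")
    write_refused = False
except VaultDecryptionError:
    write_refused = True
```

The important property is ordering: the load that can fail happens before any state is replaced, so a refused write leaves the stored blob byte-for-byte unchanged.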
Test coverage:
- TestVaultWrongKeyWriteFailFast::test_wrong_key_set_secret_raises_and_preserves_legacy
- TestVaultWrongKeyWriteFailFast::test_wrong_key_delete_secret_also_raises
- TestVaultWrongKeyWriteFailFast::test_wrong_key_read_path_returns_none_no_crash
When SecretsManager.__init__ opens an existing secrets.enc that does not start with the ALSv2\n magic, it treats the file as legacy v1.
| Variant | Era | Salt source |
|---|---|---|
| v1 with salt.bin | 1.34-era random salt | adjacent salt.bin file |
| v1 without salt.bin | pre-1.34 static salt | hardcoded _LEGACY_SALT = b"agent-life-space-vault-salt-v1" |
The _locate_legacy_salt() helper picks the right salt automatically.
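A plausible shape for that selection, based only on the table above, is sketched here. The function `locate_legacy_salt` and its `vault_dir` parameter are hypothetical; the actual helper may differ in naming and error handling.

```python
import tempfile
from pathlib import Path

LEGACY_SALT = b"agent-life-space-vault-salt-v1"  # static salt from the table


def locate_legacy_salt(vault_dir: Path) -> bytes:
    """Prefer an adjacent salt.bin; fall back to the static pre-1.34 salt."""
    salt_file = vault_dir / "salt.bin"
    if salt_file.is_file():
        return salt_file.read_bytes()  # 1.34-era random salt
    return LEGACY_SALT                 # pre-1.34 static salt


with tempfile.TemporaryDirectory() as d:
    vault_dir = Path(d)
    static_choice = locate_legacy_salt(vault_dir)
    (vault_dir / "salt.bin").write_bytes(b"\x07" * 16)
    random_choice = locate_legacy_salt(vault_dir)
```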
_open_or_init_vault(master_key)
│
├─ secrets.enc starts with ALSv2 → use the embedded salt → done
│
├─ secrets.enc starts with anything else (v1) →
│   │
│   ├─ legacy_salt = _locate_legacy_salt()
│   ├─ legacy_fernet = _derive_fernet(master_key, legacy_salt)
│   │
│   ├─ try: plaintext = legacy_fernet.decrypt(raw)
│   │   └─ InvalidToken → wrong key → return legacy_fernet (read-only)
│   │                                  NEVER touch the file
│   │
│   └─ success → _migrate_to_v2(master_key, plaintext):
│       │
│       ├─ new_salt = secrets_module.token_bytes(16)
│       ├─ new_fernet = _derive_fernet(master_key, new_salt)
│       ├─ token = new_fernet.encrypt(plaintext)
│       ├─ v2_blob = ALSv2_HEADER + new_salt + token
│       ├─ _atomic_write(secrets.enc, v2_blob)   ← single op
│       ├─ _cleanup_legacy_salt_file()           ← best effort
│       └─ self._fernet = new_fernet
│
└─ secrets.enc does not exist → fresh install → fresh random salt → return new_fernet
_migrate_to_v2 is the only place a vault gets re-encrypted. It is a single atomic write — the same _atomic_write used by every normal set_secret call. If it fails (disk full, IO error), the legacy blob stays intact and the agent runs read-only with the legacy fernet until the operator investigates.
After successful migration:
- secrets.enc is the new v2 blob.
- salt.bin is removed (best effort, non-fatal if it fails).
- A vault_migrated_to_v2_single_file_format log event lands in the long-tier log file.
Test coverage:
- TestLegacyV1Compat::test_v1_static_salt_vault_reads_and_migrates
- TestLegacyV1Compat::test_v1_random_salt_vault_reads_and_migrates_drops_salt_file
- TestLegacyV1Compat::test_v1_wrong_key_does_not_touch_file
- TestVaultV2MigrationCrashSafety::test_migration_uses_atomic_swap_no_partial_state
- TestVaultV2MigrationCrashSafety::test_migration_failure_leaves_legacy_blob_untouched
- TestVaultV2MigrationCrashSafety::test_migration_preserves_multiple_secrets

import os

from agent.vault.secrets import SecretsManager, VaultDecryptionError

vault = SecretsManager(vault_dir="agent/vault", master_key=os.environ["AGENT_VAULT_KEY"])
vault.set_secret("ANTHROPIC_API_KEY", "sk-ant-...")  # raises VaultDecryptionError on wrong key
value = vault.get_secret("ANTHROPIC_API_KEY")        # returns None on miss / wrong key
vault.delete_secret("ANTHROPIC_API_KEY")             # raises VaultDecryptionError on wrong key
vault.has_secret("ANTHROPIC_API_KEY")                # bool, tolerates wrong key
vault.list_secrets()                                 # list[str], tolerates wrong key
vault.is_ready                                       # bool: can encrypt/decrypt?
vault.get_audit_log()                                # in-memory ring buffer (1000 entries)
vault.clear_cache()                                  # drop the in-memory cache
SecretsManager.generate_key()                        # str: a fresh random Fernet key

The audit log records every set / get / get_cached / get_miss / delete / list event with a UTC timestamp. It is bounded to 1000 entries; when it is full, the oldest entry is evicted first.
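A bounded, oldest-first audit buffer of this kind is commonly built on `collections.deque`. The sketch below is an assumption about the shape, not the project's code: the `record` helper and the entry fields are hypothetical, and it logs secret names only, never values.

```python
from collections import deque
from datetime import datetime, timezone

AUDIT_MAX_ENTRIES = 1000  # bound stated in the text above

audit_log: deque = deque(maxlen=AUDIT_MAX_ENTRIES)


def record(event: str, name: str) -> None:
    """Append one audit entry; the deque drops the oldest entry when full."""
    audit_log.append(
        {"ts": datetime.now(timezone.utc).isoformat(), "event": event, "name": name}
    )


for i in range(1005):
    record("get", f"SECRET_{i}")

print(len(audit_log))        # 1000
print(audit_log[0]["name"])  # SECRET_5
```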

# Generate a master key
python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"
# Put it in your .env (gitignored)
echo "AGENT_VAULT_KEY=<paste-generated-key>" >> .env
# Or store in a real password manager and source it before booting

The vault is unlocked at agent boot. Once running, you have a few options:
| Channel | How |
|---|---|
| Python REPL | vault = SecretsManager(master_key=...); vault.set_secret(...) |
| Inside the agent | tools that need a secret call agent.vault.get_secret(name) |
| Setup doctor | not a write surface — read only |
There is no Telegram command to write the vault, by design. Adding secrets through a chat surface would risk leaking them into a transcript.
There is no in-place rotation API. To rotate:
- Decrypt the current vault with the existing key to confirm the key works (reading any secret will do), and make sure you have every secret value on hand, because each one is re-added by hand in the last step.
- Stop the agent.
- Set the new AGENT_VAULT_KEY in .env.
- Delete secrets.enc (back it up first).
- Boot the agent with the new key. The next set_secret call writes a fresh v2 blob with the new salt.
- Re-add each secret.
This is intentionally manual. Vault rotation is rare and the operator should be deliberate about it.
If secrets.enc is unreadable:
# 1. Check if it's a key issue (most likely).
#    Assumes AGENT_VAULT_KEY is already exported in this shell.
python -c "import os; from agent.vault.secrets import SecretsManager; \
m = SecretsManager(vault_dir='agent/vault', master_key=os.environ['AGENT_VAULT_KEY']); \
print(m.get_secret('AGENT_API_KEY'))"
# 2. If you have a backup of secrets.enc:
mv agent/vault/secrets.enc agent/vault/secrets.enc.broken
cp /backup/secrets.enc agent/vault/
# 3. Last resort: start fresh.
mv agent/vault/secrets.enc agent/vault/secrets.enc.broken
# Boot the agent; it will create an empty v2 vault on first set_secret.
# Re-enter every secret manually.

- It does not store secrets unencrypted at rest. Never has, never will.
- It does not log secret values. The redact_secrets() processor in agent/logs/logger.py strips known secret-like keys before any log line is written.
- It does not transmit secrets over the network. The agent reads from the vault on demand, uses the value once, and the in-memory cache is bounded.
- It does not allow remote unlock. The master key must be present in the process environment at boot. There is no unlock HTTP endpoint.
- It does not have a recovery code. Lose the master key, lose the data. Back up your .env to a real password manager.
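
The log-redaction guarantee can be pictured with a small processor-style function. Everything here is an assumption made for illustration: the name `redact_secrets_sketch`, the `(logger, method_name, event_dict)` signature, the marker list, and the choice to mask values rather than drop keys. The project's actual `redact_secrets()` may work differently.

```python
from typing import Any

# Assumed markers; the real list of secret-like keys is not documented here.
SECRET_MARKERS = ("key", "token", "secret", "password", "mnemonic")


def redact_secrets_sketch(logger: Any, method_name: str, event_dict: dict) -> dict:
    """Mask values whose key name looks secret-like before the line is rendered."""
    return {
        k: "***REDACTED***" if any(m in k.lower() for m in SECRET_MARKERS) else v
        for k, v in event_dict.items()
    }


line = redact_secrets_sketch(None, "info", {"event": "llm_call", "api_key": "sk-ant-..."})
print(line)  # {'event': 'llm_call', 'api_key': '***REDACTED***'}
```

Matching on key names is a coarse filter; it does not catch a secret that is interpolated into a free-text message.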