Phase 1: At-Rest Encryption for Vector Embeddings
This guide explains how to configure and use vector encryption in ThemisDB.
As of: April 6, 2026
Version: v1.3.0
Category: 🔒 Security
Status: ✅ Production Ready
ThemisDB supports at-rest encryption for vector embeddings stored in RocksDB using AES-256-GCM. This feature provides:
- Confidentiality: Vector embeddings encrypted with AES-256
- Integrity: GCM authentication tag prevents tampering
- Key Rotation: Support for multiple key versions
- Backward Compatibility: Reads both encrypted and plaintext vectors
Vector encryption is controlled via configuration in the database:
// Via API
VectorIndexManager vim(db);
vim.setVectorEncryptionEnabled(true);
vim.setVectorKeyId("vector_embeddings");
// Via the config:vector key in RocksDB
{
  "encryption_enabled": true,
  "key_id": "vector_embeddings",
  "quantization": "none"
}
Note: disable quantization ("quantization": "none") when encryption is enabled.
| Option | Type | Default | Description |
|---|---|---|---|
| encryption_enabled | boolean | false | Enable/disable vector encryption |
| key_id | string | "vector_embeddings" | Logical key identifier for encryption |
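The rule that quantization must be off while encryption is on can be checked before a configuration is applied. The sketch below is hypothetical: VectorEncryptionConfig and validate() are not ThemisDB APIs. The fields mirror the JSON keys and table defaults; the "none" default for quantization and the "sq8" value used in the example are assumptions.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical mirror of the config:vector options (not a ThemisDB type).
struct VectorEncryptionConfig {
    bool encryption_enabled = false;           // default from the option table
    std::string key_id = "vector_embeddings";  // default from the option table
    std::string quantization = "none";         // assumed default, not documented here
};

// Returns an error message for a disallowed combination, std::nullopt if valid.
std::optional<std::string> validate(const VectorEncryptionConfig& cfg) {
    if (cfg.encryption_enabled && cfg.quantization != "none") {
        return "quantization must be \"none\" when encryption_enabled is true";
    }
    if (cfg.encryption_enabled && cfg.key_id.empty()) {
        return "key_id must not be empty when encryption_enabled is true";
    }
    return std::nullopt;
}
```

With encryption disabled, any quantization setting passes; with encryption enabled, only "none" does.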
Once encryption is enabled, all new vectors are automatically encrypted:
VectorIndexManager vim(db);
vim.init("documents", 768);
vim.setVectorEncryptionEnabled(true);
// Add vector - automatically encrypted
BaseEntity doc("doc1");
std::vector<float> embedding(768, 0.5f);
doc.setField("embedding", embedding);
vim.addEntity(doc);
// Vector is encrypted in RocksDB storage
// The in-memory HNSW index still uses plaintext vectors for search
Search operates normally; vectors are decrypted automatically:
std::vector<float> query(768, 0.5f);
auto [status, results] = vim.searchKnn(query, 10);
// Results are the same as without encryption
for (const auto& result : results) {
std::cout << "PK: " << result.pk
<< ", Distance: " << result.distance << std::endl;
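}
To load an existing index, rebuild it from storage; encrypted vectors are decrypted automatically:
VectorIndexManager vim(db);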
vim.init("documents", 768);
// Rebuild from storage - automatically decrypts vectors
auto status = vim.rebuildFromStorage();
// The index is now ready for search
To encrypt existing plaintext vectors, use the migration tool:
# Dry run (no changes)
./migrate_vector_encryption \
--db-path /var/lib/themisdb/data \
--object-name documents \
--dry-run
# Actual migration
./migrate_vector_encryption \
--db-path /var/lib/themisdb/data \
--object-name documents \
--batch-size 1000
| Option | Required | Description |
|---|---|---|
| --db-path | Yes | Path to the RocksDB database |
| --object-name | Yes | Vector index object name (e.g., "documents") |
| --key-id | No | Encryption key ID (default: "vector_embeddings") |
| --batch-size | No | Number of vectors per migration batch (default: 1000) |
| --dry-run | No | Simulate the migration without making changes |
1. Back up your database:
   cp -r /var/lib/themisdb/data /var/lib/themisdb/data.backup
2. Run a dry-run migration:
   ./migrate_vector_encryption --db-path /var/lib/themisdb/data --object-name documents --dry-run
3. Review the output:
   - Check how many vectors will be migrated
   - Verify that the dry run reports no errors
4. Run the actual migration:
   ./migrate_vector_encryption --db-path /var/lib/themisdb/data --object-name documents
5. Enable encryption for new vectors:
   vim.setVectorEncryptionEnabled(true);
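The --batch-size and --dry-run options can be illustrated with a small in-memory model. This is a hypothetical sketch, not the migrate_vector_encryption implementation: StoredVector, MigrationStats, migrate() and encryptInPlace() are made-up names, and encryptInPlace() only marks a record as encrypted instead of running AES-256-GCM. In this model a dry run writes nothing, and already-encrypted records are skipped, so rerunning after a partial failure does no double work.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one stored vector record.
struct StoredVector {
    std::vector<float> plaintext;  // plays the role of the "embedding" field
    bool encrypted = false;        // true once an encrypted copy has been written
};

struct MigrationStats {
    std::size_t candidates = 0;  // plaintext vectors found
    std::size_t migrated = 0;    // vectors actually rewritten
    std::size_t batches = 0;     // write batches committed (0 in a dry run)
};

// Placeholder for the real encryption step: only flips the flag.
void encryptInPlace(StoredVector& v) {
    v.encrypted = true;
    v.plaintext.clear();
}

MigrationStats migrate(std::map<std::string, StoredVector>& store,
                       std::size_t batchSize, bool dryRun) {
    assert(batchSize > 0);
    MigrationStats stats;
    std::vector<StoredVector*> batch;
    // Commits the pending batch unless this is a dry run.
    auto flush = [&] {
        if (batch.empty()) return;
        if (!dryRun) {
            for (StoredVector* v : batch) encryptInPlace(*v);
            stats.migrated += batch.size();
            ++stats.batches;
        }
        batch.clear();
    };
    for (auto& [pk, v] : store) {
        if (v.encrypted) continue;  // already migrated: skip
        ++stats.candidates;
        batch.push_back(&v);
        if (batch.size() == batchSize) flush();
    }
    flush();  // commit the final partial batch
    return stats;
}
```

With 2,500 plaintext vectors and a batch size of 1,000, the dry run reports 2,500 candidates and no writes, the real run commits three batches (1,000 + 1,000 + 500), and a second run finds nothing left to migrate.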
Encryption operations can be tracked through these log statements:
// Log encryption events
THEMIS_INFO("VectorIndexManager: Vector encryption ENABLED");
THEMIS_DEBUG("VectorIndexManager: Encrypted vector for pk={}", pk);
THEMIS_WARN("rebuildFromStorage: Failed to decrypt vector for pk={}: {}", pk, ex.what());
All encryption operations are logged for compliance:
- Vector encryption (when adding entities)
- Vector decryption (when rebuilding from storage)
- Encryption errors and failures
- Configuration changes
Plaintext 768-dim vector: 3,072 bytes
Encrypted 768-dim vector: 3,150 bytes (+2.5%)
Components:
- Ciphertext: 3,072 bytes (768 × 4)
- IV: 12 bytes
- Auth tag: 16 bytes
- Metadata: ~50 bytes (key_id, version, base64)
Total: 3,150 bytes
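The size breakdown above is simple arithmetic: AES-GCM produces ciphertext of the same length as the plaintext, so the overhead is a fixed envelope (IV, tag, metadata) added to dim × 4 bytes. A minimal sketch, taking the ~50-byte metadata estimate at face value and assuming 4-byte floats:

```cpp
#include <cassert>
#include <cstddef>

// Envelope sizes from the breakdown above; metadata is an approximate figure.
constexpr std::size_t kIvBytes = 12;
constexpr std::size_t kTagBytes = 16;
constexpr std::size_t kMetadataBytes = 50;

// Raw float32 embedding size.
std::size_t plaintextBytes(std::size_t dim) { return dim * sizeof(float); }

// GCM ciphertext has the same length as the plaintext; the rest is fixed overhead.
std::size_t encryptedBytes(std::size_t dim) {
    return plaintextBytes(dim) + kIvBytes + kTagBytes + kMetadataBytes;
}

// Relative storage overhead in percent.
double overheadPercent(std::size_t dim) {
    const double p = static_cast<double>(plaintextBytes(dim));
    return 100.0 * (static_cast<double>(encryptedBytes(dim)) - p) / p;
}
```

Because the envelope is fixed at about 78 bytes, the relative overhead shrinks as the dimension grows: roughly 1.3% for a 1536-dim vector (6,222 vs. 6,144 bytes).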
- No impact on search: Vectors are decrypted once during index load
- HNSW search: Operates on plaintext vectors in memory (no decryption overhead)
- Index load time: +40% overhead for decryption (5 seconds for 1M vectors)
- Encryption overhead: ~0.4 ms per vector (768-dim)
- Acceptable for production: Throughput remains high (>1000 vectors/sec)
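The throughput figure follows from the per-vector cost. A sketch of the conversion, assuming the ~0.4 ms cost is paid serially on a single thread (parallel encryption would raise the bound):

```cpp
#include <cassert>

// Best-case single-threaded throughput for a fixed per-vector cost in milliseconds.
double vectorsPerSecond(double msPerVector) {
    assert(msPerVector > 0.0);
    return 1000.0 / msPerVector;
}

// Serial processing time in seconds for n vectors at that cost.
double serialSeconds(double n, double msPerVector) {
    return n * msPerVector / 1000.0;
}
```

At 0.4 ms per vector this gives an upper bound of 2,500 vectors/sec, consistent with the >1000 vectors/sec figure, and about 400 seconds to encrypt 1M vectors serially, which is useful when planning a migration window.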
- Algorithm: AES-256-GCM
- Mode: Galois/Counter Mode (authenticated encryption)
- IV Size: 12 bytes (96 bits, random per encryption)
- Tag Size: 16 bytes (128 bits, authentication tag)
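Per the parameters above, each encrypted vector carries a 12-byte IV and a 16-byte tag alongside the ciphertext. The sketch below splits a raw blob into those parts. The IV || ciphertext || tag order is an assumption for illustration (this guide does not specify the on-disk layout, and the key_id/version metadata is omitted), and GcmEnvelope and splitEnvelope() are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Assumed layout of one encrypted vector: IV || ciphertext || tag.
constexpr std::size_t kIvSize = 12;   // 96-bit GCM nonce, random per encryption
constexpr std::size_t kTagSize = 16;  // 128-bit authentication tag

struct GcmEnvelope {
    std::vector<std::uint8_t> iv;
    std::vector<std::uint8_t> ciphertext;
    std::vector<std::uint8_t> tag;
};

// Splits a raw blob into its parts; returns std::nullopt if the blob is too
// short to hold an IV and a tag (such a blob can never authenticate).
std::optional<GcmEnvelope> splitEnvelope(const std::vector<std::uint8_t>& blob) {
    if (blob.size() < kIvSize + kTagSize) return std::nullopt;
    const auto ivEnd = blob.begin() + static_cast<std::ptrdiff_t>(kIvSize);
    const auto tagBegin = blob.end() - static_cast<std::ptrdiff_t>(kTagSize);
    GcmEnvelope env;
    env.iv.assign(blob.begin(), ivEnd);
    env.ciphertext.assign(ivEnd, tagBegin);
    env.tag.assign(tagBegin, blob.end());
    return env;
}
```

The actual decryption (for example through OpenSSL's EVP AEAD interface) must verify the tag before any plaintext is used; a tag mismatch on corrupted data is one way a decryption failure can arise. With random 96-bit IVs, NIST SP 800-38D limits a single key to 2^32 encryptions, which is another reason to rotate keys.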
- Keys are managed by the configured KeyProvider
- Supports key rotation with versioning
- Old encrypted vectors can be decrypted with their original key version
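Versioned keys keep old vectors readable after rotation: new writes use the latest version, while each stored vector is decrypted with the version recorded in its metadata. A hypothetical sketch of that lookup; KeyRing is an illustrative class, not the KeyProvider interface, which this guide does not document.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using KeyBytes = std::vector<std::uint8_t>;

// Hypothetical key ring keyed by (key_id, version).
class KeyRing {
public:
    // Registers a key version; the highest version is used for new writes.
    void add(const std::string& keyId, int version, KeyBytes key) {
        keys_[{keyId, version}] = std::move(key);
    }

    // Key for decrypting a stored vector, chosen by the version in its metadata.
    std::optional<KeyBytes> forDecrypt(const std::string& keyId, int version) const {
        auto it = keys_.find({keyId, version});
        if (it == keys_.end()) return std::nullopt;  // missing key or wrong version
        return it->second;
    }

    // Latest registered version of keyId, used when encrypting new vectors.
    std::optional<int> latestVersion(const std::string& keyId) const {
        std::optional<int> latest;
        for (const auto& [idVer, key] : keys_) {
            // The map is ordered by (id, version), so the last match is the highest.
            if (idVer.first == keyId) latest = idVer.second;
        }
        return latest;
    }

private:
    std::map<std::pair<std::string, int>, KeyBytes> keys_;
};
```

A missing version yields std::nullopt, which corresponds to the "missing key" and "wrong key version" causes listed under decryption failures.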
Before Encryption:
- ❌ Disk: Plaintext vectors in RocksDB
- ❌ Backups: Plaintext vectors in backup files
After Encryption:
- ✅ Disk: AES-256-GCM encrypted vectors
- ✅ Backups: Encrypted vectors
- ⚠️ Memory: Plaintext vectors in HNSW index (required for search)
CRY-03 (Data-at-Rest Encryption):
- ✅ Fully Compliant after Phase 1
- Vectors encrypted with AES-256-GCM
- Keys managed by application (not OS-level)
1. Encryption Disabled After Restart
// Solution: Re-enable after server restart
vim.setVectorEncryptionEnabled(true);
2. Mixed Encrypted/Plaintext Vectors
This is expected during migration. The system handles both:
// rebuildFromStorage() tries in this order:
// 1. Encrypted vector (embedding_encrypted)
// 2. Lossless compressed vector
// 3. Plaintext vector (embedding)
// 4. SQ8 quantized vector (embedding_q)
3. Decryption Failures
Check logs for errors:
grep "Failed to decrypt" /var/log/themisdb/server.log
Possible causes:
- Missing or incorrect encryption key
- Corrupted encrypted data
- Wrong key version
4. Performance Degradation
If index load time is too long:
- Consider batch decryption optimization (Phase 2)
- Use encrypted HNSW persistence (Phase 2, Ticket 3)
Enable encryption before adding large datasets:
vim.init("documents", 768);
vim.setVectorEncryptionEnabled(true); // Enable before adding data
// Now add vectors...
Always test migration on a staging environment:
# 1. Copy production data to staging
# 2. Run dry-run migration
# 3. Run actual migration
# 4. Verify search works
# 5. Deploy to production
Encryption adds ~2.5% storage overhead. Monitor disk usage:
df -h /var/lib/themisdb
Back up before enabling encryption:
# Back up before migration (paths are stored relative to /var/lib/themisdb)
tar czf themisdb-backup-$(date +%Y%m%d).tar.gz -C /var/lib/themisdb data
# Restore if needed (recreates /var/lib/themisdb/data)
tar xzf themisdb-backup-20251215.tar.gz -C /var/lib/themisdb/
Plan for regular key rotation (quarterly):
// Create new key version
key_provider->createKey("vector_embeddings", 2);
// Migrate vectors to the new key (future feature)
Ticket 3: HNSW Index Encryption
- Encrypt HNSW index files on disk
- Reduce index load time with encrypted persistence
- See docs/security/HNSW_PERSISTENCE_ENCRYPTION_ANALYSIS.md
Ticket 5: Differential Privacy (3-6 months)
- Add noise to vectors for privacy
- Research-level feature
Ticket 6: Homomorphic Encryption (12 months)
- Search on encrypted vectors without decryption
- Highly experimental
- Phase 1 Implementation Plan
- Phase 1 Status & Next Steps
- HNSW Persistence Encryption Analysis
- BSI C5 Compliance Analysis
- Embedding Reversibility Analysis
Status: Production Ready
Version: 1.0 (Phase 1)
Date: December 15, 2025