Vector Encryption Configuration Guide

Phase 1: At-Rest Encryption for Vector Embeddings

This guide explains how to configure and use vector encryption in ThemisDB.

Last updated: April 6, 2026
Version: v1.3.0
Category: 🔒 Security
Status: ✅ Production Ready


Overview

ThemisDB supports at-rest encryption for vector embeddings stored in RocksDB using AES-256-GCM. This feature provides:

  • Confidentiality: Vector embeddings encrypted with AES-256
  • Integrity: GCM authentication tag prevents tampering
  • Key Rotation: Support for multiple key versions
  • Backward Compatibility: Reads both encrypted and plaintext vectors

Configuration

Enable Vector Encryption

Vector encryption can be enabled either through the API or through the config:vector key stored in RocksDB:

// Via API
VectorIndexManager vim(db);
vim.setVectorEncryptionEnabled(true);
vim.setVectorKeyId("vector_embeddings");

// Via config:vector key in RocksDB
{
  "encryption_enabled": true,
  "key_id": "vector_embeddings",
  "quantization": "none"
}

Set quantization to "none" whenever encryption is enabled.

Configuration Options

Option              Type     Default              Description
encryption_enabled  boolean  false                Enable/disable vector encryption
key_id              string   "vector_embeddings"  Logical key identifier for encryption
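
The options above, together with the rule that quantization must be off while encryption is on, can be captured in a small settings struct. This is only an illustrative sketch: VectorEncryptionConfig and isConsistent are hypothetical names, not part of the ThemisDB API, and the default for quantization is an assumption made for the example.

```cpp
#include <string>

// Hypothetical mirror of the config:vector settings (not a ThemisDB type).
struct VectorEncryptionConfig {
    bool encryption_enabled = false;           // documented default
    std::string key_id = "vector_embeddings";  // documented default
    std::string quantization = "none";         // default assumed for this sketch
};

// Returns true when the settings respect the guide's rule: with encryption
// enabled, a key id must be set and quantization must be "none".
bool isConsistent(const VectorEncryptionConfig& cfg) {
    if (!cfg.encryption_enabled) {
        return true;  // nothing to check while encryption is off
    }
    return !cfg.key_id.empty() && cfg.quantization == "none";
}
```

A check like this could run before the settings are written, so that an inconsistent combination is rejected early rather than discovered at insert time.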

Usage

Adding Encrypted Vectors

Once encryption is enabled, all new vectors are automatically encrypted:

VectorIndexManager vim(db);
vim.init("documents", 768);
vim.setVectorEncryptionEnabled(true);

// Add vector - automatically encrypted
BaseEntity doc("doc1");
std::vector<float> embedding(768, 0.5f);
doc.setField("embedding", embedding);
vim.addEntity(doc);

// Vector is encrypted in RocksDB storage
// In-memory HNSW index still uses plaintext for search
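
To make the stored payload concrete, the following sketch flattens a float vector into the byte buffer that would be handed to the cipher, and restores it afterwards. It assumes a raw, native-endian float layout; the guide does not specify ThemisDB's actual serialization, and toBytes/fromBytes are hypothetical helper names.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Copies the floats into a byte buffer (4 bytes per dimension on common
// platforms). Assumed layout for illustration only.
std::vector<std::uint8_t> toBytes(const std::vector<float>& v) {
    std::vector<std::uint8_t> out(v.size() * sizeof(float));
    if (!v.empty()) {
        std::memcpy(out.data(), v.data(), out.size());
    }
    return out;
}

// Inverse of toBytes; trailing bytes that do not form a whole float are ignored.
std::vector<float> fromBytes(const std::vector<std::uint8_t>& bytes) {
    std::vector<float> v(bytes.size() / sizeof(float));
    if (!v.empty()) {
        std::memcpy(v.data(), bytes.data(), v.size() * sizeof(float));
    }
    return v;
}
```

For the 768-dimensional embedding used above, toBytes yields a 3,072-byte buffer, which matches the plaintext size quoted in the storage overhead section.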

Searching Encrypted Vectors

Search works as usual. The in-memory index already holds decrypted vectors, so no per-query decryption is needed:

std::vector<float> query(768, 0.5f);
auto [status, results] = vim.searchKnn(query, 10);

// Results are the same as without encryption
for (const auto& result : results) {
    std::cout << "PK: " << result.pk 
              << ", Distance: " << result.distance << std::endl;
}

Rebuilding Index from Encrypted Storage

VectorIndexManager vim(db);
vim.init("documents", 768);

// Rebuild from storage - automatically decrypts vectors
auto status = vim.rebuildFromStorage();

// Index is now ready for search

Migration

Migrating Existing Data

Use the migration tool to encrypt existing plaintext vectors:

# Dry run (no changes)
./migrate_vector_encryption \
  --db-path /var/lib/themisdb/data \
  --object-name documents \
  --dry-run

# Actual migration
./migrate_vector_encryption \
  --db-path /var/lib/themisdb/data \
  --object-name documents \
  --batch-size 1000

Migration Options

Option         Required  Description
--db-path      Yes       Path to RocksDB database
--object-name  Yes       Vector index object name (e.g., "documents")
--key-id       No        Encryption key ID (default: "vector_embeddings")
--batch-size   No        Batch size for migration (default: 1000)
--dry-run      No        Simulate migration without making changes

Migration Steps

  1. Back up your database (stop ThemisDB first so the copy is consistent)

    cp -r /var/lib/themisdb/data /var/lib/themisdb/data.backup

  2. Run a dry-run migration

    ./migrate_vector_encryption --db-path /var/lib/themisdb/data \
                                 --object-name documents --dry-run

  3. Review the output

    • Check how many vectors will be migrated
    • Verify that the dry run reports no errors

  4. Run the actual migration

    ./migrate_vector_encryption --db-path /var/lib/themisdb/data \
                                 --object-name documents

  5. Enable encryption for new vectors

    vim.setVectorEncryptionEnabled(true);
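
The interplay of --batch-size and --dry-run in the steps above can be pictured with a small model. This is a sketch of the general technique, not the migration tool's source: Record, MigrationReport and migrate are hypothetical names, and the "encryption" is reduced to flipping a flag.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

struct Record {      // hypothetical stand-in for one stored vector
    bool encrypted;  // true if the record is already encrypted
};

struct MigrationReport {
    std::size_t migrated = 0;  // plaintext records encrypted (or that would be)
    std::size_t skipped = 0;   // records that were already encrypted
    std::size_t batches = 0;   // number of batches processed
};

// Walks the records in batches. In dry-run mode nothing is modified and the
// report only says what a real run would do.
MigrationReport migrate(std::vector<Record>& records, std::size_t batchSize, bool dryRun) {
    MigrationReport report;
    if (batchSize == 0) {
        batchSize = 1;  // guard against an endless loop
    }
    for (std::size_t start = 0; start < records.size(); start += batchSize) {
        const std::size_t end = std::min(start + batchSize, records.size());
        for (std::size_t i = start; i < end; ++i) {
            if (records[i].encrypted) {
                ++report.skipped;
                continue;
            }
            if (!dryRun) {
                records[i].encrypted = true;  // a real tool would encrypt and write here
            }
            ++report.migrated;
        }
        ++report.batches;
    }
    return report;
}
```

Because already-encrypted records are skipped, running such a migration a second time changes nothing, which is the property that makes an interrupted run safe to repeat in this model.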

Monitoring

Metrics

Encryption operations can be tracked through the log messages emitted by VectorIndexManager:

// Log encryption events
THEMIS_INFO("VectorIndexManager: Vector encryption ENABLED");
THEMIS_DEBUG("VectorIndexManager: Encrypted vector for pk={}", pk);
THEMIS_WARN("rebuildFromStorage: Failed to decrypt vector for pk={}: {}", pk, ex.what());

Audit Logging

All encryption operations are logged for compliance:

  • Vector encryption (when adding entities)
  • Vector decryption (when rebuilding from storage)
  • Encryption errors and failures
  • Configuration changes

Performance Impact

Storage Overhead

Plaintext 768-dim vector:  3,072 bytes
Encrypted 768-dim vector:  3,150 bytes (+2.5%)

Components:
- Ciphertext:     3,072 bytes (768 × 4)
- IV:                12 bytes
- Auth tag:          16 bytes
- Metadata:         ~50 bytes (key_id, version, base64)
Total:            3,150 bytes
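
The breakdown above generalizes to other dimensions. The sketch below recomputes it from the guide's figures; the 50-byte metadata value is the guide's approximation rather than a measured constant, and the function names are illustrative.

```cpp
#include <cstddef>

// Fixed per-vector costs taken from the breakdown above.
constexpr std::size_t kIvBytes = 12;
constexpr std::size_t kTagBytes = 16;
constexpr std::size_t kMetadataBytes = 50;  // approximate, per the guide

// Plaintext payload: one float (4 bytes on common platforms) per dimension.
constexpr std::size_t plaintextBytes(std::size_t dim) {
    return dim * sizeof(float);
}

// Stored size once IV, auth tag and metadata are added.
constexpr std::size_t encryptedBytes(std::size_t dim) {
    return plaintextBytes(dim) + kIvBytes + kTagBytes + kMetadataBytes;
}

// Relative overhead in percent.
constexpr double overheadPercent(std::size_t dim) {
    return 100.0 * static_cast<double>(encryptedBytes(dim) - plaintextBytes(dim)) /
           static_cast<double>(plaintextBytes(dim));
}
```

Since the 78 extra bytes are fixed, the relative overhead grows as vectors get smaller: under the same assumptions a 128-dimensional vector (512 bytes) would carry roughly 15% overhead instead of about 2.5%.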

Query Performance

  • No impact on search: Vectors are decrypted once during index load
  • HNSW search: Operates on plaintext vectors in memory (no decryption overhead)
  • Index load time: +40% overhead for decryption (5 seconds for 1M vectors)

Insertion Performance

  • Encryption overhead: ~0.4 ms per vector (768-dim)
  • Acceptable for production: Throughput remains high (>1000 vectors/sec)
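
As a quick sanity check on these numbers, the per-vector cost can be converted into a throughput ceiling. The helper below is a back-of-the-envelope sketch that assumes encryption is the only per-vector cost and that inserts run on a single thread.

```cpp
// Upper bound on inserts per second when each vector costs
// encryptMsPerVector milliseconds of encryption time.
constexpr double vectorsPerSecond(double encryptMsPerVector) {
    return 1000.0 / encryptMsPerVector;
}
```

With the quoted 0.4 ms per vector this gives about 2,500 vectors per second, which is consistent with the stated floor of more than 1,000 vectors per second.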

Security

Encryption Algorithm

  • Algorithm: AES-256-GCM
  • Mode: Galois/Counter Mode (authenticated encryption)
  • IV Size: 12 bytes (96 bits, random per encryption)
  • Tag Size: 16 bytes (128 bits, authentication tag)

Key Management

  • Keys are managed by the configured KeyProvider
  • Supports key rotation with versioning
  • Old encrypted vectors can be decrypted with their original key version
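
Versioned key lookup, as described in the list above, can be sketched as follows. ToyKeyProvider is a toy stand-in written for this example; it is not ThemisDB's KeyProvider interface, and its method names are hypothetical.

```cpp
#include <cstdint>
#include <map>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Toy in-memory key store keyed by (key id, version). Illustration only.
class ToyKeyProvider {
public:
    using Key = std::vector<std::uint8_t>;

    // Registers a key and remembers the highest version seen per key id.
    void addKey(const std::string& keyId, std::uint32_t version, Key key) {
        keys_[std::make_pair(keyId, version)] = std::move(key);
        std::uint32_t& latest = latest_[keyId];
        if (version > latest) {
            latest = version;
        }
    }

    // Newest version: the one to use when encrypting new vectors.
    std::uint32_t latestVersion(const std::string& keyId) const {
        auto it = latest_.find(keyId);
        if (it == latest_.end()) {
            throw std::runtime_error("unknown key id");
        }
        return it->second;
    }

    // Exact version: the one recorded alongside a stored vector.
    const Key& getKey(const std::string& keyId, std::uint32_t version) const {
        auto it = keys_.find(std::make_pair(keyId, version));
        if (it == keys_.end()) {
            throw std::runtime_error("unknown key version");
        }
        return it->second;
    }

private:
    std::map<std::pair<std::string, std::uint32_t>, Key> keys_;
    std::map<std::string, std::uint32_t> latest_;
};
```

The design point is that the key version travels with each encrypted vector: new writes pick latestVersion, while reads ask for the version stored in the record, so rotating the key does not force an immediate re-encryption of old data.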

Attack Surface

Before Encryption:

  • ❌ Disk: Plaintext vectors in RocksDB
  • ❌ Backups: Plaintext vectors in backup files

After Encryption:

  • ✅ Disk: AES-256-GCM encrypted vectors
  • ✅ Backups: Encrypted vectors
  • ⚠️ Memory: Plaintext vectors in HNSW index (required for search)

BSI C5 Compliance

CRY-03 (Data-at-Rest Encryption):

  • Fully Compliant after Phase 1
  • Vectors encrypted with AES-256-GCM
  • Keys managed by application (not OS-level)

Troubleshooting

Common Issues

1. Encryption Disabled After Restart

// Solution: Re-enable after server restart
vim.setVectorEncryptionEnabled(true);

2. Mixed Encrypted/Plaintext Vectors

This is expected during migration. The system handles both:

// rebuildFromStorage() tries in this order:
// 1. Encrypted vector (embedding_encrypted)
// 2. Lossless compressed vector
// 3. Plaintext vector (embedding)
// 4. SQ8 quantized vector (embedding_q)
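
The lookup order listed above amounts to a first-match search over field names. The sketch below models a stored record as a plain string map; pickVectorField is a hypothetical helper, and "embedding_compressed" is a placeholder because the guide does not name the lossless-compressed field.

```cpp
#include <map>
#include <string>

// Field names in the priority order given above. "embedding_compressed" is
// an assumed name used only for this illustration.
static const char* const kFieldPriority[] = {
    "embedding_encrypted", "embedding_compressed", "embedding", "embedding_q"};

// Returns the first field from the priority list that exists in the record,
// or an empty string if the record holds no vector at all.
std::string pickVectorField(const std::map<std::string, std::string>& record) {
    for (const char* name : kFieldPriority) {
        if (record.count(name) != 0) {
            return name;
        }
    }
    return std::string();
}
```

A record that carries both an encrypted and a plaintext copy resolves to the encrypted one, which is why a partially migrated dataset can still be loaded.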

3. Decryption Failures

Check logs for errors:

grep "Failed to decrypt" /var/log/themisdb/server.log

Possible causes:

  • Missing or incorrect encryption key
  • Corrupted encrypted data
  • Wrong key version

4. Performance Degradation

If index load time is too long:

  • Consider batch decryption optimization (Phase 2)
  • Use encrypted HNSW persistence (Phase 2, Ticket 3)

Best Practices

1. Enable Encryption Early

Enable encryption before adding large datasets:

vim.init("documents", 768);
vim.setVectorEncryptionEnabled(true);  // Enable before adding data
// Now add vectors...

2. Test Migration in Staging

Always test migration on a staging environment:

# 1. Copy production data to staging
# 2. Run dry-run migration
# 3. Run actual migration
# 4. Verify search works
# 5. Deploy to production

3. Monitor Disk Space

Encryption adds ~2.5% storage overhead. Monitor disk usage:

df -h /var/lib/themisdb

4. Regular Backups

Back up before enabling encryption:

# Backup before migration
tar czf themisdb-backup-$(date +%Y%m%d).tar.gz /var/lib/themisdb/data

# Restore if needed (tar stores the path without the leading "/", so extract relative to /)
tar xzf themisdb-backup-20251215.tar.gz -C /

5. Key Rotation

Plan for regular key rotation (quarterly):

// Create new key version
key_provider->createKey("vector_embeddings", 2);

// Migrate vectors to new key (future feature)

Future Enhancements

Phase 2 (Weeks 3-6)

Ticket 3: HNSW Index Encryption

  • Encrypt HNSW index files on disk
  • Reduce index load time with encrypted persistence
  • See docs/security/HNSW_PERSISTENCE_ENCRYPTION_ANALYSIS.md

Phase 3-4 (Long-term)

Ticket 5: Differential Privacy (3-6 months)

  • Add noise to vectors for privacy
  • Research-level feature

Ticket 6: Homomorphic Encryption (12 months)

  • Search on encrypted vectors without decryption
  • Highly experimental

Status: Production Ready
Version: 1.0 (Phase 1)
Date: December 15, 2025