HashiCorp Vault GCS Backend - High Memory Consumption Investigation
Issue Summary
Problem: Vault active/leader node consumes ~14GB of RAM while standby nodes use only ~30MB. This occurs with a relatively small dataset (~290MB in GCS storage) and persists across Vault versions.
Environment:
- Vault Version: 1.21.1
- Storage Backend: Google Cloud Storage (GCS)
- Platform: Google Kubernetes Engine (GKE)
- HA Configuration: 3-node cluster (1 active, 2 standby)
- Seal: GCP Cloud KMS (gcpckms)
Vault Configuration
```
ui = true
disable_mlock = true
api_addr = "https://vault.example.com"

listener "tcp" {
  tls_disable     = 1
  address         = "[::]:8200"
  cluster_address = "[::]:8201"
}

seal "gcpckms" {
  project    = "my-project"
  region     = "global"
  key_ring   = "vault-keys"
  crypto_key = "vault-unseal"
}

service_registration "kubernetes" {}

storage "gcs" {
  bucket     = "vault-storage-bucket"
  ha_enabled = "true"
  chunk_size = "8192" # Also tested with "512"
}
```
Observed Behavior
Memory Usage Pattern
| Node | Role | Memory Usage |
|---|---|---|
| vault-0 | active | ~14,000 Mi |
| vault-1 | standby | ~30 Mi |
| vault-2 | standby | ~30 Mi |
Memory Growth During Startup
After a fresh pod restart, memory grows rapidly:
```
10:44:30 → 3,658 Mi
10:44:44 → 6,833 Mi
10:45:15 → 8,943 Mi
10:46:43 → 13,991 Mi
10:47:26 → 14,020 Mi (stabilized)
```
The leader consumes ~14GB regardless of which pod becomes active.
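The samples above were taken by hand; a polling loop like the following reproduces them (a minimal sketch, assuming metrics-server is installed so `kubectl top` works, and that the active pod is vault-0 in the vault namespace):

```sh
# Sample the active pod's memory every 15 seconds during startup.
# Assumes metrics-server is available and pods are named vault-0..2.
while true; do
  echo "$(date +%H:%M:%S) $(kubectl top pod vault-0 -n vault --no-headers | awk '{print $3}')"
  sleep 15
done
```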
Investigation Details
1. Storage Analysis
GCS Bucket Size:
```
$ gsutil du -sh gs://vault-storage-bucket
289.74 MiB
```
Conclusion: Storage is small (~290MB) and cannot explain 14GB of RAM usage.
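If a single prefix dominated the bucket it would stand out in a per-prefix breakdown; a quick sketch (standard gsutil, no assumptions beyond the bucket name):

```sh
# Summarize usage per top-level prefix to spot an outlier path
# (e.g. an unexpectedly large sys/ or logical/ subtree).
gsutil du -sh gs://vault-storage-bucket/* | sort -h
```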
2. KV Secrets Engine
Total Secrets: 3,573 secrets across all paths
```
$ ./vault-audit-maxversions.sh secret "" 5 20 audit
[INFO] Total secrets scanned: 3573
```
Secrets Distribution:
```
secretpath1/: 1
secretpath2/: 20
secretpath3/: 38 (with ~3000+ nested secrets)
secretpath4/: 2
secretpath5/: 4
secretpath6/: 1
```
Max Versions Configuration: Cleaned up secrets with unlimited versions (max_versions=0)
Conclusion: 3,573 secrets should not require 14GB RAM. Expected ~50-100MB for metadata index.
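The count above comes from an internal script (`vault-audit-maxversions.sh`); it can be cross-checked with a recursive listing sketch like this (assuming KV v2 mounted at `secret/` and `jq` available):

```sh
# Recursively count KV v2 secrets under a mount (hypothetical helper;
# the figure above came from the internal audit script).
count_kv() {
  local path="$1"
  vault kv list -format=json "$path" 2>/dev/null | jq -r '.[]' |
  while read -r entry; do
    if [[ "$entry" == */ ]]; then
      count_kv "$path$entry"    # descend into folders
    else
      echo "$path$entry"        # leaf secret
    fi
  done
}
count_kv secret/ | wc -l
```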
3. Token Analysis
Initial State: 1,027 active tokens
- 100% service tokens (persistent, stored in memory)
- ~92% orphan tokens
- TTL: 15-29 days
- Primary source: GitHub auth for CircleCI
Actions Taken:
- Configured GitHub auth to use batch tokens: `vault write auth/github/config token_type=batch`
- Ran token tidy: `vault write -force auth/token/tidy`
- Revoked 518 GitHub service tokens
- Final state: 253 tokens remaining
Conclusion: Token cleanup did not reduce memory. After pod restart, memory returned to 14GB.
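The token census can be reproduced by listing accessors and looking each one up (a sketch; needs a token permitted on `auth/token/accessors` and `auth/token/lookup-accessor`):

```sh
# Count active tokens via their accessors.
vault list -format=json auth/token/accessors | jq length

# Inspect one accessor's type, TTL, and orphan status.
vault write -format=json auth/token/lookup-accessor accessor="<ACCESSOR>" |
  jq '.data | {type, ttl, orphan, path}'
```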
4. Lease Analysis
Auth Leases:
```
$ vault list sys/leases/lookup/auth/gcp/login/
# 0 leases (after cleanup)
```
GCP Secrets Engine Leases: All 9 GCP secrets engines have 0 active leases.
```
gcp/:    0
gcp-p1/: 0
gcp-p2/: 0
gcp-p3/: 0
gcp-p4/: 0
gcp-p5/: 0
gcp-p6/: 0
gcp-p7/: 0
gcp-p8/: 0
```
Conclusion: No leases contributing to memory usage.
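The per-mount counts above can be gathered in one loop over `sys/leases/lookup` (a sketch using the mount names listed):

```sh
# Count live leases under each GCP secrets mount; vault list errors
# out when a prefix has no leases, hence the ${n:-0} fallback.
for mount in gcp gcp-p1 gcp-p2 gcp-p3 gcp-p4 gcp-p5 gcp-p6 gcp-p7 gcp-p8; do
  n=$(vault list -format=json "sys/leases/lookup/$mount/" 2>/dev/null | jq length)
  echo "$mount/: ${n:-0}"
done
```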
5. Identity Store
```
$ vault list identity/entity/id 2>/dev/null | wc -l
0
$ vault list identity/group/id 2>/dev/null | wc -l
0
```
Conclusion: Empty identity store; not a factor.
6. Policies
```
$ vault policy list | wc -l
39
```
Conclusion: 39 policies is negligible.
7. Audit Log
```
$ kubectl exec -n vault vault-2 -- ls -lah /vault/logs/
-rw------- vault vault 54.0K vault_audit_2.log
```
Conclusion: Audit log is tiny (54KB); not a factor.
8. Secrets Engines Mounted
| Path | Type | Description |
|---|---|---|
| cubbyhole/ | cubbyhole | per-token private secret storage |
| gcp-p1/ | gcp | |
| gcp-p2/ | gcp | |
| gcp-p3/ | gcp | |
| gcp-p4/ | gcp | |
| gcp-p5/ | gcp | |
| gcp-p6/ | gcp | |
| gcp-p7/ | gcp | |
| gcp-p8/ | gcp | |
| gcp/ | gcp | |
| identity/ | identity | identity store |
| secret/ | kv | KV v2 |
| ssh-client-signer/ | ssh | |
| ssh-github/ | ssh | |
| sys/ | system | |
9. Auth Methods
| Path | Type |
|---|---|
| capture-ssh-access/ | gcp |
| gcp/ | gcp |
| github/ | github |
| token/ | token |
Remediation Attempts
Attempt 1: Token Cleanup
- Converted GitHub auth to batch tokens
- Revoked 518 service tokens
- Result: No memory reduction
Attempt 2: Lease Cleanup
- Revoked ~24,000 GCP auth leases (from a previous session; see the sketch after this list)
- Result: Temporary reduction, memory returned to 14GB after pod restart
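For reference, a bulk revocation like Attempt 2 maps to a prefix revoke (a sketch; the exact command from the earlier session was not recorded):

```sh
# Revoke every outstanding lease under the GCP auth mount.
# Add -force only if backend-side revocation errors block progress.
vault lease revoke -prefix auth/gcp/login/
```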
Attempt 3: GCS chunk_size Change
- Changed `chunk_size = "512"` to `chunk_size = "8192"`
- Performed full cluster restart (scaled 0 → 3)
- Result: No change, memory still reached 14GB
Attempt 4: Max Versions Cleanup
- Identified secrets with unlimited versions (max_versions=0)
- Patched to max_versions=5 and destroyed old versions (commands sketched below)
- Result: GCS bucket size reduced slightly, no RAM impact
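The version cleanup corresponds to these KV v2 commands (a sketch with a hypothetical secret path; the real paths came from the audit script):

```sh
# Cap retained versions, then destroy the old ones.
# secret/secretpath3/example is a hypothetical path for illustration.
vault kv metadata put -max-versions=5 secret/secretpath3/example
vault kv destroy -versions=1,2,3 secret/secretpath3/example
```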
Summary Table
| Component | Value | Expected RAM Impact | Actual RAM Impact |
|---|---|---|---|
| GCS Storage | 290 MB | - | - |
| KV Secrets | 3,573 | ~50-100 MB | Unknown |
| Active Tokens | 253 | ~10-50 MB | Unknown |
| Auth Leases | 0 | 0 | 0 |
| GCP Engine Leases | 0 | 0 | 0 |
| Identity Entities | 0 | 0 | 0 |
| Identity Groups | 0 | 0 | 0 |
| Policies | 39 | Negligible | Negligible |
| Audit Log | 54 KB | Negligible | Negligible |
| TOTAL | - | < 500 MB | ~14 GB |
Logs Analysis
Startup logs show normal mount initialization:
```
core: Initializing version history cache for core
core: loaded wrapping token key
core: successfully mounted: type=kv version="v0.25.0+builtin" path=secret/
core: successfully mounted: type=gcp version="v0.23.0+builtin" path=gcp-*/
```
Cluster warnings (not related to memory):
```
core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]
```
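Because no data-plane cleanup changed the footprint, the most useful next artifact is a heap profile from the active node; a minimal sketch using Vault's pprof endpoint (requires a token with sudo capability on `sys/pprof`, and assumes wget exists in the container image):

```sh
# Capture a heap profile from the active node to attach to the escalation.
kubectl exec -n vault vault-0 -- \
  wget -q -O /tmp/heap.prof \
    --header "X-Vault-Token: $VAULT_TOKEN" \
    http://127.0.0.1:8200/v1/sys/pprof/heap
kubectl cp vault/vault-0:/tmp/heap.prof ./heap.prof

# Alternatively, collect a full diagnostic bundle:
vault debug -duration=2m -output=vault-debug.tar.gz
```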
GitHub Issue Template
### Vault version
1.21.1
### Vault storage backend
GCS (Google Cloud Storage)
### Describe the bug
Active/leader Vault node consumes ~14GB RAM while standby nodes use only ~30MB.
This occurs with a small dataset (~290MB in GCS, 3573 KV secrets, 253 tokens, 0 leases).
### To Reproduce
1. Deploy Vault 1.21.1 with GCS backend on Kubernetes (3-node HA)
2. Store ~3500 KV secrets
3. Observe leader memory consumption
### Expected behavior
Memory usage proportional to data size. With 290MB storage and 3573 secrets,
expected RAM < 1GB, not 14GB.
### Environment
- Vault version: 1.21.1
- Storage backend: GCS
- Platform: GKE
- HA: 3 nodes
- Seal: gcpckms
### Additional context
- Issue persists across Vault versions (not version-specific)
- Memory grows rapidly at startup (3GB → 14GB in ~2 minutes)
- Cleanup of tokens, leases, and secrets has no impact
- chunk_size configuration change has no impact

Investigation Date: January 22, 2026
Investigator: Patrick Poulin
Status: Unresolved - Escalate to HashiCorp