
GCS Backend: Leader node consumes 14GB RAM with 290MB storage #31712

@acdn-ppoulin

Description


HashiCorp Vault GCS Backend - High Memory Consumption Investigation

Issue Summary

Problem: Vault active/leader node consumes ~14GB of RAM while standby nodes use only ~30MB. This occurs with a relatively small dataset (~290MB in GCS storage) and persists across Vault versions.

Environment:

  • Vault Version: 1.21.1
  • Storage Backend: Google Cloud Storage (GCS)
  • Platform: Google Kubernetes Engine (GKE)
  • HA Configuration: 3-node cluster (1 active, 2 standby)
  • Seal: GCP Cloud KMS (gcpckms)

Vault Configuration

ui = true
disable_mlock = true
api_addr = "https://vault.example.com"

listener "tcp" {
  tls_disable = 1
  address = "[::]:8200"
  cluster_address = "[::]:8201"
}

seal "gcpckms" {
  project     = "my-project"
  region      = "global"
  key_ring    = "vault-keys"
  crypto_key  = "vault-unseal"
}

service_registration "kubernetes" {}

storage "gcs" {
  bucket      = "vault-storage-bucket"
  ha_enabled  = "true"
  chunk_size  = "8192"  # Also tested with "512"
}

Observed Behavior

Memory Usage Pattern

Node      Role      Memory Usage
vault-0   active    ~14,000 Mi
vault-1   standby   ~30 Mi
vault-2   standby   ~30 Mi

Memory Growth During Startup

After a fresh pod restart, memory grows rapidly:

10:44:30 → 3,658 Mi
10:44:44 → 6,833 Mi
10:45:15 → 8,943 Mi
10:46:43 → 13,991 Mi
10:47:26 → 14,020 Mi (stabilized)

The leader consumes ~14GB regardless of which pod becomes active.
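Since the growth happens within minutes of unseal, a heap profile captured during that window would show exactly which allocation sites hold the ~14GB. A sketch using Vault's standard `vault debug` command and the sudo-protected `sys/pprof` endpoints (assumes `VAULT_ADDR` and a suitably privileged `VAULT_TOKEN` are exported):

```shell
# Bundle pprof profiles, metrics, and host info while memory is climbing:
vault debug -duration=2m -interval=30s -output=vault-debug.tar.gz

# Or grab a single heap profile directly from the active node:
curl -s -H "X-Vault-Token: ${VAULT_TOKEN}" \
  "${VAULT_ADDR}/v1/sys/pprof/heap" -o heap.prof

# Inspect the top allocation sites (requires a local Go toolchain):
go tool pprof -top heap.prof
```

Attaching the resulting profile to this issue would let maintainers pinpoint the allocating code path rather than guess from configuration.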

Investigation Details

1. Storage Analysis

GCS Bucket Size:

$ gsutil du -sh gs://vault-storage-bucket
289.74 MiB

Conclusion: Storage is small (~290MB), cannot explain 14GB RAM usage.
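Object count may matter as much as total size here, since a backend that enumerates or caches per-object state scales with the number of keys, not bytes. A rough count (bucket name from this report; `gsutil ls -r` also prints directory header lines, so the figure is approximate):

```shell
# Approximate number of objects in the Vault GCS bucket:
gsutil ls -r gs://vault-storage-bucket | wc -l
```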

2. KV Secrets Engine

Total Secrets: 3,573 secrets across all paths

$ ./vault-audit-maxversions.sh secret "" 5 20 audit
[INFO] Total secrets scanned: 3573

Secrets Distribution:

secretpath1/: 1
secretpath2/: 20
secretpath3/: 38 (with ~3000+ nested secrets)
secretpath4/: 2
secretpath5/: 4
secretpath6/: 1

Max Versions Configuration: Cleaned up secrets with unlimited versions (max_versions=0)

Conclusion: 3,573 secrets should not require 14GB RAM. Expected ~50-100MB for metadata index.
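To make the discrepancy concrete, dividing the observed RSS by the secret count gives the implied per-secret memory cost:

```shell
# 14 GB expressed in KB, divided across 3,573 secrets:
kb_per_secret=$((14 * 1024 * 1024 / 3573))
echo "${kb_per_secret} KB per secret"   # ~4108 KB, i.e. ~4 MB each
```

Roughly 4 MB of resident memory per secret, and ~50x the entire 290 MB bucket — far beyond any plausible metadata index.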

3. Token Analysis

Initial State: 1,027 active tokens

  • 100% service tokens (persistent, stored in memory)
  • ~92% orphan tokens
  • TTL: 15-29 days
  • Primary source: GitHub auth for CircleCI

Actions Taken:

  1. Configured GitHub auth to use batch tokens: vault write auth/github/config token_type=batch
  2. Ran token tidy: vault write -force auth/token/tidy
  3. Revoked 518 GitHub service tokens
  4. Final state: 253 tokens remaining

Conclusion: Token cleanup did not reduce memory. After pod restart, memory returned to 14GB.
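For reproducibility, the token counts above can be re-derived from the accessor list (a sudo-protected endpoint). Against the live cluster this would be `vault list auth/token/accessors`; the sketch below uses mock output since plain `vault list` prefixes a two-line `Keys` / `----` header that must be skipped:

```shell
# Count active tokens via their accessors; mock output stands in for
# what "vault list auth/token/accessors" would print:
count=$(printf 'Keys\n----\na1\na2\na3\n' | awk 'NR>2' | wc -l)
echo "active tokens: ${count}"
```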

4. Lease Analysis

Auth Leases:

$ vault list sys/leases/lookup/auth/gcp/login/
0 leases  # After cleanup

GCP Secrets Engine Leases: All 9 GCP secrets engines have 0 active leases

gcp/: 0
gcp-p1/: 0
gcp-p2/: 0
gcp-p3/: 0
gcp-p4/: 0
gcp-p5/: 0
gcp-p6/: 0
gcp-p7/: 0
gcp-p8/: 0

Conclusion: No leases contributing to memory usage.

5. Identity Store

$ vault list identity/entity/id 2>/dev/null | wc -l
0

$ vault list identity/group/id 2>/dev/null | wc -l
0

Conclusion: Empty identity store, not a factor.

6. Policies

$ vault policy list | wc -l
39

Conclusion: 39 policies is negligible.

7. Audit Log

$ kubectl exec -n vault vault-2 -- ls -lah /vault/logs/
-rw------- vault vault 54.0K vault_audit_2.log

Conclusion: Audit log is tiny (54KB), not a factor.

8. Secrets Engines Mounted

Path                  Type        Description
cubbyhole/            cubbyhole   per-token private secret storage
gcp-p1/               gcp
gcp-p2/               gcp
gcp-p3/               gcp
gcp-p4/               gcp
gcp-p5/               gcp
gcp-p6/               gcp
gcp-p7/               gcp
gcp-p8/               gcp
gcp/                  gcp
identity/             identity    identity store
secret/               kv          KV v2
ssh-client-signer/    ssh
ssh-github/           ssh
sys/                  system

9. Auth Methods

Path                   Type      Description
capture-ssh-access/    gcp       
gcp/                   gcp       
github/                github    
token/                 token     

Remediation Attempts

Attempt 1: Token Cleanup

  • Converted GitHub auth to batch tokens
  • Revoked 518 service tokens
  • Result: No memory reduction

Attempt 2: Lease Cleanup

  • Revoked ~24,000 GCP auth leases (from previous session)
  • Result: Temporary reduction, memory returned to 14GB after pod restart

Attempt 3: GCS chunk_size Change

  • Changed from chunk_size = "512" to chunk_size = "8192" (the value is in kilobytes, so 8 MB per upload request)
  • Performed full cluster restart (scale 0 → 3)
  • Result: No change, memory still reached 14GB

Attempt 4: Max Versions Cleanup

  • Identified secrets with unlimited versions (max_versions=0)
  • Patched to max_versions=5 and destroyed old versions
  • Result: GCS bucket size reduced slightly, no RAM impact
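For reference, the version-cap patch and destroy in this attempt correspond to commands of this shape (the path is hypothetical; version numbers depend on each secret's history):

```shell
# Cap retained versions for a KV v2 secret, then destroy the old ones:
vault kv metadata put -max-versions=5 secret/secretpath3/example
vault kv destroy -versions=1,2,3 secret/secretpath3/example
```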

Summary Table

Component           Value    Expected RAM Impact   Actual RAM Impact
GCS Storage         290 MB   -                     -
KV Secrets          3,573    ~50-100 MB            Unknown
Active Tokens       253      ~10-50 MB             Unknown
Auth Leases         0        0                     0
GCP Engine Leases   0        0                     0
Identity Entities   0        0                     0
Identity Groups     0        0                     0
Policies            39       Negligible            Negligible
Audit Log           54 KB    Negligible            Negligible
TOTAL               -        < 500 MB              ~14 GB

Logs Analysis

Startup logs show normal mount initialization:

core: Initializing version history cache for core
core: loaded wrapping token key
core: successfully mounted: type=kv version="v0.25.0+builtin" path=secret/
core: successfully mounted: type=gcp version="v0.23.0+builtin" path=gcp-*/

Cluster warnings (not related to memory):

core.cluster-listener: no TLS config found for ALPN: ALPN=["req_fw_sb-act_v1"]

GitHub Issue Template

### Vault version
1.21.1

### Vault storage backend
GCS (Google Cloud Storage)

### Describe the bug
Active/leader Vault node consumes ~14GB RAM while standby nodes use only ~30MB. 
This occurs with a small dataset (~290MB in GCS, 3573 KV secrets, 253 tokens, 0 leases).

### To Reproduce
1. Deploy Vault 1.21.1 with GCS backend on Kubernetes (3-node HA)
2. Store ~3500 KV secrets
3. Observe leader memory consumption

### Expected behavior
Memory usage proportional to data size. With 290MB storage and 3573 secrets, 
expected RAM < 1GB, not 14GB.

### Environment
- Vault version: 1.21.1
- Storage backend: GCS
- Platform: GKE
- HA: 3 nodes
- Seal: gcpckms

### Additional context
- Issue persists across Vault versions (not version-specific)
- Memory grows rapidly at startup (3GB → 14GB in ~2 minutes)
- Cleanup of tokens, leases, and secrets has no impact
- chunk_size configuration change has no impact
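One further diagnostic worth running before escalation: the Go runtime often holds freed memory rather than returning it to the OS, so a high RSS is not proof of 14 GB of live objects. Setting the standard `GOMEMLIMIT` runtime variable distinguishes the two cases (env-var mechanism is part of the Go runtime; the statefulset name/namespace below are this cluster's):

```shell
# Impose a 2 GiB soft heap cap on the Vault pods and restart them:
kubectl set env statefulset/vault -n vault GOMEMLIMIT=2GiB
kubectl rollout restart statefulset/vault -n vault
# If RSS now plateaus near 2 GiB with no OOM-kills, the 14 GB was mostly
# reclaimable GC heap; if the leader OOMs, ~14 GB of objects are genuinely live.
```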

Investigation Date: January 22, 2026
Investigator: Patrick Poulin
Status: Unresolved - Escalate to HashiCorp
