Description
Prerequisites
- I searched existing issues
- I can reproduce this issue
Bug Description
The default MongoDB memory limits (2Gi) may be insufficient for some production-scale clusters experiencing burst events. Scale testing on a 1500-node cluster demonstrated that MongoDB requires 4-6Gi memory per replica to handle burst scenarios above 1,000 events/sec without primary failover. #383
Discussion:
6GB is a number that worked well on the 1500 node cluster test. (I tested with both 4GB and 6GB. I erred on the side of caution and completed the scale tests with 6GB, but 4GB would have been acceptable in most of my testing.)
I understand that we do not want to waste resources, and in testing MongoDB appears to consume most of the memory we give it. 6GB might be a bit too large for the average cluster, but the results show that the current 1.5GB request/2GB limit may be too low.
I suggest we bump the current amounts up a bit, maybe to 3GB for requests and 4GB for limits. However, the right values really depend on the resources of the cluster and the workload. This is my attempt at suggesting a rational default that will work for most clusters as we scale up.
Component
Other
Steps to Reproduce
During burst testing (#383) at 1,500-4,200 events/sec on a 1500-node cluster with default 2Gi memory:
- MongoDB primary experiences memory pressure
- Primary failover occurs under sustained burst load
- Write operations are interrupted during failover
- System instability during major cluster health events
With 4-6Gi memory, MongoDB remained stable through:
- Moderate bursts: 1,500 events/sec (2.3 GB used)
- High bursts: 3,000 events/sec (2.7 GB used)
- Extreme bursts: 4,200 events/sec for 1-3 min (3.6 GB used)
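To make the headroom in these numbers explicit, here is a minimal sketch that compares the observed usage against candidate memory limits. It assumes the reported "GB" figures are decimal gigabytes while Kubernetes `Gi` limits are binary gibibytes. If the measurements were actually in GiB, the percentages shift slightly. The names `observed_gb`, `fits`, and `utilization` are illustrative only and are not part of NVSentinel.

```python
GIB = 2**30  # bytes in one gibibyte (Kubernetes "Gi")
GB = 10**9   # bytes in one gigabyte (assumed unit of the reported usage)

# Peak MongoDB memory use reported in the burst tests above, in GB.
observed_gb = {
    "moderate (1,500 events/sec)": 2.3,
    "high (3,000 events/sec)": 2.7,
    "extreme (4,200 events/sec)": 3.6,
}


def utilization(used_gb: float, limit_gib: float) -> float:
    """Fraction of the memory limit taken up by the observed usage."""
    return (used_gb * GB) / (limit_gib * GIB)


def fits(used_gb: float, limit_gib: float) -> bool:
    """True if the observed usage stays below the memory limit."""
    return utilization(used_gb, limit_gib) < 1.0


if __name__ == "__main__":
    # Compare the current default (2Gi) with the two tested limits.
    for limit_gib in (2, 4, 6):
        print(f"limit {limit_gib}Gi:")
        for label, used in observed_gb.items():
            status = "ok" if fits(used, limit_gib) else "exceeds limit"
            pct = utilization(used, limit_gib) * 100
            print(f"  {label}: {pct:.0f}% of limit ({status})")
```

Under these assumptions, every observed burst level needs more than the 2Gi default, while the extreme burst uses roughly 84% of a 4Gi limit. This is only arithmetic on the reported figures. It does not model how MongoDB sizes its cache relative to the container limit.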
Environment
- NVSentinel version: v0.4.0
- Deployment method: helm
Logs/Output
Fixed by updating distros/kubernetes/nvsentinel/charts/mongodb-store/values.yaml:
    resources:
      requests:
        cpu: 1
        memory: 3Gi  # was 1.5Gi
      limits:
        cpu: 2
        memory: 4Gi  # was 2Gi