[Bug]: MongoDB default memory (2Gi) insufficient for production-scale burst events #496

@ksaur

Prerequisites

  • I searched existing issues
  • I can reproduce this issue

Bug Description

The default MongoDB memory limit (2Gi) may be insufficient for production-scale clusters experiencing burst events. Scale testing on a 1500-node cluster (see #383) showed that MongoDB needs 4-6Gi of memory per replica to handle bursts above 1,000 events/sec without a primary failover.

Discussion:
6GB is a number that worked well in the 1500-node cluster test. (I tested with both 4GB and 6GB. I erred on the side of caution and completed the scale tests with 6GB, but 4GB would have been acceptable in most of my testing.)

I understand that we do not want to waste resources, and in testing MongoDB consumed most of the memory it was given. 6GB might be a bit too large for the average cluster, but the results show that the current 1.5Gi request / 2Gi limit may be too low.
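
One likely reason MongoDB grows into whatever memory it is given: by default the WiredTiger cache is sized as the larger of 50% of (RAM - 1 GB) or 256 MB, and in a container the memory limit is used as the RAM figure. Under that rule a 2Gi limit yields roughly a 0.5 GB cache and a 4Gi limit roughly 1.5 GB. If an explicit cap is preferred, a minimal `mongod.conf` sketch could look like the following. Whether the NVSentinel chart mounts or exposes such a config file is an assumption, not something confirmed here.

```yaml
# mongod.conf fragment (sketch). Assumes mongod is started with a config file;
# this issue does not confirm how the mongodb-store chart passes config through.
storage:
  wiredTiger:
    engineConfig:
      # Explicit WiredTiger cache size in GB. 1.5 matches what the default
      # formula 0.5 * (4 - 1) gives for a 4Gi container limit.
      cacheSizeGB: 1.5
```

The cache is only part of mongod's footprint (connections, in-flight operations, and sorts use memory outside it), so the container limit still needs headroom above this value.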

I suggest we bump the current amounts up a bit, maybe to 3Gi for requests and 4Gi for limits. However, it really depends on the resources of the cluster and the workload. This is my attempt at suggesting a rational default that will work for most clusters as we scale up.
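
Since the right value depends on the cluster, larger deployments could layer their own override on top of whatever default is chosen. A minimal sketch, assuming the chart exposes a top-level `resources` block: the file name is hypothetical, and the 4Gi request is illustrative (only the 6Gi figure comes from the scale test).

```yaml
# values-large-cluster.yaml (hypothetical file name)
# Assumes a top-level `resources` key in the mongodb-store chart; if the chart
# is installed as a subchart, this block may need nesting under the subchart's key.
resources:
  requests:
    cpu: 1
    memory: 4Gi   # illustrative; lower end of the 4-6Gi range from testing
  limits:
    cpu: 2
    memory: 6Gi   # the amount used to complete the 1500-node scale tests
```

A file like this can be supplied through Helm's `-f` / `--values` flag at install or upgrade time, which leaves the chart default untouched for smaller clusters.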

Component

Other

Steps to Reproduce

During burst testing (#383) at 1,500-4,200 events/sec on a 1500-node cluster with the default 2Gi memory limit:

  • MongoDB primary experiences memory pressure
  • Primary failover occurs under sustained burst load
  • Write operations are interrupted during failover
  • The system becomes unstable during major cluster health events

With 4-6Gi memory, MongoDB remained stable through:

  • Moderate bursts: 1,500 events/sec (2.3 GB used)
  • High bursts: 3,000 events/sec (2.7 GB used)
  • Extreme bursts: 4,200 events/sec for 1-3 min (3.6 GB used)

Environment

  • NVSentinel version: v0.4.0
  • Deployment method: helm

Logs/Output

Fixed by updating distros/kubernetes/nvsentinel/charts/mongodb-store/values.yaml:

resources:
  requests:
    cpu: 1
    memory: 3Gi    # was 1.5Gi
  limits:
    cpu: 2
    memory: 4Gi    # was 2Gi

Labels: question (Further information is requested)
