
Runaway memory usage #1556

@jrudolph

Description


Mountpoint for Amazon S3 version

mount-s3 1.19.0

AWS Region

us-east-1

Describe the running environment

Accessing a TB-sized dataset in EKS via the mountpoint-s3 CSI driver (recent AL2023 AMI).

Mountpoint options

Mostly default settings


--metadata-ttl indefinite --read-part-size 5242880 --allow-other --region us-east-1

What happened?

For a workload where we mount a large dataset (~2.4 TB: roughly 1,200 files of ~2 GB each) that is accessed mostly in a random-access fashion, we see runaway memory usage after a short but heavy burst of access, ultimately leading to OOM situations. We've seen the mount-s3 process use more than 15 GB of memory, and we cannot find a way to limit its memory usage.

We are using mountpoint-s3 via the CSI driver.

Things we tried:

  • decrease --read-part-size to its lower limit of 5242880 (no effect)
  • use an empty dir cache (seemingly no effect)
  • update to the new v2 CSI driver. I tried this after the old driver let k8s nodes die outright: the systemd service gobbled up all memory and the node became unresponsive. With the v2 driver and mountpointContainerResourcesLimitsMemory set, I can see the mppods being OOMKilled, which is better for the cluster than hanging nodes but doesn't help the application; restarting the mppod does not gracefully re-establish access for the application. As I gradually increase mountpointContainerResourcesLimitsMemory, it becomes harder and harder to schedule the application at all.

Not tried:

  • changing UNSTABLE_MOUNTPOINT_MAX_PREFETCH_WINDOW_SIZE, because it's not supported by the CSI driver. Also, it's not clear how the default of 2 GiB could still lead to >10 GiB being used.

In general, if there were any limit we could set, that might be good enough for us for the time being, even if it makes scheduling in a K8s cluster harder. Ideally, though, memory usage would be limited cleanly, and in the best case the maximum would be picked up from the cgroup environment.
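As a sketch of that last suggestion: under cgroup v2 (used on recent AL2023 nodes), a process can read its own memory limit from `/sys/fs/cgroup/memory.max`, which contains either a byte count or the literal `max` when unlimited. A minimal, hypothetical reader — not part of mountpoint-s3, just an illustration of how the limit could be discovered — might look like:

```rust
use std::fs;

/// Parse the contents of a cgroup v2 `memory.max` file.
/// The file holds either a byte count or the literal string "max"
/// when no limit is set; returns None for "max" or unparseable input.
fn parse_memory_max(contents: &str) -> Option<u64> {
    let trimmed = contents.trim();
    if trimmed == "max" {
        None
    } else {
        trimmed.parse().ok()
    }
}

fn main() {
    // On a cgroup v2 host the current process's limit lives here;
    // the read fails on cgroup v1 systems, which we treat as "no limit".
    let limit = fs::read_to_string("/sys/fs/cgroup/memory.max")
        .ok()
        .and_then(|s| parse_memory_max(&s));
    match limit {
        Some(bytes) => println!("cgroup memory limit: {bytes} bytes"),
        None => println!("no cgroup memory limit detected"),
    }
}
```

A daemon that knows this value could size its prefetch and cache budgets to fit under it instead of relying on the operator to tune flags.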

Relevant log output

Metadata


    Labels: bug (Something isn't working)
