Description
Mountpoint for Amazon S3 version
mount-s3 1.19.0
AWS Region
us-east-1
Describe the running environment
We access a TB-sized dataset in EKS through the mountpoint-s3 CSI driver (recent AL2023 AMI).
Mountpoint options
Mostly default settings
`--metadata-ttl indefinite --read-part-size 5242880 --allow-other --region us-east-1`
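For context, we pass these options as `mountOptions` on a statically provisioned PersistentVolume, roughly like the sketch below (names, capacity, access mode and bucket are placeholders):

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: dataset-pv                  # placeholder
spec:
  capacity:
    storage: 2400Gi                 # ignored by the driver, but required by Kubernetes
  accessModes:
    - ReadWriteMany
  mountOptions:
    - metadata-ttl indefinite
    - read-part-size 5242880
    - allow-other
    - region us-east-1
  csi:
    driver: s3.csi.aws.com
    volumeHandle: dataset-volume    # placeholder
    volumeAttributes:
      bucketName: our-dataset-bucket   # placeholder
```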
What happened?
For a workload where we mount a large (~2.4TB, 1200 × 2GB files) dataset that is accessed mostly in a random-access fashion, we see runaway memory usage after a short but heavy burst of access, ultimately leading to OOM situations. We've seen the mount-s3 process use more than 15GB of memory and cannot find a way to limit its memory usage.
We are using mountpoint-s3 via the CSI driver.
Things we tried:
- decrease `--read-part-size` to its lower limit of 5242880 (no effect)
- use an empty dir cache (seemingly no effect)
- updating to the new v2 CSI driver (I tried that after the old driver let K8s nodes just die because the systemd service gobbled up all memory and the nodes became unresponsive). With the v2 driver and `mountpointContainerResourcesLimitsMemory` set, I can see the Mountpoint pods being OOMKilled, which is better for the cluster than hanging nodes but doesn't help the application: restarting the Mountpoint pod does not gracefully re-establish access for the application. Gradually increasing `mountpointContainerResourcesLimitsMemory`, it becomes harder and harder to schedule the application at all (see the manifest sketch after this list).
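For reference, this is roughly how we set that limit with the v2 driver; if I remember the configuration docs correctly it goes into the PV's `volumeAttributes`, so the snippet below is a sketch from memory (the value is just one of the limits we tried):

```yaml
  csi:
    driver: s3.csi.aws.com
    volumeHandle: dataset-volume
    volumeAttributes:
      bucketName: our-dataset-bucket                  # placeholder
      mountpointContainerResourcesLimitsMemory: 6Gi   # example value; we kept having to raise it
```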
Not tried:
- changing `UNSTABLE_MOUNTPOINT_MAX_PREFETCH_WINDOW_SIZE`, because it's not supported by the CSI driver. Also, it's not clear how the default of 2GiB could still lead to >10GiB used.
If there were any limit we could set, that might be good enough for us for the time being (even if it makes scheduling in a K8s cluster harder). Ideally, though, Mountpoint would limit its memory usage cleanly, and in the best case pick up the maximum memory from its cgroup environment.