
Commit c9f2f3b

Update website/docs/bestpractices/analytics/spark-oom-kills.md
Co-authored-by: Manabu McCloskey <manabu.mccloskey@gmail.com>
1 parent 1811e20 commit c9f2f3b

1 file changed (+1, −1 lines)


website/docs/bestpractices/analytics/spark-oom-kills.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -9,7 +9,7 @@ import TabItem from '@theme/TabItem';
 # Preventing OOM Kills in Spark on Kubernetes
 Every organization running large scale Spark workloads on Kubernetes has dealt with this: a job runs for hours, processes terabytes of data, completes 80% of its work, and then executors start disappearing. No JVM exception. No heap dump. No warning in Spark UI. Just `exit code 137` and hours of compute burned. The standard response is to throw more memory at it, bump `memoryOverhead` by another 10 GB, and hope for the best. That works until the next data spike.
 
-The root cause is not insufficient memory. It is a design flaw in how **cgroupsv1** handles the Linux page cache. When a Spark executor reads shuffle data from local NVMe, the kernel caches those file pages in RAM. Under cgroupsv1, this page cache counts against the container's memory limit with no mechanism to reclaim it before the OOM killer fires. The kernel kills your executor to free memory it could have simply evicted.
+The root cause is not insufficient memory. It is a design limitation in how **cgroupsv1** handles the Linux page cache. When a Spark executor reads shuffle data from local storage, the kernel caches those file pages in RAM. Under cgroupsv1, this page cache counts against the container's memory limit with no mechanism to reclaim it before the OOM killer fires. The kernel kills your executor to free memory it could have simply evicted.
 
 **cgroupsv2** fixes this with `memory.high`, a throttling boundary that forces page cache eviction before reaching the hard kill limit. Kubernetes exposes this through the **MemoryQoS** feature gate ([KEP-2570](https://github.com/kubernetes/enhancements/issues/2570)). This guide covers the kernel internals behind the problem, the cgroupsv2 solution, and the exact EKS configuration to deploy it.
 
```
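For context on the `memory.high` boundary the changed passage refers to: below is a minimal Python sketch of how MemoryQoS (KEP-2570) positions `memory.high` between a container's memory request and its hard limit. The 0.9 throttling factor, the 4 KiB page size, and the rounding details are assumptions here, not the kubelet's actual implementation.

```python
# Sketch only (not kubelet code): derive a cgroupsv2 memory.high value
# from a container's memory request and limit, following the shape of
# the KEP-2570 formula. Assumed: 4 KiB pages, 0.9 throttling factor.

PAGE_SIZE = 4096  # assumed page size


def memory_high(request_bytes: int, limit_bytes: int,
                throttling_factor: float = 0.9) -> int:
    """memory.high sits between the request and the hard limit,
    rounded down to a page boundary. Above it the kernel throttles
    the cgroup and reclaims page cache instead of OOM-killing."""
    raw = request_bytes + throttling_factor * (limit_bytes - request_bytes)
    return int(raw // PAGE_SIZE) * PAGE_SIZE


gib = 2 ** 30
# A Spark executor with a 4 GiB request and a 10 GiB limit starts
# getting its page cache reclaimed around ~9.4 GiB of usage, well
# before the 10 GiB memory.max hard kill would fire.
print(memory_high(4 * gib, 10 * gib))
```

The point of the formula is that shuffle-heavy executors hit the reclaim boundary first, so the kernel evicts cached file pages instead of invoking the OOM killer at the hard limit.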
Comments (0)