docs/memory_management.md
### Dedicated Threads vs. Pool Threads
From the view of the OOM state machine, each task has one or more "dedicated threads", along with zero or more "pool threads" (background threads). When checking whether a task is blocked, the OOM state machine is lenient on dedicated threads (only one of the dedicated threads needs to be blocked) but stringent on pool threads (all pool threads must be blocked). Being treated leniently is not always a good thing: it increases the chance of the task being mistakenly identified as blocked, which causes unnecessary deadlock resolution. So we don't want a thread to be treated as a dedicated thread unless it is really necessary. There are two ways to avoid a thread being treated as a dedicated thread:
1. Avoid calling `TaskContext.setTaskContext()` in the current thread. This prevents the OOM state machine from connecting the current thread to the task as a dedicated thread.
2. Proactively register the thread itself as a pool thread instead of a dedicated thread. An example of registering threads can be found [here](https://github.com/NVIDIA/spark-rapids/blob/c39f6a6004b0cf684ca526172e87b2bd4481eb3a/sql-plugin/src/main/scala/com/nvidia/spark/rapids/GpuOrcScan.scala#L2056). (Don't forget to unregister the thread when it is done.)
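The register/unregister pairing in option 2 is easiest to get right as a try/finally pair. The sketch below is a minimal illustration of that pattern only: `registerPoolThread` and `unregisterPoolThread` are hypothetical stand-ins, not the real spark-rapids API (see the linked `GpuOrcScan` code for the actual calls).

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class PoolThreadRegistryDemo {
    // Hypothetical stand-in for the real pool-thread registration API.
    static final Set<Long> poolThreads = ConcurrentHashMap.newKeySet();

    static void registerPoolThread(long threadId) { poolThreads.add(threadId); }
    static void unregisterPoolThread(long threadId) { poolThreads.remove(threadId); }

    static void runAsPoolThread(Runnable work) {
        long id = Thread.currentThread().getId();
        registerPoolThread(id);
        try {
            work.run();
        } finally {
            // Always unregister, even if the work throws, so the OOM state
            // machine never sees a stale pool thread for this task.
            unregisterPoolThread(id);
        }
    }

    public static void main(String[] args) {
        runAsPoolThread(() -> System.out.println(
            "inside work, registered: " + poolThreads.contains(Thread.currentThread().getId())));
        System.out.println(
            "after work, registered: " + poolThreads.contains(Thread.currentThread().getId()));
    }
}
```

The try/finally shape matters: if the thread exits its work abnormally without unregistering, the OOM state machine would keep counting it as a pool thread of the task.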
In most cases, we recommend the second approach, because the main task thread and the background thread typically form a producer-consumer relationship. The main task thread, acting as the consumer, will usually wait for the background thread to produce data. If we choose approach 1, then while the consumer is waiting its Java thread state will be `WAITING`, and even if the producer is actively working (so the task as a whole should NOT be considered blocked), the OOM state machine will mistakenly consider it a blocked task because it cannot find any "pool thread" connected to this task. So the condition "has at least one dedicated thread blocked on memory allocation, and all of the pool threads working on that task are also blocked" holds.
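The `WAITING` state described above is plain JDK behavior and can be observed without the plugin. A minimal sketch: a consumer parked in `BlockingQueue.take()` reports `Thread.State.WAITING` even while the producer thread is still actively working.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class ConsumerStateDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1);

        // Consumer: blocks in take() until the producer delivers data.
        Thread consumer = new Thread(() -> {
            try {
                queue.take();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        // Give the consumer time to park inside take().
        Thread.sleep(200);

        // Even though the "producer" (this thread) is busy, the consumer's
        // Java thread state is WAITING -- exactly what the OOM state machine
        // sees when the task has no registered pool thread.
        System.out.println("consumer state: " + consumer.getState());

        queue.put(42);   // producer finally delivers
        consumer.join();
    }
}
```

This is why, without a registered pool thread, a healthy producer-consumer task can look indistinguishable from a deadlocked one.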
However, if we choose approach 2, the OOM state machine will find at least one pool thread that is NOT blocked, so the condition "has at least one dedicated thread blocked on memory allocation, and all of the pool threads working on that task are also blocked" does NOT hold, and the task will NOT be considered blocked.