Commit 084506c
[rust/rqd] Fix memory bug on OOM killed frames (#2093)
When a frame is killed due to OOM, the thread that collects stats before
reporting back can race the frame wrapping process and gather stats for
the frame after some of its procs have died, leading to an incorrect
memory reading for the given frame.
# How the bug manifests:
**For successful frames:**
1. Frame runs → processes accumulate memory → `refresh_procs` updates
stats normally
2. Frame completes naturally → all processes exit cleanly together
3. Final stats are captured before the cache is cleared
4. **Memory reported correctly**
**For killed frames (OOM):**
1. Frame detected using too much memory (e.g., 12GB actual usage)
2. `kill_session()` is called → child processes start dying
3. **Next `refresh_procs()` cycle happens** (this runs every report
interval)
4. `session_processes.clear()` **wipes out all the cached process data**
including the high memory readings
5. When rebuilding cache, zombie/dying processes are skipped
6. **Only the session leader remains** (in zombie state or about to
become one)
7. `collect_proc_stats()` now reads **only the session leader's memory**
(typically very small, just the shell wrapper)
8. **Massively underreported memory** (e.g., reports 1GB instead of
12GB)

1 parent 8597e82 commit 084506c
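
The killed-frame sequence described in the commit message can be reproduced in a small model. The sketch below is illustrative only and is not the code from this commit, whose diff body is not shown here. `refresh_procs` and `session_processes` are names borrowed from the commit message; `Proc`, `ProcState`, `SessionCache`, `current_rss`, and `peak_rss_bytes` are hypothetical. It assumes a refresh that clears the cache and skips zombie children, and it shows one possible mitigation (keeping a high-water mark across refreshes), which may differ from the actual fix.

```rust
use std::collections::HashMap;

const GIB: u64 = 1 << 30;

// Hypothetical process model; these types are not taken from rqd.
#[derive(Clone, Copy, PartialEq)]
enum ProcState {
    Running,
    Zombie,
}

#[derive(Clone, Copy)]
struct Proc {
    pid: u32,
    rss_bytes: u64,
    state: ProcState,
    is_session_leader: bool,
}

#[derive(Default)]
struct SessionCache {
    session_processes: HashMap<u32, Proc>,
    // Assumed mitigation: remember the highest total seen so far.
    peak_rss_bytes: u64,
}

impl SessionCache {
    // Rebuild the cache as the commit message describes: clear it,
    // then skip zombie/dying children so only live procs and the
    // session leader remain.
    fn refresh_procs(&mut self, snapshot: &[Proc]) {
        self.session_processes.clear();
        for p in snapshot {
            if p.state == ProcState::Zombie && !p.is_session_leader {
                continue;
            }
            self.session_processes.insert(p.pid, *p);
        }
        // Update the high-water mark after every rebuild.
        self.peak_rss_bytes = self.peak_rss_bytes.max(self.current_rss());
    }

    // Sum of memory over whatever is currently cached.
    fn current_rss(&self) -> u64 {
        self.session_processes.values().map(|p| p.rss_bytes).sum()
    }
}

// Returns (memory read after the kill, peak memory across refreshes).
fn simulate_oom_kill() -> (u64, u64) {
    let mut cache = SessionCache::default();
    let leader = Proc {
        pid: 100,
        rss_bytes: GIB,
        state: ProcState::Running,
        is_session_leader: true,
    };
    let child = Proc {
        pid: 101,
        rss_bytes: 11 * GIB,
        state: ProcState::Running,
        is_session_leader: false,
    };

    // Normal cycle: leader + child add up to 12 GiB.
    cache.refresh_procs(&[leader, child]);

    // After the kill, the child is a zombie when the next cycle runs.
    let dying_child = Proc { state: ProcState::Zombie, ..child };
    cache.refresh_procs(&[leader, dying_child]);

    (cache.current_rss(), cache.peak_rss_bytes)
}
```

In this model, reading `current_rss()` after the kill sees only the session leader, while `peak_rss_bytes` still holds the pre-kill total, which is the value a final report for an OOM-killed frame would want.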
2 files changed: 27 additions, 0 deletions