forked from bpftrace/bpftrace
-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathoomkill.bt
More file actions
78 lines (71 loc) · 3.08 KB
/
oomkill.bt
File metadata and controls
78 lines (71 loc) · 3.08 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
#!/usr/bin/env bpftrace
// oomkill Trace OOM killer.
// For Linux, uses bpftrace and eBPF.
//
// This traces the kernel out-of-memory killer, and prints basic details,
// including the system load averages. This can provide more context on the
// system state at the time of OOM: was it getting busier or steady, based
// on the load averages? This tool may also be useful to customize for
// investigations; for example, by adding other task_struct details at the
// time of the OOM, or other commands in the system() call.
//
// This currently works by using kernel dynamic tracing of oom_kill_process().
//
// Example of usage:
//
// # ./oomkill.bt
// Tracing oom_kill_process()... Ctrl-C to end.
// 21:03:39 Triggered by PID 3297 ("ntpd"), OOM kill of PID 22516 ("perl"), 3850642 pages, loadavg: 0.99 0.39 0.30 3/282 22724
// 21:03:48 Triggered by PID 22517 ("perl"), OOM kill of PID 22517 ("perl"), 3850642 pages, loadavg: 0.99 0.41 0.30 2/282 22932
//
// If cgroup memory:
//
// 18:40:01 Triggered by PID 224654 ("oom_minimal"), CGROUP 45172 ("unified:/oom-test"), OOM kill of PID 224654 ("oom_minimal"), 2097663 pages, loadavg: 0.56 0.32 0.26 3/980 224655
//
// The first line shows that PID 22516, with process name "perl", was OOM killed
// when it reached 3850642 pages (usually 4 Kbytes per page). This OOM kill
// happened to be triggered by PID 3297, process name "ntpd", doing some memory
// allocation.
//
// The system log (dmesg) shows pages of details and system context about an OOM
// kill. What it currently lacks, however, is context on how the system had been
// changing over time. I've seen OOM kills where I wanted to know if the system
// was at steady state at the time, or if there had been a recent increase in
// workload that triggered the OOM event. oomkill provides some context: at the
// end of the line is the load average information from /proc/loadavg. For both
// of the oomkills here, we can see that the system was getting busier at the
// time (a higher 1 minute "average" of 0.99, compared to the 15 minute "average"
// of 0.30).
//
// oomkill can also be the basis of other tools and customizations. For example,
// you can edit it to include other task_struct details from the target PID at
// the time of the OOM kill, or to run other commands from the shell.
//
// There is another version of this tool in bcc: https://github.com/iovisor/bcc
//
// Copyright 2018 Netflix, Inc.
//
// 07-Sep-2018 Brendan Gregg Created this.
// 03-Sep-2025 Rong Tao Support memory cgroup.
#ifndef BPFTRACE_HAVE_BTF
#include <linux/oom.h>
#endif
BEGIN
{
printf("Tracing oom_kill_process()... Hit Ctrl-C to end.\n");
}
kprobe:oom_kill_process
{
$oc = (struct oom_control *)arg0;
$memcg = (struct mem_cgroup *)$oc.memcg;
time("%H:%M:%S ");
printf("Triggered by PID %d (\"%s\"), ", pid, comm);
if $memcg {
$cgrp = (struct cgroup *)$memcg.css.cgroup;
$memcg_id = $cgrp.kn.id;
printf("CGROUP %ld (\"%s\"), ", $memcg_id, cgroup_path($memcg_id));
}
printf("OOM kill of PID %d (\"%s\"), %d pages, loadavg: ",
$oc.chosen.pid, $oc.chosen.comm, $oc.totalpages);
cat("/proc/loadavg");
}