Commit e85115f

Merge branch 'bpf-add-support-for-sleepable-tracepoint-programs'
Mykyta Yatsenko says:

====================
bpf: Add support for sleepable tracepoint programs

This series adds support for sleepable BPF programs attached to raw
tracepoints (tp_btf, raw_tp) and classic tracepoints (tp). The motivation
is to allow BPF programs on syscall tracepoints to use sleepable helpers
such as bpf_copy_from_user(), enabling reliable user memory reads that
can page-fault.

This series removes restriction for faultable tracepoints:

Patch 1 modifies __bpf_trace_run() to support sleepable programs.

Patch 2 introduces bpf_prog_run_array_sleepable() to support new usecase.

Patch 3 adds sleepable support for classic tracepoints
(BPF_PROG_TYPE_TRACEPOINT) by introducing trace_call_bpf_faultable() and
restructuring perf_syscall_enter/exit() to run BPF programs in faultable
context.

Patch 4 allows BPF_TRACE_RAW_TP, BPF_PROG_TYPE_RAW_TRACEPOINT, and
BPF_PROG_TYPE_TRACEPOINT programs to be loaded as sleepable, with
load-time and attach-time checks to reject sleepable programs on
non-faultable tracepoints.

Patch 5 adds libbpf SEC_DEF handlers: tp_btf.s, raw_tp.s,
raw_tracepoint.s, tp.s, and tracepoint.s.

Patch 6 adds selftests covering tp_btf.s, raw_tp.s, and tp.s positive
cases using bpf_copy_from_user() plus negative tests for non-faultable
tracepoints.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
---
Changes in v13:
- Invert if (prog->sleepable) check in bpf_prog_run_array_sleepable()
- Link to v12: https://patch.msgid.link/20260422-sleepable_tracepoints-v12-0-744bf0e3b311@meta.com

Changes in v12:
- Style improvement in the bpf_prog_run_array_sleepable(): use
  guard(rcu)(), remove unnecessary defensive programming artifacts.
- Link to v11: https://patch.msgid.link/20260421-sleepable_tracepoints-v11-0-d8ff138d6f05@meta.com

Changes in v11:
- Avoid running dummy prog in the bpf_prog_run_array_sleepable()
- Migrate selftests from nanosleep() to getcwd() to avoid issues with the
  different struct layouts.
- Link to v10: https://patch.msgid.link/20260415-sleepable_tracepoints-v10-0-161f40b33dd7@meta.com

Changes in v10:
- Guard per-prog recursion check in bpf_prog_run_array_sleepable() with
  prog->active NULL check, following the same pattern as commit 7dc211c
  for prog->stats. dummy_bpf_prog has NULL active field and can appear in
  the array via bpf_prog_array_delete_safe() fallback on allocation
  failure.
- Link to v9: https://patch.msgid.link/20260410-sleepable_tracepoints-v9-0-e719e664e84c@meta.com

Changes in v9:
- Fixed "classic raw tracepoints" to "raw tracepoints (tp_btf, raw_tp)" in
  commit message
- Added bpf_prog_get_recursion_context() guard to
  __bpf_prog_test_run_raw_tp() to protect per-CPU private stack from
  concurrent sleepable test runs
- Added new bpf_prog_run_array_sleepable() without is_uprobe parameter,
  remove all changes in bpf_prog_run_array_uprobe()
- Refactored attach_tp() to use prefix array uniformly (matching
  attach_raw_tp() pattern), removing hardcoded strcmp() bare-name checks.
- Recursion check in __bpf_prog_test_run_raw_tp()
- Refactored selftests
- Link to v8: https://patch.msgid.link/20260330-sleepable_tracepoints-v8-0-2e323467f3a0@meta.com

Changes in v8:
- Fix sleepable tracepoint support in bpf_prog_test_run() (Kumar, sashiko)
- Link to v7: https://patch.msgid.link/20260325-sleepable_tracepoints-v6-0-2b182dacea13@meta.com

Changes in v7:
- Add recursion check (bpf_prog_get_recursion_context()) to make sure
  private stack is safe when sleepable program is preempted by itself
  (Alexei, Kumar)
- Use combined rcu_read_lock_dont_migrate() instead of separate
  rcu_read_lock()/migrate_disable() calls for non-sleepable path (Alexei)
- Link to v6: https://lore.kernel.org/bpf/20260324-sleepable_tracepoints-v6-0-81bab3a43f25@meta.com/

Changes in v6:
- Remove recursion check from trace_call_bpf_faultable(), sleepable
  tracepoints are called from syscall enter/exit, no recursion is
  possible.(Kumar)
- Refactor bpf_prog_run_array_uprobe() to support tracepoints usecase
  cleanly (Kumar)
- Link to v5: https://lore.kernel.org/r/20260316-sleepable_tracepoints-v5-0-85525de71d25@meta.com

Changes in v5:
- Addressed AI review: zero initialize struct pt_regs in
  perf_call_bpf_enter(); changed handling tp.s and tracepoint.s in
  attach_tp() in libbpf.
- Updated commit messages
- Link to v4: https://lore.kernel.org/r/20260313-sleepable_tracepoints-v4-0-debc688a66b3@meta.com

Changes in v4:
- Follow uprobe_prog_run() pattern with explicit rcu_read_lock_trace()
  instead of relying on outer rcu_tasks_trace lock
- Add sleepable support for classic raw tracepoints (raw_tp.s)
- Add sleepable support for classic tracepoints (tp.s) with new
  trace_call_bpf_faultable() and restructured perf_syscall_enter/exit()
- Add raw_tp.s, raw_tracepoint.s, tp.s, tracepoint.s SEC_DEF handlers
- Replace growing type enumeration in error message with generic
  "program of this type cannot be sleepable"
- Use PT_REGS_PARM1_SYSCALL (non-CO-RE) in BTF test
- Add classic raw_tp and classic tracepoint sleepable tests
- Link to v3: https://lore.kernel.org/r/20260311-sleepable_tracepoints-v3-0-3e9bbde5bd22@meta.com

Changes in v3:
- Moved faultable tracepoint check from attach time to load time in
  bpf_check_attach_target(), providing a clear verifier error message
- Folded preempt_disable removal into the sleepable execution path patch
- Used RUN_TESTS() with __failure/__msg for negative test case instead of
  explicit userspace program
- Reduced series from 6 patches to 4
- Link to v2: https://lore.kernel.org/r/20260225-sleepable_tracepoints-v2-0-0330dafd650f@meta.com

Changes in v2:
- Address AI review points
- modified the order of the patches
- Link to v1: https://lore.kernel.org/bpf/20260218-sleepable_tracepoints-v1-0-ec2705497208@meta.com/
---
====================

Link: https://patch.msgid.link/20260422-sleepable_tracepoints-v13-0-99005dff21ef@meta.com
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2 parents 9012cf2 + 8a20655 commit e85115f

14 files changed

Lines changed: 578 additions & 107 deletions

include/linux/bpf.h

Lines changed: 50 additions & 0 deletions
@@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr);
 void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr);
 void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip);
 
+static __always_inline u32
+bpf_prog_run_array_sleepable(const struct bpf_prog_array *array,
+			     const void *ctx, bpf_prog_run_fn run_prog)
+{
+	const struct bpf_prog_array_item *item;
+	struct bpf_prog *prog;
+	struct bpf_run_ctx *old_run_ctx;
+	struct bpf_trace_run_ctx run_ctx;
+	u32 ret = 1;
+
+	if (unlikely(!array))
+		return ret;
+
+	migrate_disable();
+
+	run_ctx.is_uprobe = false;
+
+	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
+	item = &array->items[0];
+	while ((prog = READ_ONCE(item->prog))) {
+		/* Skip dummy_bpf_prog placeholder (len == 0) */
+		if (unlikely(!prog->len)) {
+			item++;
+			continue;
+		}
+
+		if (unlikely(!bpf_prog_get_recursion_context(prog))) {
+			bpf_prog_inc_misses_counter(prog);
+			bpf_prog_put_recursion_context(prog);
+			item++;
+			continue;
+		}
+
+		run_ctx.bpf_cookie = item->bpf_cookie;
+
+		if (!prog->sleepable) {
+			guard(rcu)();
+			ret &= run_prog(prog, ctx);
+		} else {
+			ret &= run_prog(prog, ctx);
+		}
+
+		bpf_prog_put_recursion_context(prog);
+		item++;
+	}
+	bpf_reset_run_ctx(old_run_ctx);
+	migrate_enable();
+	return ret;
+}
+
 #else /* !CONFIG_BPF_SYSCALL */
 static inline struct bpf_prog *bpf_prog_get(u32 ufd)
 {
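
The control flow of bpf_prog_run_array_sleepable() above can be tried out in user space with a small model. The sketch below is not kernel code: `struct model_prog`, `struct model_item`, and `model_run_array()` are hypothetical stand-ins, the recursion guard is reduced to a plain flag, and the RCU and migration handling is left out entirely. It only illustrates how placeholder and recursing programs are skipped and how the return values are AND-ed together.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for a BPF program; not a kernel type. */
struct model_prog {
	unsigned int len;	/* 0 models the dummy placeholder program */
	int active;		/* nonzero models "already running" */
	unsigned int misses;	/* models the per-program miss counter */
	unsigned int runs;	/* how often the model executed this program */
	uint32_t result;	/* value the program "returns" when run */
};

/* Hypothetical array entry; a NULL prog terminates the array. */
struct model_item {
	struct model_prog *prog;
};

/*
 * Model of the loop: skip placeholders, count a miss for programs that
 * are already active, and AND together the results of the rest.
 */
uint32_t model_run_array(const struct model_item *item)
{
	struct model_prog *prog;
	uint32_t ret = 1;

	if (!item)
		return ret;

	for (; (prog = item->prog) != NULL; item++) {
		if (!prog->len)
			continue;	/* placeholder, nothing to run */

		if (prog->active) {
			prog->misses++;	/* recursion detected, skip */
			continue;
		}

		prog->active = 1;
		prog->runs++;
		ret &= prog->result;
		prog->active = 0;
	}
	return ret;
}
```

In this model an array in which every runnable program returns 1 yields 1, a single program returning 0 clears the combined result, and a NULL array yields the default of 1, matching the early return in the hunk.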

include/linux/trace_events.h

Lines changed: 6 additions & 0 deletions
@@ -770,6 +770,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file)
 
 #ifdef CONFIG_BPF_EVENTS
 unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx);
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx);
 int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie);
 void perf_event_detach_bpf_prog(struct perf_event *event);
 int perf_event_query_prog_array(struct perf_event *event, void __user *info);
@@ -792,6 +793,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c
 	return 1;
 }
 
+static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	return 1;
+}
+
 static inline int
 perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie)
 {

include/trace/bpf_probe.h

Lines changed: 0 additions & 2 deletions
@@ -58,9 +58,7 @@ static notrace void \
 __bpf_trace_##call(void *__data, proto) \
 { \
 	might_fault(); \
-	preempt_disable_notrace(); \
 	CONCATENATE(bpf_trace_run, COUNT_ARGS(args))(__data, CAST_TO_U64(args)); \
-	preempt_enable_notrace(); \
 }
 
 #undef DECLARE_EVENT_SYSCALL_CLASS

kernel/bpf/syscall.c

Lines changed: 5 additions & 0 deletions
@@ -4281,6 +4281,11 @@ static int bpf_raw_tp_link_attach(struct bpf_prog *prog,
 	if (!btp)
 		return -ENOENT;
 
+	if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+		bpf_put_raw_tracepoint(btp);
+		return -EINVAL;
+	}
+
 	link = kzalloc_obj(*link, GFP_USER);
 	if (!link) {
 		err = -ENOMEM;
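
The attach-time gate added in this hunk reduces to a small decision table. The following user-space sketch restates it under loud assumptions: `struct model_tp` and `model_raw_tp_attach()` are hypothetical names, the `faultable` flag stands in for tracepoint_is_faultable(), and reference counting of the tracepoint is not modeled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a raw tracepoint; not a kernel type. */
struct model_tp {
	const char *name;
	bool faultable;		/* models tracepoint_is_faultable() */
};

/*
 * Model of the attach decision:
 *  - unknown tracepoint                    -> -ENOENT
 *  - sleepable prog on non-faultable point -> -EINVAL
 *  - everything else                       -> 0 (attach proceeds)
 */
int model_raw_tp_attach(bool prog_sleepable, const struct model_tp *tp)
{
	if (!tp)
		return -ENOENT;

	if (prog_sleepable && !tp->faultable)
		return -EINVAL;

	return 0;
}
```

With an illustrative faultable entry (for example one named "sys_enter") a sleepable program is accepted, while the same program on a non-faultable entry gets -EINVAL; non-sleepable programs are unaffected by the new check.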

kernel/bpf/verifier.c

Lines changed: 11 additions & 2 deletions
@@ -19267,6 +19267,12 @@ int bpf_check_attach_target(struct bpf_verifier_log *log,
 		btp = bpf_get_raw_tracepoint(tname);
 		if (!btp)
 			return -EINVAL;
+		if (prog->sleepable && !tracepoint_is_faultable(btp->tp)) {
+			bpf_log(log, "Sleepable program cannot attach to non-faultable tracepoint %s\n",
+				tname);
+			bpf_put_raw_tracepoint(btp);
+			return -EINVAL;
+		}
 		fname = kallsyms_lookup((unsigned long)btp->bpf_func, NULL, NULL, NULL,
 					trace_symbol);
 		bpf_put_raw_tracepoint(btp);
@@ -19483,14 +19489,17 @@ static bool can_be_sleepable(struct bpf_prog *prog)
 		case BPF_MODIFY_RETURN:
 		case BPF_TRACE_ITER:
 		case BPF_TRACE_FSESSION:
+		case BPF_TRACE_RAW_TP:
 			return true;
 		default:
 			return false;
 		}
 	}
 	return prog->type == BPF_PROG_TYPE_LSM ||
 	       prog->type == BPF_PROG_TYPE_KPROBE /* only for uprobes */ ||
-	       prog->type == BPF_PROG_TYPE_STRUCT_OPS;
+	       prog->type == BPF_PROG_TYPE_STRUCT_OPS ||
+	       prog->type == BPF_PROG_TYPE_RAW_TRACEPOINT ||
+	       prog->type == BPF_PROG_TYPE_TRACEPOINT;
 }
 
 static int check_attach_btf_id(struct bpf_verifier_env *env)
@@ -19512,7 +19521,7 @@ static int check_attach_btf_id(struct bpf_verifier_env *env)
 	}
 
 	if (prog->sleepable && !can_be_sleepable(prog)) {
-		verbose(env, "Only fentry/fexit/fsession/fmod_ret, lsm, iter, uprobe, and struct_ops programs can be sleepable\n");
+		verbose(env, "Program of this type cannot be sleepable\n");
 		return -EINVAL;
 	}
 

kernel/events/core.c

Lines changed: 9 additions & 0 deletions
@@ -11643,6 +11643,15 @@ static int __perf_event_set_bpf_prog(struct perf_event *event,
 		/* only uprobe programs are allowed to be sleepable */
 		return -EINVAL;
 
+	if (prog->type == BPF_PROG_TYPE_TRACEPOINT && prog->sleepable) {
+		/*
+		 * Sleepable tracepoint programs can only attach to faultable
+		 * tracepoints. Currently only syscall tracepoints are faultable.
+		 */
+		if (!is_syscall_tp)
+			return -EINVAL;
+	}
+
 	/* Kprobe override only works for kprobes, not uprobes. */
 	if (prog->kprobe_override && !is_kprobe)
 		return -EINVAL;

kernel/trace/bpf_trace.c

Lines changed: 45 additions & 3 deletions
@@ -152,6 +152,34 @@ unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx)
 	return ret;
 }
 
+/**
+ * trace_call_bpf_faultable - invoke BPF program in faultable context
+ * @call: tracepoint event
+ * @ctx: opaque context pointer
+ *
+ * Variant of trace_call_bpf() for faultable tracepoints (syscall
+ * tracepoints). Supports sleepable BPF programs by using rcu_tasks_trace
+ * for lifetime protection and bpf_prog_run_array_sleepable() for per-program
+ * RCU flavor selection, following the uprobe pattern.
+ *
+ * Per-program recursion protection is provided by
+ * bpf_prog_run_array_sleepable(). Global bpf_prog_active is not
+ * needed because syscall tracepoints cannot self-recurse.
+ *
+ * Must be called from a faultable/preemptible context.
+ */
+unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx)
+{
+	struct bpf_prog_array *prog_array;
+
+	might_fault();
+	guard(rcu_tasks_trace)();
+
+	prog_array = rcu_dereference_check(call->prog_array,
+					   rcu_read_lock_trace_held());
+	return bpf_prog_run_array_sleepable(prog_array, ctx, bpf_prog_run);
+}
+
 #ifdef CONFIG_BPF_KPROBE_OVERRIDE
 BPF_CALL_2(bpf_override_return, struct pt_regs *, regs, unsigned long, rc)
 {
@@ -2072,11 +2100,19 @@ void bpf_put_raw_tracepoint(struct bpf_raw_event_map *btp)
 static __always_inline
 void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 {
+	struct srcu_ctr __percpu *scp = NULL;
 	struct bpf_prog *prog = link->link.prog;
+	bool sleepable = prog->sleepable;
 	struct bpf_run_ctx *old_run_ctx;
 	struct bpf_trace_run_ctx run_ctx;
 
-	rcu_read_lock_dont_migrate();
+	if (sleepable) {
+		scp = rcu_read_lock_tasks_trace();
+		migrate_disable();
+	} else {
+		rcu_read_lock_dont_migrate();
+	}
+
 	if (unlikely(!bpf_prog_get_recursion_context(prog))) {
 		bpf_prog_inc_misses_counter(prog);
 		goto out;
@@ -2085,12 +2121,18 @@ void __bpf_trace_run(struct bpf_raw_tp_link *link, u64 *args)
 	run_ctx.bpf_cookie = link->cookie;
 	old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx);
 
-	(void) bpf_prog_run(prog, args);
+	(void)bpf_prog_run(prog, args);
 
 	bpf_reset_run_ctx(old_run_ctx);
 out:
 	bpf_prog_put_recursion_context(prog);
-	rcu_read_unlock_migrate();
+
+	if (sleepable) {
+		migrate_enable();
+		rcu_read_unlock_tasks_trace(scp);
+	} else {
+		rcu_read_unlock_migrate();
+	}
 }
 
 #define UNPACK(...) __VA_ARGS__
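
The reworked __bpf_trace_run() takes a different lock flavor depending on prog->sleepable and releases in the reverse order of acquisition. The user-space sketch below only records that ordering as a string; it performs no locking. `struct model_log`, `model_log_op()`, `model_trace_run()`, and the operation labels are all hypothetical names chosen for illustration, and setting or resetting the run context is folded into the single "run" step.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum { MODEL_LOG_CAP = 160 };

/* Hypothetical trace of the operations performed, stored as "op;op;...". */
struct model_log {
	char buf[MODEL_LOG_CAP];
};

void model_log_op(struct model_log *log, const char *op)
{
	size_t used = strlen(log->buf);

	/* Append "op;" only when it fits together with the trailing NUL. */
	if (used + strlen(op) + 2 <= MODEL_LOG_CAP) {
		strcat(log->buf, op);
		strcat(log->buf, ";");
	}
}

/* Model of the entry/exit pairing; 'recursing' models a failed guard. */
void model_trace_run(struct model_log *log, bool sleepable, bool recursing)
{
	log->buf[0] = '\0';

	if (sleepable) {
		model_log_op(log, "tasks_trace_lock");
		model_log_op(log, "migrate_disable");
	} else {
		model_log_op(log, "rcu_lock_dont_migrate");
	}

	model_log_op(log, "get_recursion");
	if (recursing)
		model_log_op(log, "miss");	/* program body is skipped */
	else
		model_log_op(log, "run");
	model_log_op(log, "put_recursion");	/* reached on both paths */

	if (sleepable) {
		model_log_op(log, "migrate_enable");
		model_log_op(log, "tasks_trace_unlock");
	} else {
		model_log_op(log, "rcu_unlock_migrate");
	}
}
```

Reading the recorded string for a sleepable run shows the nesting: the tasks-trace read section is the outermost scope and migration is disabled inside it, whereas the non-sleepable path keeps the single combined lock and unlock calls.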
