Bugfixkptrv2 #12003
Open
RazeLighter777 wants to merge 2 commits into kernel-patches:bpf-next_base from
Force-pushed from 68b493c to 149211d
Force-pushed from b7f1b70 to 5dfd5e2
A BPF program attached to tp_btf/nmi_handler can delete map entries or swap out referenced kptrs from NMI context. Today that runs the kptr destructor inline. Destructors such as bpf_cpumask_release() can take RCU-related locks, so running them from NMI can deadlock the system.

Preallocate offload jobs from the global BPF memory allocator, track the number of live destructor-backed references so the pool stays ahead of NMI frees, and let the worker invoke the destructor after NMI exits.

The preallocation algorithm maintains the invariant total >= refs + active, where refs is the number of referenced kptrs installed in maps, active is the number of jobs being executed in the irq_work worker, and total is the number of job structures allocated. To avoid excessive preallocation calls while maintaining the invariant, allocate the needed slots plus a small amount of extra, min(needed, BPF_DTOR_KPTR_RESERVE_HEADROOM), where BPF_DTOR_KPTR_RESERVE_HEADROOM is 64 in this patch.

A small but harmless ordering subtlety: the active atomic is read before refs. This can result in a small amount of over-allocation, but the extra jobs are not leaked; they are carried into the trim stage.

The trim stage uses a CAS loop to free excess idle job slots. It snapshots total, refs, and active, pops an idle job if the pool is excessively large, and attempts a cmpxchg to decrement total atomically. On failure, it pushes the job back onto the idle list and retries.

Several best-effort mitigations preserve integrity under memory pressure, which is an unlikely scenario. If reserving another offload slot fails while installing a new destructor-backed kptr through bpf_kptr_xchg(), leave the destination unchanged and return the incoming pointer so the caller keeps ownership. This is better than leaking the pointer, and should only happen if the accounting is incorrect. Moreover, this is a condition the caller can check for and recover from.

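The reserve and trim steps described above can be modeled in userspace. The following is a minimal sketch, not the patch's code: the names `struct pool`, `pool_reserve()`, and `pool_trim()` are hypothetical, the idle list is a plain singly linked list rather than a lock-free one, `malloc()` stands in for the BPF memory allocator, and the trim threshold (refs + active + headroom) is an assumption, since the description only says "excessively large".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Mirrors BPF_DTOR_KPTR_RESERVE_HEADROOM (64 in the patch). */
#define RESERVE_HEADROOM 64

struct job { struct job *next; };

struct pool {
	atomic_long total;  /* job structures allocated */
	atomic_long refs;   /* destructor-backed kptrs installed in maps */
	atomic_long active; /* jobs currently running in the worker */
	struct job *idle;   /* idle jobs (simplified: not lock-free) */
};

static void idle_push(struct pool *p, struct job *j)
{
	j->next = p->idle;
	p->idle = j;
}

static struct job *idle_pop(struct pool *p)
{
	struct job *j = p->idle;

	if (j)
		p->idle = j->next;
	return j;
}

/* Account one new reference and top up the pool so that
 * total >= refs + active holds. Returns 0, or -1 if the
 * jobs needed for the invariant could not be allocated. */
static int pool_reserve(struct pool *p)
{
	long active = atomic_load(&p->active); /* read before refs */
	long refs = atomic_fetch_add(&p->refs, 1) + 1;
	long needed = refs + active - atomic_load(&p->total);
	long extra, i;

	if (needed <= 0)
		return 0;
	extra = needed < RESERVE_HEADROOM ? needed : RESERVE_HEADROOM;
	for (i = 0; i < needed + extra; i++) {
		struct job *j = malloc(sizeof(*j));

		if (!j) {
			if (i >= needed)
				break; /* only the extra part failed */
			atomic_fetch_sub(&p->refs, 1); /* undo accounting */
			return -1;
		}
		idle_push(p, j);
		atomic_fetch_add(&p->total, 1);
	}
	return 0;
}

/* Free idle jobs while the pool exceeds the assumed threshold. */
static void pool_trim(struct pool *p)
{
	for (;;) {
		long total = atomic_load(&p->total);
		long want = atomic_load(&p->refs) +
			    atomic_load(&p->active) + RESERVE_HEADROOM;
		struct job *j;

		if (total <= want)
			return;
		j = idle_pop(p);
		if (!j)
			return;
		if (atomic_compare_exchange_strong(&p->total, &total, total - 1))
			free(j);
		else
			idle_push(p, j); /* total changed: put back, retry */
	}
}
```

In this model, calling `pool_reserve()` repeatedly keeps total at or above refs, and `pool_trim()` shrinks the pool back toward the threshold once references are dropped. The compare-and-swap on total means a concurrent reserve that changes total between the snapshot and the decrement forces a retry instead of freeing a job the invariant still needs.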
If NMI teardown still fails to grab an idle offload job despite that reserve accounting, warn once and run the destructor inline rather than leak the object permanently. In that case, attempt to repair the counter safely with another CAS loop, preserving concurrent increments.

This fix trades a small amount of performance for safety. bpf_kptr_xchg() can no longer be inlined for referenced kptrs, as inlining would break the reference counting. Inlining is preserved for kptrs with no destructor defined. This keeps refcounted kptr teardown out of NMI context without slowing down raw kptr exchanges that never need destructor handling.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Reported-by: Justin Suess <utilityemal77@gmail.com>
Closes: https://lore.kernel.org/bpf/20260421201035.1729473-1-utilityemal77@gmail.com/
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
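The two fallback paths in this commit message (a failed reservation during bpf_kptr_xchg(), and NMI teardown finding no idle job) can be illustrated with a small userspace model. This is a sketch under stated assumptions, not the kernel implementation: `kptr_xchg_model()`, `teardown_in_nmi()`, `worker_drain()`, and the `reserve_ok` and `idle_jobs` knobs are all hypothetical, the model is single-threaded, and the counter repair step is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

typedef void (*dtor_fn)(void *obj);

struct slot { void *kptr; };

/* Hypothetical knob standing in for the reservation step. */
static bool reserve_ok = true;

static int reserve_job(void)
{
	return reserve_ok ? 0 : -1;
}

/* Returns the old pointer on success. If no job can be reserved,
 * the destination is left unchanged and `incoming` is returned,
 * so the caller still owns it and can detect the failure. */
static void *kptr_xchg_model(struct slot *dst, void *incoming)
{
	void *old;

	if (incoming && reserve_job() != 0)
		return incoming;
	old = dst->kptr;
	dst->kptr = incoming;
	return old;
}

#define MAX_PENDING 8
static void *pending[MAX_PENDING]; /* objects queued for the worker */
static int npending;
static int idle_jobs;              /* idle offload jobs available */
static bool warned;

/* Teardown path: defer to the worker when an idle job exists,
 * otherwise warn once and run the destructor inline. */
static void teardown_in_nmi(void *obj, dtor_fn dtor)
{
	if (idle_jobs > 0 && npending < MAX_PENDING) {
		idle_jobs--;
		pending[npending++] = obj;
		return;
	}
	if (!warned) {
		warned = true;
		fprintf(stderr, "no idle offload job, running dtor inline\n");
	}
	dtor(obj);
}

/* Stand-in for the irq_work worker: runs deferred destructors. */
static void worker_drain(dtor_fn dtor)
{
	while (npending > 0)
		dtor(pending[--npending]);
}

/* Counting destructor used to observe when destruction happens. */
static int dtor_calls;

static void count_dtor(void *obj)
{
	(void)obj;
	dtor_calls++;
}
```

A caller of the model detects the failed exchange by comparing the return value with the pointer it passed in; when they match and the slot is unchanged, the caller still holds the reference and can release it or retry.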
Programs attached to tp_btf/nmi_handler can drop refcounted kptrs from NMI context by deleting map entries or clearing map values.

Add a dedicated BPF-side selftest program that populates hash and array maps with cpumask kptrs and clears them again from the NMI handler. This test fails on upstream with a lockdep warning, but passes when NMI dtors are properly offloaded by the previous commit.

The test asserts that every object queued from NMI for destruction in hardirq context had its dtor called. The irq_work, which has the IRQ_WORK_HARD_IRQ flag, is drained with kern_sync_rcu to ensure consistency.

Cc: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Signed-off-by: Justin Suess <utilityemal77@gmail.com>
Force-pushed from 5dfd5e2 to 0387669
No description provided.