Scope context propagation socket capture to monitored processes #2542
mariomac wants to merge 3 commits into
Privileged tests failed. Submitting the fix to see if they now pass.
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #2542 +/- ##
==========================================
- Coverage 69.29% 68.52% -0.78%
==========================================
Files 348 349 +1
Lines 47314 47527 +213
==========================================
- Hits 32786 32567 -219
- Misses 12480 12905 +425
- Partials 2048 2055 +7
This PR does not completely fix the issue; it reduces the blast radius, because OBI now only handles sockets from monitored programs. Those sockets might still be handled by both OBI and Cilium. The complete fix is to actually close the intersection: OBI needs to stop coexisting on the same sockets when a sockmap peer is present, the sockmap-layer analogue of the existing EnsureCiliumCompatibility (TC layer). Concretely:
That detection-and-disable is what makes it robust rather than probabilistic.
Summary
Fixes grafana/beyla#2691 ("sock_ops/sk_msg programs
conflicting with Cilium L7 LB generating stalled connections").
When context propagation is enabled, tpinjector claims TCP sockets into the sock_dir sockhash and runs an sk_msg verdict (obi_packet_extender) over them to inject traceparent. Both capture paths claimed every socket in the cgroup, regardless of whether it belonged to a process OBI monitors:

- obi_sockmap_tracker (sockops) on BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB called bpf_sock_hash_update unconditionally. The valid_pid() gate only runs later, in the sk_msg program; by then the socket is already in the sockhash.
- obi_sk_iter_tcp (startup iter/tcp) added every existing socket too.

On hosts where another component manages the same sockets via sockmap (e.g. Cilium's socket-LB / L7 proxy), both run an sk_msg/sk_skb verdict over the same sockets and OBI additionally rewrites bytes with bpf_msg_push_data, corrupting/stalling those connections. Reporter confirmed: context_propagation=disabled → no stall; =headers → stall.

This is the sockmap-layer analogue of the existing TC-layer Cilium handling (EnsureCiliumCompatibility), which had no counterpart here.

Fix
Only claim sockets owned by a monitored process. The owning PID isn't reliable in the sockops callback (ACTIVE_ESTABLISHED runs in softirq for non-loopback sockets), so we consult sock_pids, which is keyed by the connection tuple and populated in process context, gated by valid_pid(), from kprobe/tcp_connect (store_sock_pid), which runs before establishment.

- tpinjector.c: bpf_sock_ops_active_est_cb now gates on sock_belongs_to_monitored_pid().
- sock_iter.c: obi_sk_iter_tcp now gates on iter_sock_is_monitored(); iterConstants() propagates filter_pids to the iter collection.
- filter_pids == 0 (PID filter off) preserves the legacy "track everything" behavior.
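The gating rule described in this section can be illustrated with a small userspace model. This is only a sketch in Go, not the BPF code from this PR: the connTuple type, the gate struct, and the shouldClaim and storeSockPid methods are hypothetical names that stand in for sock_pids, store_sock_pid, and the new sockops check, and the monitored callback stands in for valid_pid().

```go
package main

import "fmt"

// connTuple is a hypothetical stand-in for the connection tuple that keys sock_pids.
type connTuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// gate models the check described in the PR text; it is not the BPF code itself.
type gate struct {
	filterPids bool                 // models filter_pids != 0
	sockPids   map[connTuple]uint32 // models sock_pids: tuple -> owning PID
}

// storeSockPid models store_sock_pid on kprobe/tcp_connect: the tuple is
// recorded only when the connecting PID passes the valid_pid()-style check.
func (g *gate) storeSockPid(t connTuple, pid uint32, monitored func(uint32) bool) {
	if monitored(pid) {
		g.sockPids[t] = pid
	}
}

// shouldClaim models the decision taken on ACTIVE_ESTABLISHED: claim the socket
// when the PID filter is off, or when the tuple was recorded at connect time.
func (g *gate) shouldClaim(t connTuple) bool {
	if !g.filterPids {
		return true // legacy "track everything" behavior
	}
	_, ok := g.sockPids[t]
	return ok
}

func main() {
	monitored := func(pid uint32) bool { return pid == 1234 }
	g := &gate{filterPids: true, sockPids: map[connTuple]uint32{}}

	app := connTuple{"10.0.0.1", "10.0.0.2", 40000, 8080}
	bystander := connTuple{"10.0.0.3", "10.0.0.2", 40001, 8080}

	g.storeSockPid(app, 1234, monitored)       // monitored process connects
	g.storeSockPid(bystander, 9999, monitored) // bystander process connects

	fmt.Println(g.shouldClaim(app), g.shouldClaim(bystander))
}
```

In this model the lookup never needs the current PID at establishment time, which mirrors why the PR consults sock_pids rather than reading the PID in the sockops callback.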
Behavior change
Connections open before OBI starts aren't in sock_pids, so the startup
iterator no longer tracks them. This is intentional: such sockets can't be
attributed to a monitored process, and tracking them is exactly what triggers
the Cilium conflict.
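The effect on pre-existing connections can be sketched with a toy timeline, again as a userspace Go model rather than the iterator code in sock_iter.c. The conn type, the iterTracked function, and the logical obiStart timestamp are assumptions made for illustration; the model simply treats any connection opened before obiStart as having no sock_pids entry.

```go
package main

import "fmt"

// conn is a hypothetical record of one outbound TCP connection on the host.
type conn struct {
	tuple    string // simplified connection tuple
	pid      uint32 // owning process
	openedAt int    // logical timestamp of connect()
}

// iterTracked models which existing sockets the startup iterator keeps.
// obiStart is the logical time at which the tcp_connect probe begins recording
// tuples (an assumption of this model); earlier connections leave no entry.
func iterTracked(conns []conn, obiStart int, monitoredPid uint32, filterPids bool) []string {
	sockPids := map[string]uint32{}
	for _, c := range conns {
		if c.openedAt >= obiStart && c.pid == monitoredPid {
			sockPids[c.tuple] = c.pid
		}
	}
	tracked := []string{}
	for _, c := range conns {
		if _, ok := sockPids[c.tuple]; ok || !filterPids {
			tracked = append(tracked, c.tuple)
		}
	}
	return tracked
}

func main() {
	conns := []conn{
		{"a:1->b:80", 1234, 5},  // monitored PID, opened before OBI started
		{"a:2->b:80", 1234, 15}, // monitored PID, opened after OBI started
		{"a:3->b:80", 9999, 15}, // bystander PID
	}
	fmt.Println(iterTracked(conns, 10, 1234, true))  // PID filter on
	fmt.Println(iterTracked(conns, 10, 1234, false)) // PID filter off
}
```

With the filter on, only the second connection is tracked; with the filter off, all three are, matching the legacy behavior.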
Tests
New privileged tests (sockmap_coexistence_privileged_test.go) load the real BPF objects, attach them, and drive loopback traffic:

- TestSockmap_DoesNotClaimBystanderSockets: fails pre-fix, passes post-fix.
- TestSockmapIter_DoesNotTrackBystanderSockets: iterator counterpart.
- *_CapturesWhenPidFilterOff / *_TracksWhenPidFilterOff: positive controls ensuring the fix didn't disable capture entirely.

Run: make test-privileged. Verified on kernel 7.0. The monitored-IS-captured path remains covered by the end-to-end traceparent integration suites.
Validation