Scope context propagation socket capture to monitored processes#2542

Draft
mariomac wants to merge 3 commits into
open-telemetry:mainfrom
mariomac:fix-lb

@mariomac mariomac commented Jun 30, 2026

Contributor

Summary

Fixes grafana/beyla#2691 ("sock_ops/sk_msg programs
conflicting with Cilium L7 LB generating stalled connections").

When context propagation is enabled, tpinjector claims TCP sockets into the
sock_dir sockhash and runs an sk_msg verdict (obi_packet_extender) over
them to inject traceparent. Both capture paths claimed every socket in the
cgroup, regardless of whether it belonged to a process OBI monitors:

  • obi_sockmap_tracker (sockops) on BPF_SOCK_OPS_ACTIVE_ESTABLISHED_CB called
    bpf_sock_hash_update unconditionally. The valid_pid() gate only runs later,
    in the sk_msg program — by then the socket is already in the sockhash.
  • obi_sk_iter_tcp (startup iter/tcp) added every existing socket too.

On hosts where another component manages the same sockets via sockmap — e.g.
Cilium's socket-LB / L7 proxy — both run an sk_msg/sk_skb verdict over the
same sockets and OBI additionally rewrites bytes with bpf_msg_push_data,
corrupting/stalling those connections. The reporter confirmed this: with
context_propagation=disabled there is no stall; with =headers there is.

This is the sockmap-layer analogue of the existing TC-layer Cilium handling
(EnsureCiliumCompatibility), which had no counterpart here.

Fix

Only claim sockets owned by a monitored process. The owning PID isn't reliable
in the sockops callback (ACTIVE_ESTABLISHED runs in softirq for non-loopback
sockets), so we consult sock_pids — keyed by the connection tuple and
populated in process context, gated by valid_pid(), from kprobe/tcp_connect
(store_sock_pid), which runs before establishment.

  • tpinjector.c: bpf_sock_ops_active_est_cb now gates on
    sock_belongs_to_monitored_pid().
  • sock_iter.c: obi_sk_iter_tcp now gates on iter_sock_is_monitored();
    iterConstants() propagates filter_pids to the iter collection.
  • filter_pids == 0 (PID filter off) preserves the legacy "track everything"
    behavior.
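
The gate described above can be modelled in plain user-space Go. This is only a sketch of the decision logic, not the BPF code in tpinjector.c: the `connTuple` and `tracker` types, their fields, and the `onConnect`/`onEstablished` method names are all hypothetical, and ordinary Go maps stand in for sock_pids and the sock_dir sockhash.

```go
package main

import "fmt"

// connTuple is a hypothetical stand-in for the connection tuple that keys sock_pids.
type connTuple struct {
	srcIP, dstIP     string
	srcPort, dstPort uint16
}

// tracker models the state involved in the gate. All field names are illustrative.
type tracker struct {
	filterPids bool                 // models filter_pids != 0
	monitored  map[uint32]bool      // models the set of PIDs accepted by valid_pid()
	sockPids   map[connTuple]uint32 // models sock_pids
	sockHash   map[connTuple]bool   // models the sock_dir sockhash
}

func newTracker(filterPids bool, pids ...uint32) *tracker {
	tr := &tracker{
		filterPids: filterPids,
		monitored:  map[uint32]bool{},
		sockPids:   map[connTuple]uint32{},
		sockHash:   map[connTuple]bool{},
	}
	for _, p := range pids {
		tr.monitored[p] = true
	}
	return tr
}

// onConnect models store_sock_pid: it runs in process context, where the PID
// is reliable, and records the tuple only for monitored processes.
func (tr *tracker) onConnect(c connTuple, pid uint32) {
	if tr.monitored[pid] {
		tr.sockPids[c] = pid
	}
}

// onEstablished models the gated sockops callback: the socket is claimed only
// if its tuple is in sockPids, or if PID filtering is off (legacy behavior).
func (tr *tracker) onEstablished(c connTuple) bool {
	if tr.filterPids {
		if _, ok := tr.sockPids[c]; !ok {
			return false // bystander socket: leave it alone
		}
	}
	tr.sockHash[c] = true
	return true
}

func main() {
	tr := newTracker(true, 100)
	monitoredConn := connTuple{"127.0.0.1", "127.0.0.1", 40000, 8080}
	bystanderConn := connTuple{"127.0.0.1", "127.0.0.1", 40001, 8080}
	tr.onConnect(monitoredConn, 100) // PID 100 is monitored
	tr.onConnect(bystanderConn, 200) // PID 200 is not
	fmt.Println("monitored claimed:", tr.onEstablished(monitoredConn))
	fmt.Println("bystander claimed:", tr.onEstablished(bystanderConn))
}
```

The point of the model is the ordering: the PID decision is taken at connect time and merely looked up at establishment time, so the establishment callback never needs to know which process it is running under.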

Behavior change

Connections open before OBI starts aren't in sock_pids, so the startup
iterator no longer tracks them. This is intentional: such sockets can't be
attributed to a monitored process, and tracking them is exactly what triggers
the Cilium conflict.
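
To make the consequence concrete, here is a small user-space Go model of the gated startup iterator. It is a sketch under assumptions: `connKey` and `trackExisting` are hypothetical names, and a Go map stands in for sock_pids. The real check lives in sock_iter.c.

```go
package main

import "fmt"

// connKey is a hypothetical, simplified connection identifier.
type connKey struct {
	local, remote string
}

// trackExisting models the gated startup iterator: of the sockets that already
// exist, it keeps only those with a sock_pids entry, unless PID filtering is off.
func trackExisting(existing []connKey, sockPids map[connKey]uint32, filterPids bool) []connKey {
	var tracked []connKey
	for _, c := range existing {
		if _, ok := sockPids[c]; filterPids && !ok {
			continue // no recorded owner: cannot be attributed, so skip it
		}
		tracked = append(tracked, c)
	}
	return tracked
}

func main() {
	// Opened before OBI started, so the connect hook never saw it.
	old := connKey{"10.0.0.1:40000", "10.0.0.2:80"}
	// Opened after OBI started, by a monitored process (PID 100 is made up).
	fresh := connKey{"10.0.0.1:40001", "10.0.0.2:80"}

	sockPids := map[connKey]uint32{fresh: 100}
	existing := []connKey{old, fresh}

	fmt.Println("filter on: ", trackExisting(existing, sockPids, true))
	fmt.Println("filter off:", trackExisting(existing, sockPids, false))
}
```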

Tests

New privileged tests (sockmap_coexistence_privileged_test.go) load the real
BPF objects, attach them, and drive loopback traffic:

  • TestSockmap_DoesNotClaimBystanderSockets — fails pre-fix, passes post-fix.
  • TestSockmapIter_DoesNotTrackBystanderSockets — iterator counterpart.
  • *_CapturesWhenPidFilterOff / *_TracksWhenPidFilterOff — positive controls
    ensuring the fix didn't disable capture entirely.

Run: make test-privileged. Verified on kernel 7.0. The monitored-IS-captured
path remains covered by the end-to-end traceparent integration suites.

Validation

@mariomac

Contributor Author

The privileged tests failed. Submitting the fix to see whether they pass now.

@codecov

codecov Bot commented Jun 30, 2026


Codecov Report

❌ Patch coverage is 80.00000% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 68.52%. Comparing base (9a113d8) to head (6740c4f).
⚠️ Report is 12 commits behind head on main.

Files with missing lines                      Patch %   Lines
pkg/internal/ebpf/tpinjector/tpinjector.go    80.00%    1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2542      +/-   ##
==========================================
- Coverage   69.29%   68.52%   -0.78%     
==========================================
  Files         348      349       +1     
  Lines       47314    47527     +213     
==========================================
- Hits        32786    32567     -219     
- Misses      12480    12905     +425     
- Partials     2048     2055       +7     
Flag                            Coverage Δ
integration-test                45.49% <50.00%> (-5.49%) ⬇️
integration-test-arm            26.16% <40.00%> (-0.93%) ⬇️
integration-test-vm-5.15-lts    ?
integration-test-vm-6.18-lts    ?
k8s-integration-test            35.87% <50.00%> (+0.13%) ⬆️
oats-test                       35.07% <50.00%> (-0.69%) ⬇️
unittests                       63.72% <100.00%> (+0.18%) ⬆️

@mariomac mariomac changed the title from "reproduce grafana/beyla#2691 issue" to "Scope context propagation socket capture to monitored processes" Jun 30, 2026
@mariomac

Contributor Author

As it stands, this PR does not completely fix the issue. It only reduces the blast radius, because OBI now claims only sockets that belong to monitored processes. Those sockets might still be handled by both OBI and Cilium.

The complete fix is to actually close the intersection: OBI needs to stop coexisting on the same sockets when a sockmap peer is present. That would be the sockmap-layer analogue of the existing EnsureCiliumCompatibility (TC layer). Concretely:

  • Detect a Cilium sockmap/socket-LB peer (enumerate cgroup-attached sockops/sk_msg programs and match Cilium's cil_ name prefix — same technique hasCiliumTCX() uses for TCX), and/or detect socket-LB being active.
  • When detected, don't run black-box sockmap CP. Options, roughly in order of safety: disable black-box context propagation; or fall back to a non-sockmap injection path (TC-based, like the network path that already has Cilium handling); or restrict to TCP-options injection (no byte rewriting) if the disambiguation shows mechanism (1) isn't the culprit.
  • Surface it as a clear log/warning (mirroring the TC path) so it's observable.

That detection-and-disable is what makes it robust rather than probabilistic.
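
The detect-and-disable flow outlined above might look roughly like the following Go sketch. It is not the proposed implementation: `progInfo`, `findCiliumSockmapPeers`, and `resolveMode` are hypothetical names, the program list is passed in as plain data instead of being enumerated from the kernel, and the Cilium program names in the example are made up. Only the cil_ prefix and the disabled/headers mode names come from this discussion.

```go
package main

import (
	"fmt"
	"strings"
)

// progInfo is a hypothetical summary of a cgroup-attached BPF program.
type progInfo struct {
	Name string
	Type string // "sock_ops", "sk_msg", ...
}

const ciliumPrefix = "cil_"

// findCiliumSockmapPeers returns the sockops/sk_msg programs whose name
// carries Cilium's prefix.
func findCiliumSockmapPeers(progs []progInfo) []progInfo {
	var peers []progInfo
	for _, p := range progs {
		isSockmapProg := p.Type == "sock_ops" || p.Type == "sk_msg"
		if isSockmapProg && strings.HasPrefix(p.Name, ciliumPrefix) {
			peers = append(peers, p)
		}
	}
	return peers
}

// resolveMode picks the safest option from the list above: when a peer is
// detected, context propagation is disabled and a warning is returned so the
// decision is observable in logs.
func resolveMode(requested string, peers []progInfo) (mode, warning string) {
	if requested == "disabled" || len(peers) == 0 {
		return requested, ""
	}
	names := make([]string, 0, len(peers))
	for _, p := range peers {
		names = append(names, p.Name)
	}
	return "disabled", fmt.Sprintf(
		"sockmap peer detected (%s); disabling context propagation",
		strings.Join(names, ", "))
}

func main() {
	progs := []progInfo{
		{Name: "obi_sockmap_tracker", Type: "sock_ops"},
		{Name: "cil_example_ops", Type: "sock_ops"}, // made-up Cilium program name
		{Name: "cil_example_tc", Type: "sched_cls"}, // not a sockmap program: ignored
	}
	mode, warning := resolveMode("headers", findCiliumSockmapPeers(progs))
	fmt.Println("mode:", mode)
	if warning != "" {
		fmt.Println("warning:", warning)
	}
}
```

Keeping detection and policy in separate functions would let the fallback options (TC-based injection, TCP-options only) be swapped in later without touching the detection code.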

Development

Successfully merging this pull request may close these issues.

Beyla sock_ops/sk_msg programs conflicing with Cilium L7 LB generating stalled connections
