Commit f72b8c1

feat(tracing): add chunked tail-call traceparent scanner for large HTTP headers (fixes #1381)
When a Traceparent header is preceded by large headers (e.g. Authorization tokens, X-Custom-Data payloads), it can land beyond the initial 1 KB eBPF capture window and be silently missed. This change adds a tail-call based chunked scanner that walks the request buffer in 956-byte steps, stopping at the end-of-headers marker or when the traceparent is found.

bpf/common/trace_util.h:
- Add bpf_strstr_tp_eoh(): single bpf_loop pass that finds either the traceparent or the \r\n\r\n end-of-headers marker, whichever comes first. Uses the existing k_tp_pos_not_found sentinel from the preceding bugfix PR.

bpf/generictracer/k_tracer_tailcall.h:
- Add k_tail_parse_traceparent_http = 10. k_tail_continue_netfd_read moves from index 10 to 11. WARNING: changing jump_table indices is a hard ABI break; the BPF objects and both Go loaders (generictracer and gotracer) must be updated atomically, so a full pod restart is required.

bpf/generictracer/protocol_http.h:
- Add obi_parse_traceparent_http: the new tail-call program. Iterates up to MAX_CHUNKS (29) times, calling bpf_strstr_tp_eoh on each 956-byte slice. Stops at EOH, traceparent found, or iteration limit.

bpf/generictracer/protocol_handler.c:
- Dispatch to obi_parse_traceparent_http from the still_reading path in obi_handle_buf_with_args when chunked scanning is configured.

bpf/generictracer/k_tracer.c:
- Support ITER_UBUF and single-segment ITER_IOVEC in return_recvmsg; set u_buf_is_user=1 so downstream code uses bpf_probe_read_user.

bpf/generictracer/k_tracer_defs.h:
- Rename handle_buf_with_connection to handle_buf_with_connection_ext; add handle_user_buf_with_connection wrapper that sets u_buf_is_user=1.

bpf/generictracer/ssl_defs.h, java_tls.c:
- Switch SSL/Java-TLS paths to handle_user_buf_with_connection.

pkg/internal/ebpf/generictracer/generictracer.go:
- Register obi_parse_traceparent_http in the jump table at index 10.
- Write bpf_max_request_tp_parse_size_kb from MaxRequestTPParseSizeKB; set to 0 on legacy kernels that lack bpf_loop.

pkg/internal/ebpf/gotracer/gotracer.go:
- Update k_tail_continue_netfd_read registration from index 10 to 11.

pkg/ebpf/common/http_transform.go:
- Use event.OriginalTraceId (not event.Tp.TraceId) as the large-buffer map key in extractTCPLargeBuffer. The scanner may overwrite Tp.TraceId after the buffer is stored; OriginalTraceId is stable.

pkg/ebpf/common/common.go:
- Register a dummy stub for obi_parse_traceparent_http on legacy kernels that do not support bpf_loop (dummy.Copy() from the preceding bugfix PR).

pkg/config/ebpf_tracer.go, pkg/obi/config.go:
- Add MaxRequestTPParseSizeKB (default 4, range 4-27 KB). Controls the maximum request size scanned by the chunked parser.

devdocs/config/CONFIG.md, config-schema.json:
- Document the new knob.

internal/test/integration/components/tpclient/service.js:
- Add /with-huge-tp endpoint; refactor /with-tp span-ID increment to use the new nextTraceparent() helper.

internal/test/integration/traceparent_extraction_test.go:
- Add testWithHugeHeadersTraceparent: sends a 2500-byte filler header before Traceparent on the HTTP/kprobe path and asserts that all spans in the tpclient-a/b/c chain carry the static trace ID.
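The chunk geometry in the commit message can be checked with a little arithmetic. The sketch below is plain userspace C, not BPF code. The constant values are assumptions read off this commit's message and comments (a 1024-byte window, a 68-byte TRACE_PARENT_HEADER_LEN, 29 chunks), and chunk_start and max_scanned_bytes are hypothetical helper names that do not appear in the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values, taken from the commit message and comments, not from the headers. */
#define TRACE_BUF_SIZE 1024u         /* bytes available per chunk (the 1 KB window) */
#define TRACE_PARENT_HEADER_LEN 68u  /* overlap carried into the next chunk */
#define MAX_CHUNKS 29u               /* iteration limit of the scanner */
#define CHUNK_STEP (TRACE_BUF_SIZE - TRACE_PARENT_HEADER_LEN) /* 956 */

/* Hypothetical helper: absolute offset at which chunk `niter` starts. */
static uint32_t chunk_start(uint32_t niter) {
    assert(niter < MAX_CHUNKS);
    return niter * CHUNK_STEP;
}

/* Hypothetical helper: one past the last byte the final chunk can reach. */
static uint32_t max_scanned_bytes(void) {
    return chunk_start(MAX_CHUNKS - 1u) + TRACE_BUF_SIZE;
}
```

Under these assumptions chunk 1 starts at offset 956, so bytes 956..1023 of chunk 0 are seen again at local indices 0..67, and the last chunk ends at byte 27792, just above 27 KB (27648 bytes). That appears consistent with the 27 KB upper bound on MaxRequestTPParseSizeKB.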
1 parent f151a9d commit f72b8c1

21 files changed

Lines changed: 773 additions & 56 deletions

bpf/common/trace_util.h

Lines changed: 67 additions & 0 deletions
@@ -98,6 +98,73 @@ static int tp_match(u32 index, void *data) {
     return 0;
 }
 
+// Combined traceparent + end-of-headers search context.
+// Used by bpf_strstr_tp_eoh to find both in a single bpf_loop pass.
+struct callback_ctx_eoh {
+    unsigned char *buf;
+    u32 tp_pos;
+    u32 eoh_pos;
+};
+
+// Searches for traceparent and \r\n\r\n in a single pass.
+// Stops at whichever comes first:
+// - traceparent found → records tp_pos, stops
+// - \r\n\r\n found → records eoh_pos, stops (end of headers reached)
+//
+// The guard uses TRACE_PARENT_HEADER_LEN (68 bytes) as the cutoff for both
+// checks. is_eoh only needs 4 bytes, so the last 64 bytes of the buffer are
+// not checked for EOH here. Any EOH in that window is covered by the 68-byte
+// chunk overlap: the next chunk starts TRACE_PARENT_HEADER_LEN bytes before
+// the end of the current one, so the overlap bytes [956..1023] are rescanned
+// at local indices [0..67] in the next iteration.
+static int tp_eoh_match(u32 index, void *data) {
+    if (index >= (TRACE_BUF_SIZE - TRACE_PARENT_HEADER_LEN)) {
+        return 1;
+    }
+
+    struct callback_ctx_eoh *ctx = data;
+    unsigned char *s = &ctx->buf[index];
+
+    if (is_eoh(s)) {
+        ctx->eoh_pos = index;
+        return 1;
+    }
+
+    if (is_traceparent(s)) {
+        ctx->tp_pos = index;
+        return 1;
+    }
+
+    return 0;
+}
+
+// Like bpf_strstr_tp_loop but also stops at the end-of-headers marker.
+// Sets *eoh_found=true if \r\n\r\n was reached before any traceparent.
+// Callers must not tail-call to the next chunk when *eoh_found is true.
+static __always_inline unsigned char *
+bpf_strstr_tp_eoh(unsigned char *buf, const u16 buf_len, bool *eoh_found) {
+    *eoh_found = false;
+    if (!g_bpf_traceparent_enabled) {
+        return NULL;
+    }
+
+    struct callback_ctx_eoh data = {
+        .buf = buf, .tp_pos = k_tp_pos_not_found, .eoh_pos = k_tp_pos_not_found};
+
+    bpf_loop((u32)buf_len, tp_eoh_match, &data, 0);
+
+    if (data.eoh_pos != k_tp_pos_not_found) {
+        *eoh_found = true;
+    }
+
+    if (data.tp_pos != k_tp_pos_not_found) {
+        return (data.tp_pos > (TRACE_BUF_SIZE - TRACE_PARENT_HEADER_LEN)) ? NULL
+                                                                           : &buf[data.tp_pos];
+    }
+
+    return NULL;
+}
+
 static __always_inline unsigned char *bpf_strstr_tp_loop(unsigned char *buf, const u16 buf_len) {
     if (!g_bpf_traceparent_enabled) {
         return NULL;
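The single-pass "traceparent or end-of-headers, whichever comes first" search added in this hunk can be illustrated outside BPF. The following is a minimal userspace sketch, not the commit's code: an ordinary loop stands in for bpf_loop, POS_NOT_FOUND stands in for k_tp_pos_not_found, and at_eoh, at_traceparent and scan_tp_eoh are hypothetical names. The real is_eoh and is_traceparent are not shown in this diff, so the matching rules here (a case-insensitive "traceparent:" prefix and a literal \r\n\r\n) are assumptions.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define POS_NOT_FOUND ((size_t)-1) /* stand-in for k_tp_pos_not_found */

/* Hypothetical matcher: 1 if the next 4 bytes are "\r\n\r\n". */
static int at_eoh(const unsigned char *s, size_t remaining) {
    return remaining >= 4 && memcmp(s, "\r\n\r\n", 4) == 0;
}

/* Hypothetical matcher: 1 if s starts with "traceparent:" in any letter case. */
static int at_traceparent(const unsigned char *s, size_t remaining) {
    static const char name[] = "traceparent:";
    const size_t n = sizeof(name) - 1;
    if (remaining < n) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        if (tolower(s[i]) != name[i]) {
            return 0;
        }
    }
    return 1;
}

/* One pass over buf; stops at whichever marker appears first. */
static void scan_tp_eoh(const unsigned char *buf, size_t len,
                        size_t *tp_pos, size_t *eoh_pos) {
    assert(buf && tp_pos && eoh_pos);
    *tp_pos = POS_NOT_FOUND;
    *eoh_pos = POS_NOT_FOUND;
    for (size_t i = 0; i < len; i++) {
        if (at_eoh(buf + i, len - i)) {
            *eoh_pos = i; /* headers ended: a later "traceparent" is body data */
            return;
        }
        if (at_traceparent(buf + i, len - i)) {
            *tp_pos = i;
            return;
        }
    }
}
```

Because the loop returns at the first marker, a "traceparent:" string that appears only in the request body is never reported, which mirrors the reason the caller must not continue to the next chunk once EOH is seen.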

bpf/generictracer/java_tls.c

Lines changed: 1 addition & 1 deletion
@@ -160,7 +160,7 @@ int BPF_KPROBE(obi_kprobe_sys_ioctl) {
         void *buf = arg + 1 + sizeof(connection_info_t) + sizeof(u32);
         const u64 zero = 0;
         bpf_map_update_elem(&active_ssl_connections, &p_conn, &zero, BPF_ANY);
-        handle_buf_with_connection(ctx, &p_conn, buf, len, WITH_SSL, op, orig_dport);
+        handle_user_buf_with_connection(ctx, &p_conn, buf, len, WITH_SSL, op, orig_dport);
     }
 
     return 0;

bpf/generictracer/k_tracer.c

Lines changed: 28 additions & 2 deletions
@@ -963,6 +963,8 @@ static __always_inline int return_recvmsg(void *ctx, struct sock *in_sock, u64 i
     }
 
     unsigned char *buf = 0;
+    u64 orig_ubuf = 0;
+    int full_copied_len = copied_len;
     if (args) {
         iovec_iter_ctx *iov_ctx = (iovec_iter_ctx *)&args->iovec_ctx;
 
@@ -972,6 +974,22 @@ static __always_inline int return_recvmsg(void *ctx, struct sock *in_sock, u64 i
             goto done;
         }
 
+        if (bpf_core_enum_value_exists(enum iter_type___dummy, ITER_UBUF) &&
+            iov_ctx->iter_type == bpf_core_enum_value(enum iter_type___dummy, ITER_UBUF)) {
+            orig_ubuf = (u64)iov_ctx->ubuf;
+        } else if (iov_ctx->iter_type == bpf_core_enum_value(enum iter_type, ITER_IOVEC) &&
+                   iov_ctx->nr_segs == 1) {
+            // On kernels < 6.0, ITER_UBUF does not exist and single-buffer recvmsg calls
+            // use ITER_IOVEC with exactly one segment. In that case the entire receive
+            // payload is in one contiguous userspace buffer; capture its base pointer so
+            // the chunked traceparent scanner can read beyond the 8 KB iovec scratch cap
+            // using bpf_probe_read_user (same mechanism as ITER_UBUF on newer kernels).
+            struct iovec vec;
+            if (bpf_probe_read_kernel(&vec, sizeof(vec), &iov_ctx->iov[0]) == 0 && vec.iov_base) {
+                orig_ubuf = (u64)vec.iov_base;
+            }
+        }
+
         buf = iovec_memory();
         if (buf) {
             copied_len = read_iovec_ctx(iov_ctx, buf, copied_len);
@@ -1025,8 +1043,16 @@ static __always_inline int return_recvmsg(void *ctx, struct sock *in_sock, u64 i
             if (buf && copied_len) {
                 bpf_map_delete_elem(&active_recv_args, &id);
                 // doesn't return must be logically last statement
-                handle_buf_with_connection(
-                    ctx, &info, buf, copied_len, NO_SSL, TCP_RECV, orig_dport);
+                handle_buf_with_connection_ext(ctx,
+                                               &info,
+                                               buf,
+                                               copied_len,
+                                               NO_SSL,
+                                               TCP_RECV,
+                                               orig_dport,
+                                               orig_ubuf,
+                                               full_copied_len,
+                                               0);
             }
         } else {
             bpf_dbg_printk("identified SSL connection, ignoring: [%llx]...", *ssl);
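The single-segment rule in the ITER_IOVEC branch above (only trust the base pointer when there is exactly one segment, because only then is the payload contiguous) can be sketched with the POSIX struct iovec in userspace. This is an illustration under that assumption, not the kernel-side code; single_segment_base is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

/* Hypothetical helper mirroring the single-segment case.
 * Returns the base address of the only segment, or 0 when the payload
 * cannot be assumed to live in one contiguous buffer. */
static uint64_t single_segment_base(const struct iovec *iov, size_t nr_segs) {
    assert(iov != NULL || nr_segs == 0);
    if (nr_segs != 1 || iov[0].iov_base == NULL) {
        return 0;
    }
    return (uint64_t)(uintptr_t)iov[0].iov_base;
}
```

A zero result plays the same role as orig_ubuf staying 0 in the diff: the caller has no stable userspace pointer to scan from.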

bpf/generictracer/k_tracer_defs.h

Lines changed: 41 additions & 7 deletions
@@ -56,6 +56,11 @@ static __always_inline call_protocol_args_t *make_protocol_args(const pid_connec
     args->u_buf = (u64)u_buf;
     args->lw_thread = lw_thread;
     args->protocols = protocols;
+    args->orig_buf = 0;
+    args->full_bytes_len = 0;
+    args->niter = 0;
+    args->is_append = 0;
+    args->u_buf_is_user = 0;
     args->protocol_type = protocol_type_for_conn_info(info);
 
     args->pid_conn = *info;
@@ -64,13 +69,16 @@ static __always_inline call_protocol_args_t *make_protocol_args(const pid_connec
     return args;
 }
 
-static __always_inline void handle_buf_with_connection(void *ctx,
-                                                       pid_connection_info_t *pid_conn,
-                                                       void *u_buf,
-                                                       int bytes_len,
-                                                       u8 ssl,
-                                                       u8 direction,
-                                                       u16 orig_dport) {
+static __always_inline void handle_buf_with_connection_ext(void *ctx,
+                                                           pid_connection_info_t *pid_conn,
+                                                           void *u_buf,
+                                                           int bytes_len,
+                                                           u8 ssl,
+                                                           u8 direction,
+                                                           u16 orig_dport,
+                                                           u64 orig_buf,
+                                                           u32 full_bytes_len,
+                                                           u8 u_buf_is_user) {
     call_protocol_args_t *args = make_protocol_args(pid_conn,
                                                     k_lw_thread_none,
                                                     k_protocol_selector_all,
@@ -83,9 +91,34 @@ static __always_inline void handle_buf_with_connection(void *ctx,
         return;
     }
 
+    args->orig_buf = orig_buf;
+    args->full_bytes_len = full_bytes_len;
+    args->u_buf_is_user = u_buf_is_user;
     bpf_tail_call(ctx, &jump_table, k_tail_handle_buf_with_args);
 }
 
+static __always_inline void handle_buf_with_connection(void *ctx,
+                                                       pid_connection_info_t *pid_conn,
+                                                       void *u_buf,
+                                                       int bytes_len,
+                                                       u8 ssl,
+                                                       u8 direction,
+                                                       u16 orig_dport) {
+    handle_buf_with_connection_ext(
+        ctx, pid_conn, u_buf, bytes_len, ssl, direction, orig_dport, 0, 0, 0);
+}
+
+static __always_inline void handle_user_buf_with_connection(void *ctx,
+                                                            pid_connection_info_t *pid_conn,
+                                                            void *u_buf,
+                                                            int bytes_len,
+                                                            u8 ssl,
+                                                            u8 direction,
+                                                            u16 orig_dport) {
+    handle_buf_with_connection_ext(
+        ctx, pid_conn, u_buf, bytes_len, ssl, direction, orig_dport, 0, 0, 1);
+}
+
 static __always_inline void handle_light_weight_thread_buf(void *ctx,
                                                            const lw_thread_t lw_thread,
                                                            protocol_selector_t protocols,
@@ -101,6 +134,7 @@ static __always_inline void handle_light_weight_thread_buf(void *ctx,
         return;
     }
 
+    args->u_buf_is_user = 1;
     bpf_tail_call(ctx, &jump_table, k_tail_handle_buf_with_args);
 }
 
106140

bpf/generictracer/k_tracer_tailcall.h

Lines changed: 9 additions & 1 deletion
@@ -24,5 +24,13 @@ enum {
     k_tail_protocol_http2_grpc_handle_end_frame = 7,
     k_tail_handle_buf_with_args = 8,
     k_tail_continue_protocol_http_tp = 9,
-    k_tail_continue_netfd_read = 10,
+    k_tail_parse_traceparent_http = 10,
+    // WARNING: k_tail_continue_netfd_read moved from index 10 to 11 when the
+    // chunked traceparent scanner was added at index 10. This affects gotracer
+    // (go_net.c), which calls this index. Changing jump_table indices is a
+    // hard breaking change during a rolling upgrade: the BPF objects and the
+    // Go loader programs (both generictracer and gotracer) must be updated
+    // atomically - a full pod restart is required; in-place BPF reload is not
+    // safe.
+    k_tail_continue_netfd_read = 11,
 };
};

bpf/generictracer/protocol_handler.c

Lines changed: 20 additions & 7 deletions
@@ -142,20 +142,33 @@ int obi_handle_buf_with_args(void *ctx) {
                 packet_type = PACKET_TYPE_RESPONSE;
             }
 
-            http_send_large_buffer(info,
-                                   (void *)args->u_buf,
-                                   args->bytes_len,
-                                   packet_type,
-                                   args->direction,
-                                   k_large_buf_action_append);
-
             if (reading) {
+                const u32 prev_len = info->len;
                 info->len += args->bytes_len;
+                if (g_bpf_traceparent_enabled && capture_header_buffer &&
+                    bpf_max_request_tp_parse_size_kb > 0 &&
+                    prev_len < (u32)bpf_max_request_tp_parse_size_kb * 1024) {
+                    args->packet_type = packet_type;
+                    args->is_append = 1;
+                    args->niter = 0;
+                    bpf_tail_call(ctx, &jump_table, k_tail_parse_traceparent_http);
+                    // tail-call failed - fall through
+                }
             } else if (responding) {
                 info->end_monotime_ns = bpf_ktime_get_ns();
                 bpf_d_printk("bytes len %d, new bytes %d", info->resp_len, args->bytes_len);
                 info->resp_len += args->bytes_len;
             }
+
+            // TP parsing not needed or tail-call failed: emit large buffer now.
+            // When the tail-call succeeds, obi_parse_traceparent_http emits
+            // it at done: instead, so this path is only reached as a fallback.
+            http_send_large_buffer(info,
+                                   (void *)args->u_buf,
+                                   args->bytes_len,
+                                   packet_type,
+                                   args->direction,
+                                   k_large_buf_action_append);
         }
     } else if (args->protocols.tcp && !info) {
         // SSL requests will see both TCP traffic and text traffic, ignore the TCP if
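The dispatch condition added in the reading branch is easy to misread, so here is a small userspace mirror of it. This is a sketch under assumptions: should_scan_chunked is a hypothetical name, its parameters stand in for g_bpf_traceparent_enabled, capture_header_buffer, bpf_max_request_tp_parse_size_kb and prev_len, and the 27 KB bound comes from the commit message rather than from the code in this hunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mirror of the gate that decides whether to tail-call
 * into the chunked traceparent scanner for an appended request segment. */
static bool should_scan_chunked(bool tp_enabled, bool capture_headers,
                                uint32_t max_parse_kb, uint32_t prev_len) {
    assert(max_parse_kb <= 27u); /* upper bound stated in the commit message */
    return tp_enabled && capture_headers && max_parse_kb > 0 &&
           prev_len < max_parse_kb * 1024u;
}
```

Note that the comparison uses the length before the current segment was added, so a segment that starts inside the configured window is still scanned even if it ends beyond it. A limit of 0, which the loader writes on legacy kernels without bpf_loop, disables the scan entirely.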
