GenAI: support SSE streaming responses and raise HTTP capture limit to 256KB by NameHaibinZhang · Pull Request #2394 · open-telemetry/opentelemetry-ebpf-instrumentation

NameHaibinZhang · 2026-06-16T08:41:11Z

Summary

This PR adds SSE (Server-Sent Events) streaming response support for GenAI spans and raises the HTTP payload capture limit to 256KB (accumulated across multiple recv calls).

Changes

SSE streaming support:

Add parseOpenAIStream() to detect and parse OpenAI/Qwen SSE streaming responses, accumulating content deltas, tool calls, and usage statistics across multiple chunks.
Improve Anthropic JSON vs SSE classification using looksLikeJSON() instead of a naive first-byte check, handling responses with leading whitespace.
Add Qwen support for OpenAI-compatible endpoints (e.g. vLLM deployments) with SSE streaming.

Large response body support:

Raise MaxCapturedPayloadBytes from 64KB to 256KB for HTTP. Other protocols (MySQL, Kafka, Postgres, etc.) remain at 64KB.
The BPF layer per-syscall limit (k_large_buf_max_http_captured_bytes = 64KB) is unchanged — the 256KB connection-level limit is achieved by accumulating across multiple recv calls (up to 4 × 64KB chunks).

Truncated body resilience:

Add extractJSONRawField() to recover complete JSON fields (e.g. messages) from truncated request bodies, improving GenAI attribute extraction when payloads exceed capture limits.

Validation

I have read and followed the contributing guidelines
If this enhances / fixes / changes a core feature, I have updated the features documentation and support matrix as needed.

codecov · 2026-06-16T08:44:09Z

Codecov Report

❌ Patch coverage is 65.84158% with 69 lines in your changes missing coverage. Please review.
✅ Project coverage is 69.54%. Comparing base (92525d9) to head (a607881).
⚠️ Report is 28 commits behind head on main.

Files with missing lines	Patch %	Lines
pkg/ebpf/common/http/openai_stream.go	68.93%	28 Missing and 4 partials ⚠️
pkg/ebpf/common/http/qwen.go	48.64%	17 Missing and 2 partials ⚠️
pkg/ebpf/common/http/partial_json.go	65.71%	10 Missing and 2 partials ⚠️
pkg/ebpf/common/tcp_large_buffer.go	64.28%	5 Missing ⚠️
pkg/ebpf/common/http/openai.go	90.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2394      +/-   ##
==========================================
+ Coverage   69.35%   69.54%   +0.19%     
==========================================
  Files         321      326       +5     
  Lines       42164    42928     +764     
==========================================
+ Hits        29243    29856     +613     
- Misses      11243    11356     +113     
- Partials     1678     1716      +38

Flag	Coverage Δ
integration-test	`50.67% <4.45%> (-0.63%)`	⬇️
integration-test-arm	`29.75% <0.99%> (+1.24%)`	⬆️
integration-test-vm-5.15-lts	`30.33% <1.27%> (+1.21%)`	⬆️
integration-test-vm-6.18-lts	`28.02% <1.27%> (-2.50%)`	⬇️
k8s-integration-test	`37.88% <1.27%> (-0.72%)`	⬇️
oats-test	`36.16% <31.84%> (-0.71%)`	⬇️
unittests	`62.40% <82.16%> (+0.27%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

mariomac · 2026-06-16T14:53:34Z

In large_buffer.h, there is a bunch of definitions like:

    k_large_buf_max_http_captured_bytes = 1 << 16,
    k_large_buf_max_mysql_captured_bytes = 1 << 16,
    k_large_buf_max_postgres_captured_bytes = 1 << 16,
    k_large_buf_max_kafka_captured_bytes = 1 << 16,
    k_large_buf_max_mssql_captured_bytes = 1 << 16,
    k_large_buf_max_tcp_captured_bytes = 1 << 16,

Do we need to update k_large_buf_max_http_captured_bytes at least to fit 256Kb?

mariomac

Great addition! Added few suggestions

rafaelroquetto

🤖 Review by Claude (Claude Code), posted by @rafaelroquetto on its behalf. This is an automated, ownership-focused review aligned with AGENTS.md and the repo's PR-review guidance — not Rafael's personal review (he has left separate comments inline).

The SSE parsing and the 256KB cross-syscall accumulation are sound at their core. My main concern is scope and ownership, per AGENTS.md ("Keep changes minimal and scoped to the task. Do not include unrelated edits"). Concrete code-level bugs are left as inline comments; the structural points are below.

1. Scope creep — the PR does much more than its title (please split)

Title: "support SSE streaming responses and raise HTTP capture limit to 256KB", but the diff also bundles features the description never mentions:

Rerank provider-format support — RerankRequest.NestedInput/NestedParams, GetQuery/GetDocuments/GetTopN, RerankResponse.NestedOutput, GetResults/GetID, VendorRerank.GetInput/GetOutput, gen_ai.rerank.top_n (span.go, tracesgen.go).
Retrieval top-k — RetrievalRequest.TopK/TopKSnake/Limit, GetTopK, gen_ai.retrieval.top_k.
Embedding dimensions — GetEmbeddingDimensions() + a duplicated emit block in tracesgen.go.
MCP tool-call argument/result capture — mcp.go, span.go, attrs.go, tracesgen.go.
Two SSL-capture band-aids — recoverJSONBodyFromBuffer and the TLS-ciphertext drop, both in the final merge llm commit.

None of rerank/retrieval/embedding/SSL is needed for SSE + 256KB. Each is reasonable on its own, but bundled they are hard to review and to revert independently. Please split into focused PRs, each explaining why the approach fits this codebase.

2. Duplication instead of reuse

The JSON-vs-SSE response branch is copy-pasted near-verbatim in openai.go and qwen.go (and Anthropic now also uses looksLikeJSON). AGENTS.md: "Extend or adapt existing code instead of duplicating functionality." Please extract one helper, e.g. parseOpenAICompatibleResponse(respB) (request.VendorOpenAI, []request.ToolCall). (Inline note on the openai.go site.)

3. Smaller items

large_buffers.h: the pre-existing "must equal the lte= values" comment is now false for HTTP (64KB const vs 256KB ceiling), and the appended paragraph frames cross-syscall accumulation as HTTP-specific — every protocol accumulates identically; only HTTP's configured ceiling differs. A one-liner would be clearer.
MaxCapturedPayloadBytes is now HTTP-only in meaning but keeps a generic name and is consumed solely as the decompression budget in responses.go; the 64KB→256KB bump silently 4×'s that budget. Consider a dedicated decompression constant.
extractJSONRawField finds the field via a plain bytes.Index("\"messages\""), which can match inside a string value or nested key rather than the documented top-level field.
Commit messages (fix test, fix attr, merge llm, fix stream) don't convey intent; merge llm bundles the two unrelated SSL changes.

Asks

Split the unrelated features (rerank/retrieval/embedding/SSL) into separate PRs.
Deduplicate the OpenAI-compatible response branch.
Fix the inline-flagged bugs (scanner cap, dead args, UnsafeView lifetime).
Confirm local validation (make verify, make test) on the trimmed change.

NameHaibinZhang · 2026-06-17T01:41:38Z

No — k_large_buf_max_http_captured_bytes is intentionally kept at 64 KB because it is the per-syscall emission cap, not the connection-level limit. The 256 KB budget is reached by accumulating across multiple tcp_recvmsg calls: each call emits up to 64 KB (4 × 16 KB ring-buffer events), and the running totals lb_req_bytes / lb_res_bytes track consumption until http_max_captured_bytes (256 KB) is exhausted. Keeping the per-call cap at 64 KB bounds the large_buf_emit_chunks() loop to 4 iterations, which is important for verifier complexity. @mariomac

…o 256KB - Add SSE (Server-Sent Events) stream parser for OpenAI-compatible APIs - Support streaming responses in OpenAI, Qwen, and Anthropic handlers - Extract shared parseOpenAICompatibleResponse() helper to reduce duplication - Raise HTTP payload capture limit to 256KB (accumulated across multiple recv calls) - Add looksLikeJSON() helper for content-type detection - Update configuration docs and schema Relates to open-telemetry#2267

NameHaibinZhang · 2026-06-17T02:43:47Z

PR Split Summary

Per @rafaelroquetto's review feedback on scope and ownership, this PR has been split into focused, independently reviewable changes:

#	PR	Scope
1	#2394 (this PR, reset)	Core: SSE streaming response parsing + 256KB HTTP capture limit
2	#2407	JSON body recovery from truncated TLS capture buffer
3	#2408	MCP tool-call arguments and results capture
4	#2409	Retrieval top_k extraction (multi-vendor)
5	#2410	Rerank nested request/response format support
6	#2411	Embedding dimensions derivation from response data

All PRs relate to #2267. Each is self-contained and can be reviewed/merged independently.

mariomac

Thanks for addressing! Happy with the changes related to this PR (will check the others later). Will wait for Rafael's approval before merging.

rafaelroquetto

This looks much better, many thanks for all of this. I have one comment left regarding MaxCapturedPayloadBytes, once we address that we can merge this

NameHaibinZhang · 2026-06-18T02:27:32Z

@rafaelroquetto done

mmat11

has this been tested manually? can we also add an integration test which exercise the "super large buffer" path?

mmat11 · 2026-06-18T11:58:37Z

-    // which enforces the same ceiling at configuration time.
+    k_large_buf_payload_max_size = 1 << 14,
+    k_large_buf_max_size = 1 << 15,
    k_large_buf_max_http_captured_bytes = 1 << 16,


this is gated to 64KB, which means

static __always_inline int http_send_large_buffer(http_info_t *req, const void *u_buf, u32 bytes_len, u8 packet_type, u8 direction, enum large_buf_action action) { if (http_max_captured_bytes > k_large_buf_max_http_captured_bytes) { bpf_dbg_printk("BUG: http_max_captured_bytes exceeds maximum allowed value."); }

will fail if the value is higher than that?

Good question — it won't cause a functional failure. Here's the layered design:

http_max_captured_bytes (set from userspace, up to 256KB) is the total per-request per-direction budget. It controls when to stop accumulating data (bytes_sent >= http_max_captured_bytes).

k_large_buf_max_http_captured_bytes (64KB) is the per-syscall-event cap, used in bpf_clamp_umax(max_available_bytes, k_large_buf_max_http_captured_bytes) to bound each ring buffer submission. This is critical for the BPF verifier to prove memory safety.

The 256KB total budget is reached by accumulating multiple 64KB chunks across successive tcp_recvmsg events.

The bpf_dbg_printk("BUG: ...") check is a debug-only assertion (gated by g_bpf_debug, compiled out in production) that predates the HTTP limit raise. It's now stale for HTTP — I can clean it up or update the condition to reflect the new semantic. It does not affect correctness since the code continues past it regardless and the actual safety bound is enforced by bpf_clamp_umax.

I thought there was a return statement. I think we should clean it up if it's pointless

Done — removed the stale assertion. It was comparing the per-request budget (256KB) against the per-syscall cap (64KB), which is now expected behavior in the multi-chunk design.

I don't think this is the right approach. IMHO the statement is correct: it is a bug to set a value that is lager than k_large_buf_max_http_captured_bytes because k_large_buf_max_http_captured_bytes is used to bound max_available_bytes for the verifier. So if max_available_bytes is larger than k_large_buf_max_http_captured_bytes, we end up truncating the buffer and throwing away the remainder of the payload silently.

So IMHO we should at least log that, or try to remove the clamp and see if the verifier is happy.

@rafaelroquetto You're absolutely right — this was a real bug. If a single tcp_recvmsg delivers more than 64KB, BPF truncates the chunk, and blindly appending subsequent truncated chunks would create holes in the reassembled buffer.

Fixed: the userspace reassembly now detects truncation by checking whether the accumulated data for a single emission hits exactly the per-syscall cap (64KB) with a full final chunk (16KB payload). When that pattern is detected, the buffer is "sealed" and subsequent chunks for that direction are discarded — ensuring the assembled data is always a contiguous prefix (tail truncation only, no holes).

Added unit tests covering: truncation detection, non-truncation pass-through, and seal reset on new request.

@NameHaibinZhang thanks for iterating this but I don't think this is the right approach and I'd rather we don't go down the userspace-heuristic path at all.

The userspace simply has no way of knowing whether a buffer was actually truncated. A syscall that got clamped at 64KB and a perfectly healthy one that just happens to land on a 64KB boundary produce exactly the same sequence of chunks, i.e same lengths, same actions, nothing to tell them apart. So len % 64KB == 0 isn't detecting truncation, it's guessing, and it'll guess wrong on legitimate traffic, silently capping perfectly healthy buffers at 64KB, which is the exact regression we're trying to avoid.

The truncation only exists as a fact inside BPF - that's the one place that knows bytes_len > cap so that's where it has to be dealt with.

I do think we can do this properly in eBPF with some effort. The verifier makes it fiddly, I know, and that's probably why we kept everything bounded at 64KB in the first place, but I'd much rather we take the time to respect the semantics and get it right than merge a workaround that quietly regresses the common case.

@rafaelroquetto You're right — the userspace heuristic can't reliably distinguish truncation from legitimate 64KB-aligned traffic. I'll revert the heuristic and implement this properly in BPF: add a truncated flag to the event metadata set when bytes_len > cap, so userspace gets an explicit signal to stop accumulation. Will push the BPF-side fix shortly.

@NameHaibinZhang: no, I don't mean it like that - a truncated flag is a workaround. We should be able to ship full 256KB from ebpf, the usual semantics. This will require some ebpf work. I'd suggest first trying to simply increasing the constant and see what the verifier says. In the likely event it complains, you will need to chunk the buffers. I'd rather us take some time to get it right than rush it and get it wrong, there's no urgency here.

@rafaelroquetto I've already tested raising k_large_buf_max_http_captured_bytes to 256KB directly — it fails the BPF verifier on 5.10 kernels. The loop unrolling in large_buf_emit_chunks goes from 4 iterations (4×16KB = 64KB) to 16 iterations (16×16KB = 256KB), which exceeds the instruction count limit on older kernels.

So the path forward would be a multi-emission approach (e.g. tail calls or multiple ring buffer submissions across separate BPF program invocations) to stay within verifier bounds while delivering the full 256KB. I'll work on that — it'll take a bit more time to get right.

NameHaibinZhang · 2026-06-18T12:48:50Z

Yes, this has been tested manually — I verified SSE streaming responses with payloads exceeding 64KB (up to ~200KB) and confirmed the large buffer path correctly accumulates chunks across multiple tcp_recvmsg events.

Regarding integration tests: agreed, this would be valuable. However, writing a proper integration test for the "super large buffer" path requires setting up an SSE/streaming server that produces >64KB responses and verifying the assembled payload in userspace — which is a non-trivial addition to the test infra. I'll track this as a follow-up. Would that be acceptable, or would you prefer it gated in this PR? @mmat11

mmat11 · 2026-06-18T12:55:43Z

Yes, this has been tested manually — I verified SSE streaming responses with payloads exceeding 64KB (up to ~200KB) and confirmed the large buffer path correctly accumulates chunks across multiple tcp_recvmsg events.

Regarding integration tests: agreed, this would be valuable. However, writing a proper integration test for the "super large buffer" path requires setting up an SSE/streaming server that produces >64KB responses and verifying the assembled payload in userspace — which is a non-trivial addition to the test infra. I'll track this as a follow-up. Would that be acceptable, or would you prefer it gated in this PR? @mmat11

follow-up is fine!

The check compared the per-request budget (now 256KB) against the per-syscall cap (64KB), which is expected in the new multi-chunk accumulation design. The assertion was debug-only and had no return statement, making it pointless.

rafaelroquetto

This looks better, thanks for the changes! There are a few failing tests that need to be addressed before approving, and I've reopened one of the discussions regarding k_large_buf_max_http_captured_bytes

Adopted reviewer's suggestion to use json.NewDecoder for extractJSONRawField. This fixes a bug where string values could be falsely matched as field names (e.g. {"label":"field","field":99} would match the value "field" instead of the key). Also removed the now-unnecessary extractJSONRawValue helper.

…erence When a single tcp_recvmsg delivers more data than the per-syscall BPF cap (64KB), the captured chunk is truncated. Previously, subsequent chunks would still be appended, creating holes in the reassembled buffer. Now we detect truncation (captured == per-syscall cap while the event is not terminal) and stop accumulating, ensuring the userspace buffer is always a contiguous prefix — tail truncation only, no holes.

NameHaibinZhang requested a review from a team as a code owner June 16, 2026 08:41

NameHaibinZhang force-pushed the feature/llm-stream branch from 175999a to bc09073 Compare June 16, 2026 09:30

mariomac reviewed Jun 16, 2026

View reviewed changes

Comment thread pkg/ebpf/common/http/openai_stream.go

Comment thread pkg/ebpf/common/http_transform.go Outdated

Comment thread pkg/ebpf/common/http_transform.go Outdated

Comment thread pkg/ebpf/common/http/openai_stream.go Outdated

rafaelroquetto reviewed Jun 16, 2026

View reviewed changes

Comment thread pkg/ebpf/common/http/openai_stream.go

Comment thread pkg/ebpf/common/http/openai_stream.go

Comment thread pkg/ebpf/common/http_transform.go Outdated

Comment thread pkg/ebpf/common/http/openai.go Outdated

NameHaibinZhang force-pushed the feature/llm-stream branch from 2a62e17 to 00b8fb1 Compare June 17, 2026 02:38

fix: correct Qwen URL detection to avoid false positives

508ffff

NameHaibinZhang mentioned this pull request Jun 17, 2026

Proposal AI Agent Observability Support #1854

Open

40 tasks

NameHaibinZhang requested review from mariomac and rafaelroquetto June 17, 2026 03:36

mariomac approved these changes Jun 17, 2026

View reviewed changes

rafaelroquetto requested changes Jun 17, 2026

View reviewed changes

Comment thread pkg/config/ebpf_tracer.go Outdated

rafaelroquetto requested changes Jun 17, 2026

View reviewed changes

NameHaibinZhang added 2 commits June 18, 2026 09:36

fix: move MaxCapturedPayloadBytes to responses.go as private constant

7920b0d

style: fix gofumpt formatting in responses.go

a344f95

mmat11 reviewed Jun 18, 2026

View reviewed changes

Comment thread pkg/ebpf/common/http/partial_json.go

rafaelroquetto reviewed Jun 18, 2026

View reviewed changes

NameHaibinZhang added 2 commits June 19, 2026 01:04

Conversation

NameHaibinZhang commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Changes

Validation

Uh oh!

codecov Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

mariomac commented Jun 16, 2026

Uh oh!

mariomac left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

rafaelroquetto left a comment

Choose a reason for hiding this comment

1. Scope creep — the PR does much more than its title (please split)

2. Duplication instead of reuse

3. Smaller items

Asks

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

NameHaibinZhang commented Jun 17, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

NameHaibinZhang commented Jun 17, 2026

PR Split Summary

Uh oh!

mariomac left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

rafaelroquetto left a comment

Choose a reason for hiding this comment

Uh oh!

NameHaibinZhang commented Jun 18, 2026

Uh oh!

mmat11 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

mmat11 Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

NameHaibinZhang Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

mmat11 Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

NameHaibinZhang Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

rafaelroquetto Jun 18, 2026

Choose a reason for hiding this comment

Uh oh!

NameHaibinZhang Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

rafaelroquetto Jun 18, 2026

Choose a reason for hiding this comment

NameHaibinZhang commented Jun 16, 2026 •

edited

Loading

codecov Bot commented Jun 16, 2026 •

edited

Loading

NameHaibinZhang commented Jun 17, 2026 •

edited

Loading

NameHaibinZhang Jun 18, 2026 •

edited

Loading

NameHaibinZhang Jun 18, 2026 •

edited

Loading

NameHaibinZhang Jun 19, 2026 •

edited

Loading

NameHaibinZhang commented Jun 18, 2026 •

edited

Loading