-
Notifications
You must be signed in to change notification settings - Fork 132
GenAI: support SSE streaming responses and raise HTTP capture limit to 256KB #2394
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
00b8fb1
508ffff
7920b0d
a344f95
33aa325
0768093
a607881
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -13,17 +13,8 @@ volatile const u32 mssql_max_captured_bytes = 0; | |
| volatile const u32 tcp_max_captured_bytes = 0; | ||
|
|
||
| enum { | ||
| // Maximum payload size per ring buffer chunk. | ||
| k_large_buf_payload_max_size = 1 << 14, // 16K | ||
|
|
||
| // Scratch memory size for a large buffer event: sizeof(tcp_large_buffer_t) + payload. | ||
| // Rounded up to the next power of 2 above k_large_buf_payload_max_size to account | ||
| // for the struct overhead. | ||
| k_large_buf_max_size = 1 << 15, // 32K | ||
|
|
||
| // Maximum valid value for each protocol's *_max_captured_bytes volatile variable. | ||
| // These must equal the lte= validation values in EBPFBufferSizes (pkg/config/ebpf_tracer.go), | ||
| // which enforces the same ceiling at configuration time. | ||
| k_large_buf_payload_max_size = 1 << 14, | ||
| k_large_buf_max_size = 1 << 15, | ||
| k_large_buf_max_http_captured_bytes = 1 << 16, | ||
|
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. this is gated to 64KB, which means will fail if the value is higher than that?
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Good question — it won't cause a functional failure. Here's the layered design:
The
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I thought there was a return statement. I think we should clean it up if it's pointless
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Done — removed the stale assertion. It was comparing the per-request budget (256KB) against the per-syscall cap (64KB), which is now expected behavior in the multi-chunk design.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think this is the right approach. IMHO the statement is correct: it is a bug to set a value that is lager than So IMHO we should at least log that, or try to remove the clamp and see if the verifier is happy.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rafaelroquetto You're absolutely right — this was a real bug. If a single Fixed: the userspace reassembly now detects truncation by checking whether the accumulated data for a single emission hits exactly the per-syscall cap (64KB) with a full final chunk (16KB payload). When that pattern is detected, the buffer is "sealed" and subsequent chunks for that direction are discarded — ensuring the assembled data is always a contiguous prefix (tail truncation only, no holes). Added unit tests covering: truncation detection, non-truncation pass-through, and seal reset on new request.
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @NameHaibinZhang thanks for iterating this but I don't think this is the right approach and I'd rather we don't go down the userspace-heuristic path at all. The userspace simply has no way of knowing whether a buffer was actually truncated. A syscall that got clamped at 64KB and a perfectly healthy one that just happens to land on a 64KB boundary produce exactly the same sequence of chunks, i.e same lengths, same actions, nothing to tell them apart. So len % 64KB == 0 isn't detecting truncation, it's guessing, and it'll guess wrong on legitimate traffic, silently capping perfectly healthy buffers at 64KB, which is the exact regression we're trying to avoid. The truncation only exists as a fact inside BPF - that's the one place that knows I do think we can do this properly in eBPF with some effort. The verifier makes it fiddly, I know, and that's probably why we kept everything bounded at 64KB in the first place, but I'd much rather we take the time to respect the semantics and get it right than merge a workaround that quietly regresses the common case.
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rafaelroquetto You're right — the userspace heuristic can't reliably distinguish truncation from legitimate 64KB-aligned traffic. I'll revert the heuristic and implement this properly in BPF: add a
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @NameHaibinZhang: no, I don't mean it like that - a
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @rafaelroquetto I've already tested raising So the path forward would be a multi-emission approach (e.g. tail calls or multiple ring buffer submissions across separate BPF program invocations) to stay within verifier bounds while delivering the full 256KB. I'll work on that — it'll take a bit more time to get right. |
||
| k_large_buf_max_mysql_captured_bytes = 1 << 16, | ||
| k_large_buf_max_postgres_captured_bytes = 1 << 16, | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,193 @@ | ||
| // Copyright The OpenTelemetry Authors | ||
| // SPDX-License-Identifier: Apache-2.0 | ||
|
|
||
| package ebpfcommon // import "go.opentelemetry.io/obi/pkg/ebpf/common/http" | ||
|
|
||
| import ( | ||
| "bufio" | ||
| "encoding/json" | ||
| "io" | ||
| "log/slog" | ||
| "strings" | ||
|
|
||
| "go.opentelemetry.io/obi/pkg/appolly/app/request" | ||
| ) | ||
|
|
||
| // maxStreamToolCalls caps the tool-call accumulator to prevent unbounded | ||
| // growth from untrusted tool_calls[].index values. | ||
| const maxStreamToolCalls = 256 | ||
|
|
||
| type openAIStreamChunk struct { | ||
| ID string `json:"id"` | ||
| Object string `json:"object"` | ||
| Model string `json:"model"` | ||
| Choices []struct { | ||
| Index int `json:"index"` | ||
| Delta struct { | ||
| Role string `json:"role"` | ||
| Content string `json:"content"` | ||
| ToolCalls []openAIStreamToolCall `json:"tool_calls"` | ||
| } `json:"delta"` | ||
| FinishReason *string `json:"finish_reason"` | ||
| } `json:"choices"` | ||
| Usage *struct { | ||
| PromptTokens int `json:"prompt_tokens"` | ||
| CompletionTokens int `json:"completion_tokens"` | ||
| TotalTokens int `json:"total_tokens"` | ||
| InputTokens int `json:"input_tokens"` | ||
| OutputTokens int `json:"output_tokens"` | ||
| } `json:"usage"` | ||
| } | ||
|
|
||
| type openAIStreamToolCall struct { | ||
| Index int `json:"index"` | ||
| ID string `json:"id"` | ||
| Type string `json:"type"` | ||
| Function struct { | ||
| Name string `json:"name"` | ||
| Arguments string `json:"arguments"` | ||
| } `json:"function"` | ||
| } | ||
|
|
||
| // parseOpenAIStream parses the SSE stream from OpenAI-compatible APIs (including Qwen/DashScope) | ||
| // and returns the aggregated response with usage statistics and tool calls. | ||
| func parseOpenAIStream(reader io.Reader) (*request.VendorOpenAI, []request.ToolCall) { | ||
| scanner := bufio.NewScanner(reader) | ||
|
NameHaibinZhang marked this conversation as resolved.
NameHaibinZhang marked this conversation as resolved.
|
||
| scanner.Buffer(make([]byte, 0, 256*1024), 256*1024) | ||
| response := &request.VendorOpenAI{} | ||
|
|
||
| var finishReason string | ||
| var role string | ||
| var contentBuilder strings.Builder | ||
| // toolCallAccum accumulates tool call fragments by index. | ||
| type toolCallAccum struct { | ||
| id string | ||
| name string | ||
| } | ||
| var accumulators []toolCallAccum | ||
|
|
||
| for scanner.Scan() { | ||
| line := scanner.Text() | ||
|
|
||
| if !strings.HasPrefix(line, "data: ") { | ||
| continue | ||
| } | ||
|
|
||
| data := strings.TrimPrefix(line, "data: ") | ||
|
|
||
| if data == "[DONE]" { | ||
| break | ||
| } | ||
|
|
||
| var chunk openAIStreamChunk | ||
| if err := json.Unmarshal([]byte(data), &chunk); err != nil { | ||
| continue | ||
| } | ||
|
|
||
| // Extract model and id from the first chunk that has them. | ||
| if response.ID == "" && chunk.ID != "" { | ||
| response.ID = chunk.ID | ||
| } | ||
| if response.ResponseModel == "" && chunk.Model != "" { | ||
| response.ResponseModel = chunk.Model | ||
| } | ||
|
|
||
| // Extract usage from the chunk that contains it (typically the last one). | ||
| if chunk.Usage != nil { | ||
| response.Usage.PromptTokens = chunk.Usage.PromptTokens | ||
| response.Usage.CompletionTokens = chunk.Usage.CompletionTokens | ||
| response.Usage.TotalTokens = chunk.Usage.TotalTokens | ||
| response.Usage.InputTokens = chunk.Usage.InputTokens | ||
| response.Usage.OutputTokens = chunk.Usage.OutputTokens | ||
| } | ||
|
|
||
| // Process choices. | ||
| for i := range chunk.Choices { | ||
| choice := &chunk.Choices[i] | ||
|
|
||
| // Track finish reason from the last choice that reports one. | ||
| if choice.FinishReason != nil && *choice.FinishReason != "" { | ||
| finishReason = *choice.FinishReason | ||
| } | ||
|
|
||
| // Capture assistant role (typically in the first delta) and | ||
| // accumulate content fragments to reconstruct the full message. | ||
| if choice.Delta.Role != "" { | ||
| role = choice.Delta.Role | ||
| } | ||
| if choice.Delta.Content != "" { | ||
| contentBuilder.WriteString(choice.Delta.Content) | ||
| } | ||
|
|
||
| // Accumulate tool calls by index. | ||
| for j := range choice.Delta.ToolCalls { | ||
| tc := &choice.Delta.ToolCalls[j] | ||
| idx := tc.Index | ||
| if idx < 0 || idx >= maxStreamToolCalls { | ||
| continue | ||
| } | ||
|
|
||
| // Grow the accumulator slice as needed. | ||
| for len(accumulators) <= idx { | ||
| accumulators = append(accumulators, toolCallAccum{}) | ||
| } | ||
|
|
||
| if tc.ID != "" { | ||
| accumulators[idx].id = tc.ID | ||
| } | ||
| if tc.Function.Name != "" { | ||
| accumulators[idx].name = tc.Function.Name | ||
| } | ||
|
|
||
| } | ||
| } | ||
| } | ||
|
|
||
| if err := scanner.Err(); err != nil { | ||
| slog.Debug("parseOpenAIStream: scanner error", "error", err) | ||
| } | ||
|
|
||
| // Build the Choices JSON with the aggregated message content and | ||
| // finish_reason so that VendorOpenAI.GetFinishReasons() and the GenAI | ||
| // output normalization (normalizeOpenAIChoices) work correctly. | ||
| if finishReason != "" || contentBuilder.Len() > 0 { | ||
| type streamChoice struct { | ||
| Message struct { | ||
| Role string `json:"role"` | ||
| Content string `json:"content"` | ||
| } `json:"message"` | ||
| FinishReason string `json:"finish_reason"` | ||
| } | ||
|
|
||
| sc := streamChoice{FinishReason: finishReason} | ||
| sc.Message.Role = role | ||
| if sc.Message.Role == "" { | ||
| sc.Message.Role = "assistant" | ||
| } | ||
| sc.Message.Content = contentBuilder.String() | ||
|
|
||
| choicesJSON, err := json.Marshal([]streamChoice{sc}) | ||
| if err == nil { | ||
| response.Choices = choicesJSON | ||
| } | ||
| } | ||
|
|
||
| // Build the final tool calls list. | ||
| var toolCalls []request.ToolCall | ||
| for i := range accumulators { | ||
| if accumulators[i].name == "" { | ||
| continue | ||
| } | ||
| toolCalls = append(toolCalls, request.ToolCall{ | ||
|
NameHaibinZhang marked this conversation as resolved.
|
||
| ID: accumulators[i].id, | ||
| Name: accumulators[i].name, | ||
| }) | ||
| } | ||
|
|
||
| if response.Usage.GetInputTokens() == 0 && response.Usage.GetOutputTokens() == 0 && response.ID != "" { | ||
| slog.Debug("parseOpenAIStream: no usage data found in SSE stream, token counts will be 0", | ||
| "id", response.ID, "model", response.ResponseModel, "finishReason", finishReason) | ||
| } | ||
|
|
||
| return response, toolCalls | ||
| } | ||
Uh oh!
There was an error while loading. Please reload this page.