[Bug]: Go SDK SSE streams time out, truncate large events, and return incomplete agent results #1738

Description

@coconut-yc

Affected Component

Other — weknora CLI + Go SDK client (cli/, client/)

Bug Description

The CLI's streaming commands and their underlying SDK streams are unreliable for non-trivial runs. Five defects in the client/streaming layer can abort a stream, leave it waiting indefinitely, return an incomplete agent result, or discard reference metadata:

  1. The default 30-second client timeout severs long streams. http.Client.Timeout covers reading the response body, so KnowledgeQAStream, ContinueStream, and AgentQAStreamWithRequest can be cut off mid-run. The default timeout should not apply to SSE; an explicitly configured WithTimeout should remain honored.
  2. SSE data lines above 64 KiB are rejected. A references event can contain hundreds of KiB of chunk content. That exceeds bufio.Scanner's default token limit (bufio.MaxScanTokenSize, 64 KiB), so the scanner fails with bufio.Scanner: token too long and the stream aborts.
  3. Terminal error frames can leave the client waiting. If the server sends response_type=error, done=true but leaves the HTTP connection open without a later complete event or EOF, the SDK continues reading indefinitely.
  4. Agent results finalize on a per-event marker. AgentAccumulator treats the first done:true as completion of the entire run. Thinking, reflection, and answer sub-streams each emit their own done:true, so later tool calls and the final answer are ignored. This affects session ask --format text and the MCP session_ask tool.
  5. Reference identity fields are silently dropped. The server includes knowledge_base_id, parent_chunk_id, and sub_chunk_id in reference events, but the SDK's SearchResult has no corresponding fields. Go therefore discards the KB and parent/sub-chunk relationships during unmarshal.

A separate feature request (#1739) covers the proposed agent-facing output modes and optional execution/reference detail. This issue is limited to stream reliability and response fidelity.

Steps to Reproduce

CLI v0.9.0, commit ae9038732ad2, built from source.

Default timeout

Run a query whose response streams for more than 30 seconds in total:

weknora chat "<query whose response streams for more than 30 seconds>" \
  --kb <kb-id> --format text

# ReAct/tool-call path (`chat` itself sets AgentEnabled=false):
weknora session ask --agent <agent-id> --format text \
  "<multi-step query>"

The stream is severed before completion.
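
A minimal sketch of the timeout rule described in defect 1, assuming the SDK can tell whether WithTimeout was called. The names streamTimeout, streamClient, and defaultTimeout are hypothetical and are not taken from the SDK. The only library fact relied on is that a zero http.Client.Timeout means no timeout.

```go
package main

import (
	"fmt"
	"net/http"
	"time"
)

// defaultTimeout mirrors the 30-second default described in this issue.
const defaultTimeout = 30 * time.Second

// streamTimeout (hypothetical) picks the timeout for an SSE request.
// A default timeout is dropped because http.Client.Timeout also covers
// reading the response body; an explicitly configured one is kept.
func streamTimeout(configured time.Duration, explicit bool) time.Duration {
	if explicit {
		return configured
	}
	return 0 // zero means "no timeout" in net/http
}

// streamClient (hypothetical) returns a shallow copy of base for SSE use,
// sharing its Transport, Jar, and CheckRedirect.
func streamClient(base *http.Client, explicit bool) *http.Client {
	c := *base
	c.Timeout = streamTimeout(base.Timeout, explicit)
	return &c
}

func main() {
	base := &http.Client{Timeout: defaultTimeout}
	fmt.Println(streamClient(base, false).Timeout) // 0s
	fmt.Println(streamClient(base, true).Timeout)  // 30s
}
```

With the default timeout removed, cancellation would come from the request context (for example http.NewRequestWithContext), so callers can still bound a stream.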

Large SSE line

Use a knowledge base whose returned references exceed 64 KiB:

weknora chat "<query returning several large chunks>" \
  --kb <kb-id> --format ndjson

The SDK aborts with bufio.Scanner: token too long.
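
The failure can be reproduced without a server. The sketch below assumes the SDK reads SSE lines with bufio.Scanner, as the error message suggests. scanDataLines and payload are hypothetical helpers, and the 1 MiB ceiling is an arbitrary illustrative value, not a proposed limit.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// scanDataLines (hypothetical) returns the payload of every "data:" line,
// allowing single lines of up to maxLine bytes.
func scanDataLines(r io.Reader, maxLine int) ([]string, error) {
	sc := bufio.NewScanner(r)
	// Start with a 64 KiB buffer and let it grow up to maxLine.
	sc.Buffer(make([]byte, 0, 64*1024), maxLine)
	var out []string
	for sc.Scan() {
		rest, ok := strings.CutPrefix(sc.Text(), "data:")
		if !ok {
			continue // blank separators and other SSE fields
		}
		out = append(out, strings.TrimPrefix(rest, " "))
	}
	if err := sc.Err(); err != nil {
		return nil, err
	}
	return out, nil
}

// payload (hypothetical) builds one SSE event whose data line carries n bytes.
func payload(n int) io.Reader {
	return strings.NewReader("data: " + strings.Repeat("x", n) + "\n\n")
}

func main() {
	lines, err := scanDataLines(payload(200*1024), 1<<20)
	fmt.Println(len(lines), err) // 1 <nil>

	// With the default limit the same event is rejected.
	_, err = scanDataLines(payload(200*1024), bufio.MaxScanTokenSize)
	fmt.Println(err) // bufio.Scanner: token too long
}
```

An alternative that avoids any fixed ceiling is bufio.Reader.ReadString('\n'), at the cost of unbounded memory for a malformed stream.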

Agent early finalization

weknora session ask --agent <agent-id> --format text \
  "<multi-step query that emits thinking, tool calls, and a final answer>"

The rendered result may stop after an intermediate done:true. The same behavior is observable through the MCP session_ask tool.
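
The intended finalization rule can be sketched as follows. This is not the AgentAccumulator implementation: the event and result types, and the response_type values "thinking", "tool_call", and "answer", are assumptions made for illustration. Only "complete" and the per-event done flag come from this issue.

```go
package main

import (
	"fmt"
	"strings"
)

// event is a simplified, hypothetical view of one decoded SSE frame.
type event struct {
	ResponseType string
	Content      string
	Done         bool
}

// result is a hypothetical summary of a finished agent run.
type result struct {
	Answer    string
	ToolCalls int
	Complete  bool
}

// accumulate folds events into a result. Done on a non-complete event only
// closes that sub-stream; the run ends on response_type "complete".
func accumulate(events []event) result {
	var answer strings.Builder
	var res result
	for _, ev := range events {
		switch ev.ResponseType {
		case "answer":
			answer.WriteString(ev.Content)
		case "tool_call":
			res.ToolCalls++
		case "complete":
			res.Complete = true
		}
		if res.Complete {
			break // leaves the for loop: nothing follows "complete"
		}
	}
	res.Answer = answer.String()
	return res
}

// sampleRun is invented data: thinking, one tool call, then the answer.
func sampleRun() []event {
	return []event{
		{"thinking", "plan the lookup", false},
		{"thinking", "", true}, // would have ended the run before the fix
		{"tool_call", "search", true},
		{"answer", "The answer ", false},
		{"answer", "is 42.", true},
		{"complete", "", true},
	}
}

func main() {
	fmt.Printf("%+v\n", accumulate(sampleRun()))
}
```

Stopping at the first done:true in sampleRun would yield an empty answer and zero tool calls, which matches the truncated output described above.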

Dropped reference identity fields

weknora chat "<query>" --kb <kb-id> --format ndjson

The server's reference payload contains knowledge_base_id, parent_chunk_id, and sub_chunk_id, but those fields are absent after SDK unmarshal and CLI re-serialization.
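
The loss follows from encoding/json ignoring keys that have no matching struct field. The sketch below shows the round trip with and without the three fields. The struct names, the id field, and the sample payload are invented; the value types are also assumptions, so sub_chunk_id is kept as json.RawMessage rather than guessing its shape.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// referenceOld stands in for a result type that lacks the identity fields.
type referenceOld struct {
	ID string `json:"id"` // assumed field, for illustration only
}

// referenceWithIDs adds the three fields named in this issue.
type referenceWithIDs struct {
	ID              string          `json:"id"`
	KnowledgeBaseID string          `json:"knowledge_base_id,omitempty"`
	ParentChunkID   string          `json:"parent_chunk_id,omitempty"`
	SubChunkID      json.RawMessage `json:"sub_chunk_id,omitempty"` // shape not stated in the issue
}

// samplePayload is invented data, not captured server output.
const samplePayload = `{"id":"c1","knowledge_base_id":"kb1","parent_chunk_id":"p1","sub_chunk_id":"s1"}`

// roundTrip decodes payload into T and encodes it again, which is what
// happens between SDK unmarshal and CLI re-serialization.
func roundTrip[T any](payload string) (string, error) {
	var v T
	if err := json.Unmarshal([]byte(payload), &v); err != nil {
		return "", err
	}
	out, err := json.Marshal(v)
	return string(out), err
}

func main() {
	before, _ := roundTrip[referenceOld](samplePayload)
	after, _ := roundTrip[referenceWithIDs](samplePayload)
	fmt.Println(before) // {"id":"c1"}
	fmt.Println(after)  // identical to samplePayload
}
```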

Expected Behavior

  • Default clients allow SSE streams to run until completion or context cancellation; an explicit WithTimeout remains an upper bound.
  • Reference events of hundreds of KiB parse successfully.
  • A terminal error frame ends the SDK call even if the connection remains open.
  • Agent results include all tool events and the final answer, terminating only on response_type=complete.
  • knowledge_base_id, parent_chunk_id, and sub_chunk_id survive unmarshal so callers retain KB provenance and can fetch the self-contained parent passage.
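
The third expectation, ending the call on a terminal error frame while the connection stays open, could look roughly like this. consume, frame, and openStream are hypothetical names, the content field is an assumption, and the 1 MiB line limit is illustrative. openStream simulates the open connection with an io.Pipe whose writer is never closed.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
	"strings"
)

// frame is a hypothetical minimal view of one SSE data payload.
type frame struct {
	ResponseType string `json:"response_type"`
	Content      string `json:"content"` // assumed field
	Done         bool   `json:"done"`
}

// consume reads frames and returns as soon as a terminal frame is decoded,
// without waiting for EOF.
func consume(r io.Reader) error {
	sc := bufio.NewScanner(r)
	sc.Buffer(make([]byte, 0, 64*1024), 1<<20)
	for sc.Scan() {
		data, ok := strings.CutPrefix(sc.Text(), "data:")
		if !ok {
			continue
		}
		var f frame
		if err := json.Unmarshal([]byte(data), &f); err != nil {
			return err
		}
		switch {
		case f.ResponseType == "complete":
			return nil
		case f.ResponseType == "error" && f.Done:
			return fmt.Errorf("stream error: %s", f.Content)
		}
	}
	if err := sc.Err(); err != nil {
		return err
	}
	return io.ErrUnexpectedEOF // stream ended without a terminal frame
}

// openStream sends the frames in a single write and never closes the pipe,
// imitating a server that leaves the HTTP connection open.
func openStream(frames ...string) io.Reader {
	pr, pw := io.Pipe()
	var b strings.Builder
	for _, f := range frames {
		b.WriteString("data: " + f + "\n\n")
	}
	go func() { _, _ = pw.Write([]byte(b.String())) }()
	return pr
}

func main() {
	fmt.Println(consume(openStream(`{"response_type":"error","done":true,"content":"boom"}`)))
	fmt.Println(consume(openStream(`{"response_type":"answer","done":true}`, `{"response_type":"complete","done":true}`)))
}
```

In a real client the caller would also close the response body after consume returns, so the open connection is released rather than leaked.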

Actual Behavior

  • Streams lasting more than 30 seconds are cut off.
  • Large reference events fail with bufio.Scanner: token too long.
  • A terminal error frame on a connection that stays open leaves the reader waiting indefinitely for EOF or a complete event.
  • Agent results can omit later events and the final answer.
  • KB and parent/sub-chunk metadata is lost.

WeKnora Version

weknora CLI v0.9.0, commit ae9038732ad2, built from source.

Deployment Method

Build from source

Operating System

Rocky Linux 8.10

Relevant Logs

context deadline exceeded (Client.Timeout exceeded while reading body)
bufio.Scanner: token too long

Confirmation

  • I have searched existing issues and confirmed this is a new one
