
Cloud API streaming drops nearai_fusion metadata from Fusion SSE events #830

@PierreLeGuen

Summary

In production, Cloud API forwards non-streaming Fusion responses correctly, including the nearai_fusion metadata, but streaming Fusion responses lose the nearai_fusion metadata event. The stream still returns content, [DONE], and aggregate usage, so the problem appears limited to preserving the extra top-level Fusion metadata in SSE events.

Direct proxy streaming includes nearai_fusion; prod Cloud API streaming currently does not.

Environment

  • Cloud API: https://cloud-api.near.ai
  • Direct GLM proxy: https://glm-5-2.completions.near.ai
  • Outer/synthesis model: z-ai/glm-5.2
  • Judge model: z-ai/glm-5.2
  • Panel model: deepseek-ai/DeepSeek-V4-Flash
  • Fusion tool type: nearai:fusion
  • Web tool: web_context_search

No real API token is included in this issue. Use a disposable test token in $TOKEN.

Expected

For streaming Fusion responses through Cloud API, the terminal Fusion metadata event should preserve the proxy's nearai_fusion object, matching the direct proxy streaming behavior and the existing infra test expectation.

Expected terminal metadata shape includes:

{
  "usage": {
    "prompt_tokens": 4805,
    "completion_tokens": 355,
    "total_tokens": 5160
  },
  "nearai_fusion": {
    "status": "invoked",
    "panel": [{"status": "ok", "web_tool_calls": 1}],
    "judge": {"status": "ok"},
    "aggregate_usage": {
      "prompt_tokens": 4805,
      "completion_tokens": 355,
      "total_tokens": 5160
    }
  }
}
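
A minimal sketch of a shape check for this terminal event, assuming only the fields shown above. check_fusion_metadata is a hypothetical stand-in, not the _assert_fusion_metadata helper from infra-tests, which may check more or different fields.

```python
from typing import Any


def check_fusion_metadata(event: dict[str, Any]) -> list[str]:
    """Return problems with a terminal Fusion metadata event (empty list if OK).

    Hypothetical helper: checks only the fields shown in the expected shape above.
    """
    problems: list[str] = []
    if not event.get("usage"):
        problems.append("missing usage")
    fusion = event.get("nearai_fusion")
    if not isinstance(fusion, dict):
        # Nothing else can be checked without the nearai_fusion object.
        return problems + ["missing nearai_fusion"]
    if fusion.get("status") != "invoked":
        problems.append(f"unexpected status: {fusion.get('status')!r}")
    panel = fusion.get("panel") or []
    if not panel or any(p.get("status") != "ok" for p in panel):
        problems.append("panel missing or not ok")
    if (fusion.get("judge") or {}).get("status") != "ok":
        problems.append("judge missing or not ok")
    if not fusion.get("aggregate_usage"):
        problems.append("missing aggregate_usage")
    return problems


expected = {
    "usage": {"prompt_tokens": 4805, "completion_tokens": 355, "total_tokens": 5160},
    "nearai_fusion": {
        "status": "invoked",
        "panel": [{"status": "ok", "web_tool_calls": 1}],
        "judge": {"status": "ok"},
        "aggregate_usage": {"prompt_tokens": 4805, "completion_tokens": 355, "total_tokens": 5160},
    },
}
print(check_fusion_metadata(expected))  # []
```

Applied to the usage-only terminal event that Cloud API currently streams, this returns ["missing nearai_fusion"].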

Actual

Through prod Cloud API streaming:

  • HTTP 200
  • SSE stream completes with [DONE]
  • Final content is correct
  • Aggregate usage is present
  • nearai_fusion is missing from all stream events

Observed parsed summary:

{
  "chunks": 4,
  "done": true,
  "content": "near.org",
  "usage": {
    "prompt_tokens": 4805,
    "completion_tokens": 355,
    "total_tokens": 5160
  },
  "fusion_status": null,
  "panel_statuses": null,
  "panel_web_tool_calls": null,
  "judge_status": null,
  "aggregate_usage": null,
  "errors": []
}
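
A summary like the one above can be built from the decoded SSE chunks roughly as follows. This is a sketch: summarize_stream and its input format (a list of decoded chat.completion.chunk objects plus a done flag) are assumptions, not the script that produced the observed output.

```python
from typing import Any


def summarize_stream(events: list[dict[str, Any]], done: bool) -> dict[str, Any]:
    """Summarize decoded chat.completion.chunk events (hypothetical helper)."""
    # Concatenate all content deltas across chunks and choices.
    content = "".join(
        (choice.get("delta") or {}).get("content") or ""
        for event in events
        for choice in event.get("choices", [])
    )
    # The last chunk with non-null usage carries the aggregate usage.
    usage = next((e["usage"] for e in reversed(events) if e.get("usage")), None)
    # Fusion metadata is expected as a top-level key on some chunk.
    fusion = next((e["nearai_fusion"] for e in events if e.get("nearai_fusion")), None)
    panel = (fusion or {}).get("panel") or []
    return {
        "chunks": len(events),
        "done": done,
        "content": content,
        "usage": usage,
        "fusion_status": fusion.get("status") if fusion else None,
        "panel_statuses": [p.get("status") for p in panel] if fusion else None,
        "panel_web_tool_calls": [p.get("web_tool_calls") for p in panel] if fusion else None,
        "judge_status": (fusion.get("judge") or {}).get("status") if fusion else None,
        "aggregate_usage": fusion.get("aggregate_usage") if fusion else None,
        "errors": [e["error"] for e in events if "error" in e],
    }


events = [
    {"choices": [{"index": 0, "delta": {"role": "assistant"}}], "usage": None},
    {"choices": [{"index": 0, "delta": {"content": "near.org"}}], "usage": None},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "stop"}], "usage": None},
    {"choices": [], "usage": {"prompt_tokens": 4805, "completion_tokens": 355, "total_tokens": 5160}},
]
summary = summarize_stream(events, done=True)
print(summary["content"], summary["fusion_status"])  # near.org None
```

Because no chunk carries nearai_fusion, every Fusion field in the summary is None, while content and usage are populated.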

The raw Cloud API stream for a simpler Fusion request ends with a usage-only chunk followed by [DONE]; no event carries nearai_fusion:

data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{"role":"assistant"}}],"usage":null}

data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{"content":"stream-inspect-ok"}}],"usage":null}

data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}

data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[],"usage":{"prompt_tokens":362,"completion_tokens":409,"total_tokens":771}}

data: [DONE]

Non-Streaming Control Passes

The same setup through prod Cloud API in non-streaming mode returns full Fusion metadata.

Repro:

curl --max-time 120 -sS \
  https://cloud-api.near.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model":"z-ai/glm-5.2",
    "messages":[{
      "role":"user",
      "content":"Use web_context_search if available, then answer with exactly the official NEAR Protocol website domain and no extra words."
    }],
    "tools":[
      {
        "type":"nearai:fusion",
        "analysis_models":["deepseek-ai/DeepSeek-V4-Flash"],
        "model":"z-ai/glm-5.2",
        "max_tool_calls":1,
        "max_completion_tokens":768,
        "temperature":0
      },
      {"type":"web_context_search"}
    ],
    "tool_choice":"required",
    "max_completion_tokens":768,
    "temperature":0,
    "stream":false
  }' | jq '{content: .choices[0].message.content, fusion_status: .nearai_fusion.status, panel: .nearai_fusion.panel, judge: .nearai_fusion.judge, usage, aggregate_usage: .nearai_fusion.aggregate_usage}'

Observed summary:

{
  "content": "near.org",
  "fusion_status": "invoked",
  "panel": [{
    "domain": "dsv4-flash.completions.near.ai",
    "model": "deepseek-ai/DeepSeek-V4-Flash",
    "status": "ok",
    "web_tool_calls": 1,
    "verifiable": true
  }],
  "judge": {
    "domain": "glm-5-2.completions.near.ai",
    "model": "z-ai/glm-5.2",
    "status": "ok",
    "verifiable": true
  },
  "usage": {
    "completion_tokens": 362,
    "prompt_tokens": 4799,
    "total_tokens": 5161
  },
  "aggregate_usage": {
    "completion_tokens": 362,
    "prompt_tokens": 4799,
    "total_tokens": 5161
  }
}

Streaming Repro Through Cloud API

curl --max-time 120 -sS -N \
  https://cloud-api.near.ai/v1/chat/completions \
  -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' \
  -d '{
    "model":"z-ai/glm-5.2",
    "messages":[{
      "role":"user",
      "content":"Use web_context_search if available, then answer with exactly the official NEAR Protocol website domain and no extra words."
    }],
    "tools":[
      {
        "type":"nearai:fusion",
        "analysis_models":["deepseek-ai/DeepSeek-V4-Flash"],
        "model":"z-ai/glm-5.2",
        "max_tool_calls":1,
        "max_completion_tokens":768,
        "temperature":0
      },
      {"type":"web_context_search"}
    ],
    "tool_choice":"required",
    "max_completion_tokens":768,
    "temperature":0,
    "stream":true,
    "stream_options":{"include_usage":true}
  }'

Parse the SSE events and check whether any event has a top-level nearai_fusion key. Current prod result: none do.
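
One way to do this check with only the Python standard library, sketched under the assumption that the curl output is saved to a file (for example by appending > stream.txt). parse_sse and fusion_events are hypothetical names; the parser follows the basic SSE rules (data: lines accumulate until a blank line) and ignores other SSE fields.

```python
import json
from collections.abc import Iterable
from typing import Any


def parse_sse(lines: Iterable[str]) -> tuple[list[dict[str, Any]], bool]:
    """Decode SSE data payloads into JSON objects; return (events, saw_done)."""
    events: list[dict[str, Any]] = []
    done = False
    data: list[str] = []

    def flush() -> None:
        nonlocal done
        if not data:
            return
        payload = "\n".join(data)
        data.clear()
        if payload == "[DONE]":
            done = True
        else:
            events.append(json.loads(payload))

    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            flush()  # a blank line terminates an SSE event
        elif line.startswith("data:"):
            # Per SSE, strip at most one space after the colon.
            data.append(line[5:].removeprefix(" "))
        # event:, id:, and comment lines are ignored in this sketch
    flush()  # handle a final event without a trailing blank line
    return events, done


def fusion_events(events: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Return events that carry a top-level nearai_fusion key."""
    return [e for e in events if "nearai_fusion" in e]


tail = '''data: {"id":"x","object":"chat.completion.chunk","choices":[],"usage":{"prompt_tokens":362,"completion_tokens":409,"total_tokens":771}}

data: [DONE]
'''
events, done = parse_sse(tail.splitlines(keepends=True))
print(len(events), done, fusion_events(events))  # 1 True []
```

For a saved stream, with open("stream.txt") as f: events, done = parse_sse(f) works the same way, since iterating a text file yields lines.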

Existing Test Expectation

infra-tests/tests/test_fusion.py already expects streaming Fusion metadata through Cloud API:

fusion_event = next((event for event in events if event.get("nearai_fusion")), None)
assert fusion_event is not None, f"missing streaming Fusion metadata event: {events[-3:]}"
assert fusion_event.get("usage"), f"Fusion metadata event missing usage: {fusion_event}"
_assert_fusion_metadata(fusion_event)

This production behavior would fail that assertion.

Notes / Likely Area

Inference-proxy direct streaming emits Fusion metadata, and Cloud API non-streaming preserves it. The issue is therefore likely in Cloud API's streaming passthrough/reassembly path, which preserves standard SSE fields and aggregate usage but appears to drop unknown top-level fields such as nearai_fusion from the terminal event.

Cloud API should remain a passthrough for Fusion semantics; it does not need to understand Fusion, but it should preserve unknown top-level JSON fields in streamed SSE events.
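
To illustrate the suspected failure mode (an assumption, since the Cloud API source is not quoted here): a streaming path that re-serializes each chunk through a fixed set of known fields drops nearai_fusion, while one that rewrites only the fields it must touch and keeps the rest of the decoded object preserves it. KNOWN_FIELDS, forward_lossy, and forward_preserving are all hypothetical.

```python
import json
from typing import Optional

# Hypothetical set of fields a typed chunk model might know about.
KNOWN_FIELDS = ("id", "object", "created", "model", "choices", "usage")


def forward_lossy(line: str) -> str:
    """Re-serialize only known fields; unknown keys such as nearai_fusion are lost."""
    chunk = json.loads(line.removeprefix("data: "))
    kept = {k: chunk[k] for k in KNOWN_FIELDS if k in chunk}
    return "data: " + json.dumps(kept, separators=(",", ":"))


def forward_preserving(line: str, model_alias: Optional[str] = None) -> str:
    """Rewrite only what the gateway must change; keep every other top-level key."""
    chunk = json.loads(line.removeprefix("data: "))
    if model_alias is not None:
        chunk["model"] = model_alias  # example of a field a gateway might rewrite
    return "data: " + json.dumps(chunk, separators=(",", ":"))


terminal = 'data: {"id":"x","choices":[],"usage":{"total_tokens":5160},"nearai_fusion":{"status":"invoked"}}'
print("nearai_fusion" in forward_lossy(terminal))       # False
print("nearai_fusion" in forward_preserving(terminal))  # True
```

Both functions expect a JSON data line, so [DONE] would be passed through separately. If the streaming path deserializes chunks into a typed struct (for example with serde in Rust), a catch-all map annotated with #[serde(flatten)] is one common way to carry unknown keys through untouched.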
