Summary
Prod Cloud API forwards Fusion non-streaming responses correctly, including nearai_fusion metadata, but streaming Fusion responses lose the nearai_fusion metadata event. The stream still returns content, [DONE], and aggregate usage, so the issue appears limited to forwarding/preserving the extra top-level Fusion metadata in SSE events.
Direct proxy streaming includes nearai_fusion; prod Cloud API streaming currently does not.
Environment
- Cloud API: https://cloud-api.near.ai
- Direct GLM proxy: https://glm-5-2.completions.near.ai
- Outer/synthesis model: z-ai/glm-5.2
- Judge model: z-ai/glm-5.2
- Panel model: deepseek-ai/DeepSeek-V4-Flash
- Fusion tool type: nearai:fusion
- Web tool: web_context_search
No real API token is included in this issue. Use a disposable test token in $TOKEN.
Expected
For streaming Fusion responses through Cloud API, the terminal Fusion metadata event should preserve the proxy's nearai_fusion object, matching the direct proxy streaming behavior and the existing infra test expectation.
Expected terminal metadata shape includes:
{
"usage": {
"prompt_tokens": 4805,
"completion_tokens": 355,
"total_tokens": 5160
},
"nearai_fusion": {
"status": "invoked",
"panel": [{"status": "ok", "web_tool_calls": 1}],
"judge": {"status": "ok"},
"aggregate_usage": {
"prompt_tokens": 4805,
"completion_tokens": 355,
"total_tokens": 5160
}
}
}
Actual
Through prod Cloud API streaming:
- HTTP 200
- SSE stream completes with [DONE]
- Final content is correct
- Aggregate usage is present
- nearai_fusion is missing from all stream events
Observed parsed summary:
{
"chunks": 4,
"done": true,
"content": "near.org",
"usage": {
"prompt_tokens": 4805,
"completion_tokens": 355,
"total_tokens": 5160
},
"fusion_status": null,
"panel_statuses": null,
"panel_web_tool_calls": null,
"judge_status": null,
"aggregate_usage": null,
"errors": []
}
Raw tail of a Cloud API stream for a simpler Fusion request shows only the usage event before [DONE]:
data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{"role":"assistant"}}],"usage":null}
data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{"content":"stream-inspect-ok"}}],"usage":null}
data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":null}
data: {"id":"99c85d7ba95c47b79149e84bcf2b7d49","object":"chat.completion.chunk","created":1781804308,"model":"z-ai/glm-5.2","choices":[],"usage":{"prompt_tokens":362,"completion_tokens":409,"total_tokens":771}}
data: [DONE]
Non-Streaming Control Passes
The same setup through prod Cloud API in non-streaming mode returns full Fusion metadata.
Repro:
curl --max-time 120 -sS \
https://cloud-api.near.ai/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"model":"z-ai/glm-5.2",
"messages":[{
"role":"user",
"content":"Use web_context_search if available, then answer with exactly the official NEAR Protocol website domain and no extra words."
}],
"tools":[
{
"type":"nearai:fusion",
"analysis_models":["deepseek-ai/DeepSeek-V4-Flash"],
"model":"z-ai/glm-5.2",
"max_tool_calls":1,
"max_completion_tokens":768,
"temperature":0
},
{"type":"web_context_search"}
],
"tool_choice":"required",
"max_completion_tokens":768,
"temperature":0,
"stream":false
}' | jq '{content: .choices[0].message.content, fusion_status: .nearai_fusion.status, panel: .nearai_fusion.panel, judge: .nearai_fusion.judge, usage, aggregate_usage: .nearai_fusion.aggregate_usage}'
Observed summary:
{
"content": "near.org",
"fusion_status": "invoked",
"panel": [{
"domain": "dsv4-flash.completions.near.ai",
"model": "deepseek-ai/DeepSeek-V4-Flash",
"status": "ok",
"web_tool_calls": 1,
"verifiable": true
}],
"judge": {
"domain": "glm-5-2.completions.near.ai",
"model": "z-ai/glm-5.2",
"status": "ok",
"verifiable": true
},
"usage": {
"completion_tokens": 362,
"prompt_tokens": 4799,
"total_tokens": 5161
},
"aggregate_usage": {
"completion_tokens": 362,
"prompt_tokens": 4799,
"total_tokens": 5161
}
}
Streaming Repro Through Cloud API
curl --max-time 120 -sS -N \
https://cloud-api.near.ai/v1/chat/completions \
-H "Authorization: Bearer $TOKEN" \
-H 'Content-Type: application/json' \
-d '{
"model":"z-ai/glm-5.2",
"messages":[{
"role":"user",
"content":"Use web_context_search if available, then answer with exactly the official NEAR Protocol website domain and no extra words."
}],
"tools":[
{
"type":"nearai:fusion",
"analysis_models":["deepseek-ai/DeepSeek-V4-Flash"],
"model":"z-ai/glm-5.2",
"max_tool_calls":1,
"max_completion_tokens":768,
"temperature":0
},
{"type":"web_context_search"}
],
"tool_choice":"required",
"max_completion_tokens":768,
"temperature":0,
"stream":true,
"stream_options":{"include_usage":true}
}'
Parse the SSE events and check whether any event has a top-level nearai_fusion key. Current prod result: none do.
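The check above can be scripted. The following is a minimal sketch, not the parser used to produce the summaries in this issue: the function names `parse_sse_events` and `summarize` are hypothetical, and the summary keys are an assumption modeled on the "Observed parsed summary" shown earlier (the `errors` field is omitted).

```python
import json
from typing import Any, Iterable


def parse_sse_events(lines: Iterable[str]) -> tuple[list[dict[str, Any]], bool]:
    """Return (json_events, saw_done) from raw SSE lines."""
    events: list[dict[str, Any]] = []
    done = False
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank lines, comments, and other SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            done = True
            continue
        events.append(json.loads(payload))
    return events, done


def summarize(events: list[dict[str, Any]], done: bool) -> dict[str, Any]:
    """Collapse a stream into a summary, including any top-level nearai_fusion."""
    content = "".join(
        choice.get("delta", {}).get("content") or ""
        for event in events
        for choice in event.get("choices") or []
    )
    # Last event that carries a non-null usage object.
    usage = next((e["usage"] for e in reversed(events) if e.get("usage")), None)
    # First event that carries a top-level nearai_fusion object, if any.
    fusion = next((e["nearai_fusion"] for e in events if e.get("nearai_fusion")), None)
    panel = fusion.get("panel", []) if fusion else None
    return {
        "chunks": len(events),
        "done": done,
        "content": content,
        "usage": usage,
        "fusion_status": fusion.get("status") if fusion else None,
        "panel_statuses": [p.get("status") for p in panel] if fusion else None,
        "judge_status": (fusion.get("judge") or {}).get("status") if fusion else None,
        "aggregate_usage": fusion.get("aggregate_usage") if fusion else None,
    }
```

One way to use it, assuming the curl output is piped to a script containing these functions, is `print(json.dumps(summarize(*parse_sse_events(sys.stdin)), indent=2))` after `import sys`. A null `fusion_status` together with a non-null `usage` is the symptom described in this issue.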
Existing Test Expectation
infra-tests/tests/test_fusion.py already expects streaming Fusion metadata through Cloud API:
fusion_event = next((event for event in events if event.get("nearai_fusion")), None)
assert fusion_event is not None, f"missing streaming Fusion metadata event: {events[-3:]}"
assert fusion_event.get("usage"), f"Fusion metadata event missing usage: {fusion_event}"
_assert_fusion_metadata(fusion_event)
Current production behavior would fail the first assertion: no streamed event carries nearai_fusion, so fusion_event is None.
Notes / Likely Area
Inference-proxy direct streaming emits Fusion metadata, and Cloud API non-streaming preserves it. The likely issue is in Cloud API's streaming passthrough/reassembly path: it preserves the standard chat.completion.chunk fields and aggregate usage, but appears to drop unknown top-level fields such as nearai_fusion from the terminal event.
Cloud API should remain a passthrough for Fusion semantics; it does not need to understand Fusion, but it should preserve unknown top-level JSON fields in streamed SSE events.
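To illustrate the difference between the two behaviors, here is a hypothetical sketch. It is not Cloud API's actual code, and its implementation may look nothing like this. The names `KNOWN_FIELDS`, `forward_lossy`, and `forward_preserving` are invented for the example. The first function shows how re-serializing a chunk through a fixed set of known fields would silently drop nearai_fusion; the second rewrites only the fields the gateway needs to touch and passes everything else through.

```python
import json
from typing import Any, Optional

# Assumed set of fields a gateway might explicitly model (illustrative only).
KNOWN_FIELDS = ("id", "object", "created", "model", "choices", "usage")


def forward_lossy(payload: str) -> str:
    """Re-serialize only known fields; unknown keys such as nearai_fusion are lost."""
    chunk = json.loads(payload)
    return json.dumps({k: chunk[k] for k in KNOWN_FIELDS if k in chunk})


def forward_preserving(payload: str, overrides: Optional[dict[str, Any]] = None) -> str:
    """Apply gateway overrides, but keep every other top-level key untouched."""
    chunk = json.loads(payload)
    chunk.update(overrides or {})
    return json.dumps(chunk)
```

With the second approach the gateway never needs to know what nearai_fusion means; any future top-level field added by the proxy would survive the same way.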