Summary
When a streaming response hits the 120-second idle timeout, the proxy_request_logs record stores the original request model (e.g. claude-opus-4-8) instead of the mapped model (e.g. glm-5.1). Token counts and cost are also zeroed out, making the UI show incorrect data.
What I see (broken)
| Time | Provider | Model | In | Out | Cache | Cost | Latency | Status | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 06/07 20:28 | Zhipu GLM | claude-opus-4-8 | 0 | 0 | — | $0.0000 | 129.1s | 200 | proxy |
The UI shows only claude-opus-4-8 with no mapping arrow, zero tokens and zero cost.
What I expect (normal)
| Time | Provider | Model | In | Out | Cache | Cost | Latency | Status | Source |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 06/07 20:28 | Zhipu GLM | claude-opus-4-8 → glm-5.1 | 720 | R140,736 | 226 | $0.0386 | 8.9s | 200 | proxy |
Steps to Reproduce
- Configure Claude Code to proxy through cc-switch with the model mapping claude-opus-4-8 → glm-5.1 (Zhipu GLM)
- Start a long conversation request
- If the upstream provider goes silent for 120+ seconds after the first byte, the streaming idle timeout fires
- The proxy_request_logs record now has model = claude-opus-4-8 (same as request_model) and all tokens = 0
Root Cause Analysis
Call chain
- Claude Code sends the request with model: claude-opus-4-8
- RequestContext stores request_model = "claude-opus-4-8" (original)
- Forwarder correctly maps to glm-5.1 and sends to the Zhipu API ✅ (the log shows [Claude] >>> 请求 URL: ... (model=glm-5.1), where 请求 URL means "request URL")
- Zhipu API starts streaming and some SSE events are received (first_token_ms is set)
- The 120-second idle timeout fires: [Claude] 流式响应静默期超时 (120秒), i.e. "streaming response idle timeout (120 seconds)"
- SseUsageCollector::finish() is called, but the collected SSE events are incomplete (missing message_delta, which carries usage)
- claude_model_extractor cannot extract the model name from the incomplete events and falls back to request_model
- Database record: model = "claude-opus-4-8", tokens = 0, cost = $0.0000
Problematic code
In src-tauri/src/proxy/handler_config.rs, claude_model_extractor:
    fn claude_model_extractor(events: &[Value], request_model: &str) -> String {
        if let Some(usage) = TokenUsage::from_claude_stream_events(events) {
            if let Some(model) = usage.model {
                return model;
            }
        }
        request_model.to_string() // ← Bug: falls back to original model on timeout
    }
When the stream is cut by timeout:
- TokenUsage::from_claude_stream_events(events) returns None (no message_delta with usage)
- The extractor falls back to request_model = "claude-opus-4-8" instead of the actual mapped model glm-5.1
The same pattern exists in openai_model_extractor, codex_auto_model_extractor, and gemini_model_extractor.
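The fall-through described above can be reproduced in isolation. The following is only a minimal sketch: `SseEvent`, `usage_model`, `current_extractor`, and the two stream builders are hypothetical stand-ins (the real code operates on `serde_json::Value` and `TokenUsage`), and it assumes the upstream `message_start` event reports the mapped model name.

```rust
// Hypothetical, simplified stand-in for the collected SSE events.
// The real extractor receives serde_json::Value items instead.
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum SseEvent {
    MessageStart { model: String },
    ContentBlockDelta,
    MessageDelta { output_tokens: u64 },
}

// Stand-in for TokenUsage::from_claude_stream_events: it yields a model
// only when a message_delta (the event that carries usage) was received.
fn usage_model(events: &[SseEvent]) -> Option<String> {
    let has_usage = events
        .iter()
        .any(|e| matches!(e, SseEvent::MessageDelta { .. }));
    if !has_usage {
        return None;
    }
    events.iter().find_map(|e| match e {
        SseEvent::MessageStart { model } => Some(model.clone()),
        _ => None,
    })
}

// Mirrors the current behaviour: usage-derived model, else request_model.
fn current_extractor(events: &[SseEvent], request_model: &str) -> String {
    usage_model(events).unwrap_or_else(|| request_model.to_string())
}

// A stream that finished normally.
fn complete_stream() -> Vec<SseEvent> {
    vec![
        SseEvent::MessageStart { model: "glm-5.1".to_string() },
        SseEvent::ContentBlockDelta,
        SseEvent::MessageDelta { output_tokens: 226 },
    ]
}

// A stream cut by the idle timeout before message_delta arrived.
fn truncated_stream() -> Vec<SseEvent> {
    vec![
        SseEvent::MessageStart { model: "glm-5.1".to_string() },
        SseEvent::ContentBlockDelta,
    ]
}
```

With these stand-ins, `current_extractor(&truncated_stream(), "claude-opus-4-8")` yields the original request model even though the truncated stream still contains a `message_start` event naming `glm-5.1`.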
Database evidence
    -- String literals use single quotes; double quotes denote identifiers in SQL.
    SELECT datetime(created_at, 'unixepoch', 'localtime') AS time,
           request_model, model, input_tokens, output_tokens, total_cost_usd, latency_ms, status_code
    FROM proxy_request_logs
    WHERE model = request_model AND input_tokens = 0 AND output_tokens = 0
    ORDER BY created_at DESC LIMIT 5;
| time | request_model | model | in | out | cost | latency | status |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 20:34:37 | claude-opus-4-8 | claude-opus-4-8 | 0 | 0 | 0 | 120769ms | 200 |
| 20:31:42 | claude-opus-4-8 | claude-opus-4-8 | 0 | 0 | 0 | 126104ms | 200 |
| 20:28:51 | claude-opus-4-8 | claude-opus-4-8 | 0 | 0 | 0 | 129145ms | 200 |
| 20:26:06 | claude-opus-4-8 | claude-opus-4-8 | 0 | 0 | 0 | 128027ms | 200 |
| 20:23:29 | claude-opus-4-8 | claude-opus-4-8 | 0 | 0 | 0 | 39081ms | 200 |
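All listed records share model == request_model, tokens = 0, and status = 200. Four of the five have latency just over 120s, consistent with the idle timeout; the 20:23:29 record (39081ms) is the exception.
Corresponding log entries (the message translates to "streaming response idle timeout (120 seconds)"):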
[2026-06-07][20:28:51][ERROR][cc_switch_lib::proxy::response_processor] [Claude] 流式响应静默期超时 (120秒)
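The WHERE clause of the query above can also be expressed as a predicate, for example to flag affected rows in application code. This is a rough sketch only: `LogRow` and `is_suspect` are hypothetical names, and the struct carries just the columns the filter uses, not the real table schema.

```rust
// Hypothetical subset of a proxy_request_logs row; only the columns
// used by the filter are modelled here.
#[derive(Debug)]
struct LogRow {
    request_model: String,
    model: String,
    input_tokens: u64,
    output_tokens: u64,
}

// Same condition as the SQL filter:
// model = request_model AND input_tokens = 0 AND output_tokens = 0.
fn is_suspect(row: &LogRow) -> bool {
    row.model == row.request_model && row.input_tokens == 0 && row.output_tokens == 0
}

// Convenience constructor for building rows in examples.
fn row(request_model: &str, model: &str, input_tokens: u64, output_tokens: u64) -> LogRow {
    LogRow {
        request_model: request_model.to_string(),
        model: model.to_string(),
        input_tokens,
        output_tokens,
    }
}
```

The condition is a heuristic: a request that has no model mapping configured also has model equal to request_model, so a match is a hint that the record may be affected, not proof.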
Suggested Fix
Option A: Add intermediate fallback in model_extractor (recommended, minimal change)
Before falling back to request_model, try to extract the model from the message_start event:
    fn claude_model_extractor(events: &[Value], request_model: &str) -> String {
        if let Some(usage) = TokenUsage::from_claude_stream_events(events) {
            if let Some(model) = usage.model {
                return model;
            }
        }
        // NEW: on timeout, try to extract actual model from message_start event
        events
            .iter()
            .find_map(|e| {
                if e.get("type")?.as_str()? == "message_start" {
                    e.get("message")?.get("model")?.as_str()
                } else {
                    None
                }
            })
            .unwrap_or(request_model)
            .to_string()
    }
Option B: Store mapped model in RequestContext
Add a mapped_model field to RequestContext. The forwarder writes back the mapped model name after remapping, so the timeout handler can use it instead of the original.
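One possible shape for Option B is sketched below. It is not the project's actual RequestContext: the struct is reduced to two fields, and `set_mapped_model` and `logged_model` are hypothetical helper names chosen for illustration. The precedence assumed here is: model extracted from the stream, then the mapped model, then the original request model.

```rust
// Hypothetical, reduced RequestContext; the real struct has more fields.
#[derive(Debug, Default)]
struct RequestContext {
    // Model name as sent by the client, e.g. "claude-opus-4-8".
    request_model: String,
    // Model name after mapping, written back by the forwarder (assumed).
    mapped_model: Option<String>,
}

impl RequestContext {
    // Called by the forwarder once the mapping has been applied.
    fn set_mapped_model(&mut self, mapped: &str) {
        self.mapped_model = Some(mapped.to_string());
    }

    // Picks the model to log: extracted model first, then the mapped
    // model, and only as a last resort the original request model.
    fn logged_model(&self, extracted: Option<&str>) -> String {
        extracted
            .or(self.mapped_model.as_deref())
            .unwrap_or(self.request_model.as_str())
            .to_string()
    }
}
```

Unlike Option A, this does not depend on any SSE event having arrived, so it would also cover a timeout before the first event. Like Option A, it only corrects the model name; token counts for a truncated stream would still be zero.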
Environment
- cc-switch version: latest (as of 2026-06-07)
- OS: Linux
- Providers: Zhipu GLM (glm-5.1) / Xiaomi MiMo (mimo-v2.5-pro)
- App type: claude (Claude Code)