Bug: Streaming timeout causes wrong model name and zero tokens in usage logs #3846

@ChaoXu1997

Summary

When a streaming response hits the 120-second idle timeout, the proxy_request_logs record stores the original request model (e.g. claude-opus-4-8) instead of the mapped model (e.g. glm-5.1). Token counts and cost are also zeroed out, making the UI show incorrect data.

What I see (broken)

Time Provider Model In Out Cache Cost Latency Status Source
06/07 20:28 Zhipu GLM claude-opus-4-8 0 0 $0.0000 129.1s 200 proxy

The UI shows only claude-opus-4-8 with no mapping arrow, zero tokens and zero cost.

What I expect (normal)

Time Provider Model In Out Cache Cost Latency Status Source
06/07 20:28 Zhipu GLM claude-opus-4-8 → glm-5.1 720 R140,736 226 $0.0386 8.9s 200 proxy

Steps to Reproduce

  1. Configure Claude Code to proxy through cc-switch with model mapping claude-opus-4-8 → glm-5.1 (Zhipu GLM)
  2. Start a long conversation request
  3. If the upstream provider goes silent for 120+ seconds after the first byte, the streaming idle timeout fires
  4. The proxy_request_logs record now has model = claude-opus-4-8 (same as request_model) and all tokens = 0

Root Cause Analysis

Call chain

  1. Claude Code sends request with model: claude-opus-4-8
  2. RequestContext stores request_model = "claude-opus-4-8" (original)
  3. Forwarder correctly maps to glm-5.1 and sends to Zhipu API ✅; the log shows [Claude] >>> 请求 URL: ... (model=glm-5.1), where 请求 URL means "request URL"
  4. Zhipu API starts streaming, some SSE events are received (first_token_ms is set)
  5. 120-second idle timeout fires: [Claude] 流式响应静默期超时 (120秒), i.e. "streaming response idle-period timeout (120 seconds)"
  6. SseUsageCollector::finish() is called, but collected SSE events are incomplete (missing message_delta which carries usage)
  7. claude_model_extractor cannot extract the model name from incomplete events, falls back to request_model
  8. Database record: model = "claude-opus-4-8", tokens = 0, cost = $0.0000

Problematic code

In src-tauri/src/proxy/handler_config.rs, claude_model_extractor:

fn claude_model_extractor(events: &[Value], request_model: &str) -> String {
    if let Some(usage) = TokenUsage::from_claude_stream_events(events) {
        if let Some(model) = usage.model {
            return model;
        }
    }
    request_model.to_string()  // ← Bug: falls back to original model on timeout
}

When the stream is cut by timeout:

  • TokenUsage::from_claude_stream_events(events) returns None (no message_delta with usage)
  • Falls back to request_model = "claude-opus-4-8" instead of the actual mapped model glm-5.1

The same pattern exists in openai_model_extractor, codex_auto_model_extractor, and gemini_model_extractor.
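
The fallback can be reproduced in isolation. The sketch below is not cc-switch code: StreamEvent, usage_model, and extract_model are hypothetical stand-ins for the JSON events, TokenUsage::from_claude_stream_events, and claude_model_extractor. It assumes the usage parser yields nothing until a message_delta has arrived, which is what the analysis above suggests.

```rust
// Hypothetical stand-in for parsed SSE events (the real code works on JSON values).
#[allow(dead_code)]
#[derive(Debug, Clone)]
enum StreamEvent {
    // First event of a stream; carries the model the upstream actually used.
    MessageStart { model: String },
    // Incremental content; irrelevant for model extraction.
    ContentBlockDelta,
    // Late event; in the real stream this is the one that carries usage.
    MessageDelta,
}

// Assumed behavior of the usage parser: no usage, and therefore no model,
// unless a message_delta was received.
fn usage_model(events: &[StreamEvent]) -> Option<String> {
    let complete = events
        .iter()
        .any(|e| matches!(e, StreamEvent::MessageDelta));
    if !complete {
        return None;
    }
    events.iter().find_map(|e| match e {
        StreamEvent::MessageStart { model } => Some(model.clone()),
        _ => None,
    })
}

// Same shape as the current extractor: usage-derived model, else request model.
fn extract_model(events: &[StreamEvent], request_model: &str) -> String {
    usage_model(events).unwrap_or_else(|| request_model.to_string())
}
```

With a stream that stops after message_start and a few content deltas, extract_model returns the request model even though the mapped model was already present in the first event.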

Database evidence

SELECT datetime(created_at, 'unixepoch', 'localtime') as time,
       request_model, model, input_tokens, output_tokens, total_cost_usd, latency_ms, status_code
FROM proxy_request_logs
WHERE model = request_model AND input_tokens = 0 AND output_tokens = 0
ORDER BY created_at DESC LIMIT 5;

time      request_model    model            in  out  cost  latency   status
20:34:37  claude-opus-4-8  claude-opus-4-8  0   0    0     120769ms  200
20:31:42  claude-opus-4-8  claude-opus-4-8  0   0    0     126104ms  200
20:28:51  claude-opus-4-8  claude-opus-4-8  0   0    0     129145ms  200
20:26:06  claude-opus-4-8  claude-opus-4-8  0   0    0     128027ms  200
20:23:29  claude-opus-4-8  claude-opus-4-8  0   0    0     39081ms   200

All broken records share: model == request_model, tokens = 0, status = 200. Four of the five also have latency > 120s; the 20:23:29 record (39081ms) does not.

Corresponding log entries:

[2026-06-07][20:28:51][ERROR][cc_switch_lib::proxy::response_processor] [Claude] 流式响应静默期超时 (120秒)

(The message translates to "streaming response idle-period timeout (120 seconds)".)

Suggested Fix

Option A: Add intermediate fallback in model_extractor (recommended, minimal change)

Before falling back to request_model, try to extract the model from the message_start event:

fn claude_model_extractor(events: &[Value], request_model: &str) -> String {
    if let Some(usage) = TokenUsage::from_claude_stream_events(events) {
        if let Some(model) = usage.model {
            return model;
        }
    }
    // NEW: on timeout, try to extract actual model from message_start event
    events
        .iter()
        .find_map(|e| {
            if e.get("type")?.as_str()? == "message_start" {
                e.get("message")?.get("model")?.as_str()
            } else {
                None
            }
        })
        .unwrap_or(request_model)
        .to_string()
}

Option B: Store mapped model in RequestContext

Add a mapped_model field to RequestContext. The forwarder writes back the mapped model name after remapping, so the timeout handler can use it instead of the original.
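
A minimal sketch of Option B, under stated assumptions: the RequestContext below is a simplified stand-in that keeps only the two model fields, and record_mapping and model_for_log are hypothetical names, not existing cc-switch functions. It shows one possible fallback order: the model reported by the stream, then the mapped model, then the original request model.

```rust
// Hypothetical, simplified RequestContext; the real struct has more fields.
#[derive(Debug)]
struct RequestContext {
    // Model name as sent by the client (e.g. "claude-opus-4-8").
    request_model: String,
    // Assumed new field: filled in by the forwarder once the mapping is applied.
    mapped_model: Option<String>,
}

impl RequestContext {
    fn new(request_model: &str) -> Self {
        Self {
            request_model: request_model.to_string(),
            mapped_model: None,
        }
    }

    // Hypothetical hook the forwarder would call after remapping.
    fn record_mapping(&mut self, mapped: &str) {
        self.mapped_model = Some(mapped.to_string());
    }

    // Model to write to the log: stream-derived if available,
    // else the mapped model, else the original request model.
    fn model_for_log(&self, from_stream: Option<&str>) -> String {
        from_stream
            .or(self.mapped_model.as_deref())
            .unwrap_or(self.request_model.as_str())
            .to_string()
    }
}
```

Because the mapped name is recorded before any bytes arrive from upstream, this path does not depend on how much of the stream was received before the timeout.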

Environment

  • cc-switch version: latest (as of 2026-06-07)
  • OS: Linux
  • Providers: Zhipu GLM (glm-5.1) / Xiaomi MiMo (mimo-v2.5-pro)
  • App type: claude (Claude Code)
