Qwen 3.6 context window is 32k, OpenRouter reports 262k for the same model #41

@Evrard-Nil

Source: @PierreLeGuen in #private-inference, 2026-05-20 18:09 CEST.

Problem

Our deployed Qwen3.6-35B-A3B-FP8 advertises a 32k context window. OpenRouter lists the same upstream model at 262k: https://openrouter.ai/qwen/qwen3.6-35b-a3b

Users on our endpoint therefore get roughly one-eighth of the usable context that competing providers offer for the same model.

Investigation

The current Qwen 3.6 config (cvm-compose-files, deployed v0.0.163 with sglang) is set to 32k somewhere. The likely causes are an explicit context-limit argument in the compose file (sglang uses --context-length; --max-model-len is the vllm equivalent) or a kv-cache/VRAM constraint forcing it down.
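A quick way to check the first hypothesis is to scan the compose file for either flag. The following is a minimal sketch, not the actual deployment tooling: the compose snippet, the service name, and the model path in `EXAMPLE` are hypothetical, and `find_context_limits` is a helper name introduced here for illustration.

```python
import re

# Flags that cap the context window:
#   --context-length  (SGLang)
#   --max-model-len   (vLLM)
# The separator class tolerates "--flag 32768", "--flag=32768", and a YAML
# list where the flag and its value sit on consecutive quoted items.
CONTEXT_FLAGS = re.compile(r"--(context-length|max-model-len)[=\s\"',\-]+(\d+)")


def find_context_limits(compose_text: str) -> list[tuple[int, str, int]]:
    """Return (line_number, flag_name, value) for every context-limit flag found."""
    hits = []
    for match in CONTEXT_FLAGS.finditer(compose_text):
        # 1-based line number of the flag itself
        line_no = compose_text.count("\n", 0, match.start()) + 1
        hits.append((line_no, match.group(1), int(match.group(2))))
    return hits


# Hypothetical compose fragment; the real file in cvm-compose-files may differ.
EXAMPLE = """\
services:
  qwen:
    command: >
      python -m sglang.launch_server
      --model-path Qwen/Qwen3.6-35B-A3B-FP8
      --context-length 32768
"""

if __name__ == "__main__":
    for line_no, flag, value in find_context_limits(EXAMPLE):
        print(f"line {line_no}: --{flag} = {value}")
```

If the scan finds nothing, the 32k value is probably not set by an explicit flag, which points towards the model's own config or a memory constraint.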

Tasks

  • Confirm where the 32k limit is set (compose file, sglang flag, KV-cache budget)
  • Decide target context window — 262k would need significantly more KV-cache GPU memory; intermediate values (128k, 64k) may be a reasonable trade-off
  • Bump and redeploy
  • Update /v1/models advertised context_length
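To support the target-window decision, the KV-cache cost of each candidate can be estimated with the standard per-token formula: 2 (key and value) × layers holding a KV cache × KV heads × head dimension × bytes per element. The sketch below is a rough estimate under stated assumptions. The shape values in `PLACEHOLDER_SHAPE` are illustrative placeholders, not the real Qwen3.6-35B-A3B configuration, and should be replaced with values from the model's config. The token counts assume that "32k" and "262k" mean 32768 and 262144.

```python
def kv_cache_bytes(
    context_tokens: int,
    num_kv_layers: int,
    num_kv_heads: int,
    head_dim: int,
    bytes_per_element: int = 2,  # 2 for FP16/BF16 cache, 1 for an FP8 cache
    concurrent_sequences: int = 1,
) -> int:
    """Estimate KV-cache size in bytes for full-length sequences.

    Only count layers that actually keep a KV cache; layers without one
    (for example linear-attention layers in a hybrid model) are excluded.
    """
    # Factor 2: one key tensor and one value tensor per layer.
    per_token = 2 * num_kv_layers * num_kv_heads * head_dim * bytes_per_element
    return per_token * context_tokens * concurrent_sequences


# PLACEHOLDER values for illustration only, NOT the real model config.
PLACEHOLDER_SHAPE = {"num_kv_layers": 10, "num_kv_heads": 2, "head_dim": 256}

# Candidate windows from the task list, assuming power-of-two token counts.
CANDIDATES = {"32k": 32 * 1024, "64k": 64 * 1024, "128k": 128 * 1024, "262k": 256 * 1024}


def estimate_gib(concurrent_sequences: int = 1) -> dict[str, float]:
    """Map each candidate window to its estimated KV-cache size in GiB."""
    return {
        label: kv_cache_bytes(
            tokens, concurrent_sequences=concurrent_sequences, **PLACEHOLDER_SHAPE
        ) / 2**30
        for label, tokens in CANDIDATES.items()
    }


if __name__ == "__main__":
    for label, gib in estimate_gib().items():
        print(f"{label}: {gib:.3f} GiB per full-length sequence")
```

The estimate scales linearly with both context length and the number of concurrent full-length sequences, so moving from 32k to 262k multiplies the per-sequence KV budget by eight regardless of the model shape. The absolute numbers depend entirely on the real config values.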

cc @PierreLeGuen — flagged this in #private-inference; want to confirm whether it's already on your plate or if this issue is the right home for it.
