Source: @PierreLeGuen in #private-inference, 2026-05-20 18:09 CEST.
Problem
Our deployed Qwen3.6-35B-A3B-FP8 advertises a 32k context window. OpenRouter lists the same upstream model at 262k: https://openrouter.ai/qwen/qwen3.6-35b-a3b
Users on our endpoint get a much smaller usable context (roughly one-eighth) than competing providers offer for the same model.
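To confirm what the endpoint actually advertises, a minimal sketch along these lines could read the value from the /v1/models response. It assumes each model entry carries a context_length field; the model id, the 32768 value standing in for "32k", and the base URL passed to fetch_models are illustrative placeholders, not values taken from our deployment.

```python
import json
import urllib.request
from typing import Optional


def advertised_context(payload: dict, model_id: str) -> Optional[int]:
    """Return the context_length advertised for model_id, or None if absent."""
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            value = entry.get("context_length")
            return int(value) if value is not None else None
    return None


def fetch_models(base_url: str, api_key: Optional[str] = None) -> dict:
    """GET {base_url}/v1/models and decode the JSON body."""
    request = urllib.request.Request(f"{base_url.rstrip('/')}/v1/models")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


# Hypothetical payload shaped like an OpenAI-compatible /v1/models response.
sample = {"data": [{"id": "Qwen/Qwen3.6-35B-A3B-FP8", "context_length": 32768}]}
print(advertised_context(sample, "Qwen/Qwen3.6-35B-A3B-FP8"))
```

Against a live endpoint, the same check would be advertised_context(fetch_models(base_url, api_key), model_id), with base_url, api_key and model_id supplied by whoever runs it.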
Investigation
The current Qwen3.6 config (cvm-compose-files, deployed v0.0.163 with sglang) is capped at 32k somewhere. The likely cause is a context-length launch argument in the compose file (--max-model-len in vLLM, --context-length in SGLang), or a KV-cache/VRAM constraint that forces the limit down.
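One way to narrow this down is to scan the compose file for an explicit context-length flag. The sketch below looks for both spellings, since vLLM uses --max-model-len and SGLang uses --context-length. The compose snippet is a made-up example, not the real cvm-compose-files content, and the regex only recognises plain integer values.

```python
import re

# vLLM spells the limit --max-model-len; SGLang spells it --context-length.
# The separator class covers "--flag 123", "--flag=123" and YAML list form.
FLAG_PATTERN = re.compile(
    r"""--(max-model-len|context-length)["',=\s]+(\d+)\b"""
)


def find_context_flags(compose_text: str) -> list[tuple[int, str, int]]:
    """Return (line number, flag name, value) for each context-length flag."""
    hits = []
    for line_no, line in enumerate(compose_text.splitlines(), start=1):
        for match in FLAG_PATTERN.finditer(line):
            hits.append((line_no, match.group(1), int(match.group(2))))
    return hits


# Hypothetical compose fragment, for illustration only.
sample_compose = """\
services:
  qwen:
    command: >
      python -m sglang.launch_server
      --model-path Qwen/Qwen3.6-35B-A3B-FP8
      --context-length 32768
"""

for line_no, flag, value in find_context_flags(sample_compose):
    print(f"line {line_no}: --{flag} = {value}")
```

If the scan finds nothing, the cap is more likely coming from a default or from the KV-cache/VRAM budget than from an explicit argument.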
Tasks
/v1/models advertised context_length
cc @PierreLeGuen: flagged this in #private-inference; want to confirm whether it's already on your plate or if this issue is the right home for it.