Qwen 3.6 context window is 32k, OpenRouter reports 262k for the same model #41

Source: @PierreLeGuen in #private-inference, 2026-05-20 18:09 CEST.

## Problem

Our deployed Qwen3.6-35B-A3B-FP8 advertises a 32k context window. OpenRouter lists the same upstream model at 262k: <https://openrouter.ai/qwen/qwen3.6-35b-a3b>

Users on our endpoint get a much smaller usable context than competing providers for the same model.
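To reproduce the advertised value, something like the sketch below could be run against the body returned by `/v1/models`. It assumes the usual OpenAI-style list shape (`{"data": [...]}`) with a `context_length` field on each entry. The model id `Qwen/Qwen3.6-35B-A3B-FP8` and the exact value 32768 are placeholders for illustration, not values confirmed from our endpoint.

```python
import json
from typing import Optional


def advertised_context_length(models_payload: dict, model_id: str) -> Optional[int]:
    """Return the `context_length` advertised for `model_id`, or None if absent."""
    for entry in models_payload.get("data", []):
        if entry.get("id") == model_id:
            return entry.get("context_length")
    return None


# Hypothetical response body; a real one would come from GET <base-url>/v1/models.
sample = json.loads("""
{"object": "list",
 "data": [{"id": "Qwen/Qwen3.6-35B-A3B-FP8",
           "object": "model",
           "context_length": 32768}]}
""")

print(advertised_context_length(sample, "Qwen/Qwen3.6-35B-A3B-FP8"))
```

The same helper can be re-run after a redeploy to check that the advertised value actually changed.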

## Investigation

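The current Qwen 3.6 config (cvm-compose-files, deployed v0.0.163 with sglang) is set to 32k somewhere. The most likely cause is an explicit context-length flag in the compose file: `--context-length` for sglang (`--max-model-len` is the vllm equivalent, so it would only apply if the service were actually launched with vllm). The other possibility is a KV-cache/VRAM constraint forcing it down.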
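A quick way to check the first hypothesis is to scan the compose file for either flag. The sketch below is a minimal version of that. The compose snippet is invented for illustration (the real file in cvm-compose-files may look quite different), and it only catches a flag and its value written on the same line, so a YAML list with the value on its own line would be missed.

```python
import re

# Flags that cap context length: `--context-length` (SGLang), `--max-model-len` (vLLM).
CONTEXT_FLAGS = ("--context-length", "--max-model-len")


def find_context_flags(compose_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, flag, value) for each context-limiting flag found."""
    pattern = re.compile(
        r"(" + "|".join(re.escape(f) for f in CONTEXT_FLAGS) + r")[=\s]+[\"']?(\d+)"
    )
    hits = []
    for lineno, line in enumerate(compose_text.splitlines(), start=1):
        for match in pattern.finditer(line):
            hits.append((lineno, match.group(1), match.group(2)))
    return hits


# Hypothetical compose fragment, NOT copied from cvm-compose-files.
sample_compose = """\
services:
  qwen:
    command: >
      python -m sglang.launch_server
      --model-path Qwen/Qwen3.6-35B-A3B-FP8
      --context-length 32768
"""

for lineno, flag, value in find_context_flags(sample_compose):
    print(f"line {lineno}: {flag} = {value}")
```

If this finds nothing in the real compose file, the limit is more likely coming from the model's own config or from the KV-cache budget.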

## Tasks

- [ ] Confirm where the 32k limit is set (compose file, sglang flag, KV-cache budget)
- [ ] Decide target context window: per-sequence KV-cache memory grows roughly linearly with context length, so 262k would need about 8x the KV-cache GPU memory of 32k; intermediate values (128k, 64k) may be a reasonable trade-off
- [ ] Bump and redeploy
- [ ] Update `/v1/models` advertised `context_length`
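
For the target-window decision, a rough per-sequence KV-cache estimate may help. The sketch below uses the standard formula for full-attention layers: 2 (K and V) x layers x KV heads x head dim x bytes per element x tokens. Every architecture number in it is a placeholder, not the real Qwen3.6-35B-A3B config. The actual values should be read from the model's `config.json`, and if some layers do not keep a per-token KV cache, only the layers that do should be counted.

```python
def kv_cache_bytes(tokens: int, num_layers: int, num_kv_heads: int,
                   head_dim: int, bytes_per_elem: int = 2) -> int:
    """Per-sequence KV-cache size in bytes for standard (full) attention layers."""
    # Factor 2 accounts for storing both keys and values.
    return 2 * num_layers * num_kv_heads * head_dim * bytes_per_elem * tokens


# PLACEHOLDER architecture values for illustration only (not the real model config).
ARCH = dict(num_layers=40, num_kv_heads=4, head_dim=128, bytes_per_elem=2)

# Candidate windows from the task list, taking "32k" as 32768 and "262k" as 262144.
CANDIDATES = {"32k": 32_768, "64k": 65_536, "128k": 131_072, "262k": 262_144}

for label, tokens in CANDIDATES.items():
    gib = kv_cache_bytes(tokens, **ARCH) / 2**30
    print(f"{label:>5}: {gib:.1f} GiB per full-length sequence")
```

This is a per-sequence figure. The total KV pool the server needs also depends on how many long sequences should fit concurrently, which is a separate decision from the advertised maximum.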

cc @PierreLeGuen — flagged this in #private-inference; want to confirm whether it's already on your plate or if this issue is the right home for it.
