qw使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口，返回体只有thinking

### Description

使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口，返回体只有thinking

### Reproduction

export CUDA_VISIBLE_DEVICES=4,5
python -m vllm.entrypoints.openai.api_server --host 127.0.0.1--port 8042 --model /data1/Qwen3.6-27B --served-model-name GTSLLM-Standard --data-parallel-size 1 --tensor-parallel-size 2 --max-model-len 162144 --max-num-seqs 8 --gpu-memory-utilization 0.90 --trust-remote-code --compilation_config '{"cudagraph_mode":"FULL_DECODE_ONLY","cudagraph_capture_sizes":[1,2,4,8,16,32]}' --additional-config '{"enable_cpu_binding":true}' --async-scheduling --chat_template /data1/Qwen3.6-27B/chat_template.jinja --enable-auto-tool-choice --tool-call-parser "qwen3_coder" --reasoning-parser "qwen3" --no-enable-prefix-caching --no-enable-chunked-prefill --mm-processor-cache-gb 0 --mamba-cache-mode align



### Logs

```shell

```

### Environment Information

vllm 0.17.0 NVIDIA  A100 

### Known Issue

- [x] The issue hasn't been already addressed in Documentation, Issues, and Discussions.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qw使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口，返回体只有thinking #162

Description

Reproduction

Logs

Environment Information

Known Issue

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

qw使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口，返回体只有thinking #162

Description

Description

Reproduction

Logs

Environment Information

Known Issue

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions