Skip to content

qw使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口,返回体只有thinking #162

@huanghuan3

Description

@huanghuan3

Description

使用vllm 0.17.0 A100部署qwen3.6 27B调用v1/messages接口,返回体只有thinking

Reproduction

export CUDA_VISIBLE_DEVICES=4,5
python -m vllm.entrypoints.openai.api_server --host 127.0.0.1--port 8042 --model /data1/Qwen3.6-27B --served-model-name GTSLLM-Standard --data-parallel-size 1 --tensor-parallel-size 2 --max-model-len 162144 --max-num-seqs 8 --gpu-memory-utilization 0.90 --trust-remote-code --compilation_config '{"cudagraph_mode":"FULL_DECODE_ONLY","cudagraph_capture_sizes":[1,2,4,8,16,32]}' --additional-config '{"enable_cpu_binding":true}' --async-scheduling --chat_template /data1/Qwen3.6-27B/chat_template.jinja --enable-auto-tool-choice --tool-call-parser "qwen3_coder" --reasoning-parser "qwen3" --no-enable-prefix-caching --no-enable-chunked-prefill --mm-processor-cache-gb 0 --mamba-cache-mode align

Logs

Environment Information

vllm 0.17.0 NVIDIA A100

Known Issue

  • The issue hasn't been already addressed in Documentation, Issues, and Discussions.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions