[Bugfix] Fix missing tie_word_embeddings on Qwen3-VL text_config#330
Lidang-Jiang wants to merge 2 commits
Signed-off-by: Lidang Jiang <lidangjiang@gmail.com>
Pull request overview
This PR fixes a Qwen3-VL startup crash on Kunlun caused by hf_config.text_config missing tie_word_embeddings, by patching the config during KunlunPlatform.check_and_update_config() and adding regression tests.
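The failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual vLLM or Kunlun code path: `read_tie_flag` is a hypothetical consumer, and `SimpleNamespace` objects stand in for the HuggingFace config classes. It assumes the crash comes from code reading the flag directly from `text_config`.

```python
from types import SimpleNamespace

# Stand-in for a Qwen3-VL HF config where the flag exists only at the top level.
hf_config = SimpleNamespace(
    tie_word_embeddings=False,
    text_config=SimpleNamespace(hidden_size=4096),  # no tie_word_embeddings here
)


def read_tie_flag(config):
    # Hypothetical consumer: reads the flag from the text sub-config.
    return config.text_config.tie_word_embeddings


try:
    read_tie_flag(hf_config)
    crashed = False
except AttributeError:
    # The attribute is missing on text_config, so the lookup fails.
    crashed = True
```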
Changes:
- Add Qwen3-VL config detection + patching logic to populate text_config.tie_word_embeddings.
- Invoke the patch from KunlunPlatform.check_and_update_config().
- Add unit tests for inheritance/preservation/no-op behavior.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| vllm_kunlun/platforms/kunlun.py | Adds helpers to detect Qwen3-VL configs and patch text_config.tie_word_embeddings during config normalization. |
| tests/ut/test.py | Adds regression tests ensuring the patch is applied only for Qwen3-VL and does not overwrite existing values. |
    text_config = getattr(hf_config, "text_config", None)
    if text_config is None or hasattr(text_config, "tie_word_embeddings"):
        return

    text_config.tie_word_embeddings = getattr(hf_config, "tie_word_embeddings", False)
Good catch, fixed in 8a46ecb. The patch now only copies tie_word_embeddings when the top-level config explicitly defines it.
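The revised behavior described in this reply can be sketched as follows. This is an illustration only, not the code from 8a46ecb: the function name `patch_text_config_tie_word_embeddings` is hypothetical, "explicitly defines" is assumed to mean `hasattr` on the top-level config, and `SimpleNamespace` stands in for the HuggingFace config objects.

```python
from types import SimpleNamespace


def patch_text_config_tie_word_embeddings(hf_config) -> None:
    """Copy tie_word_embeddings onto text_config only when it is safe to do so."""
    text_config = getattr(hf_config, "text_config", None)
    # Nothing to patch, or the sub-config already carries its own value.
    if text_config is None or hasattr(text_config, "tie_word_embeddings"):
        return
    # Assumption: no top-level field means there is nothing to inherit,
    # so no default is invented for text_config.
    if not hasattr(hf_config, "tie_word_embeddings"):
        return
    text_config.tie_word_embeddings = hf_config.tie_word_embeddings


# Inherit: the field exists only at the top level.
inherit = SimpleNamespace(tie_word_embeddings=True, text_config=SimpleNamespace())
patch_text_config_tie_word_embeddings(inherit)

# Preserve: text_config already has its own value.
preserve = SimpleNamespace(
    tie_word_embeddings=True,
    text_config=SimpleNamespace(tie_word_embeddings=False),
)
patch_text_config_tie_word_embeddings(preserve)

# Missing top-level field: text_config is left untouched.
missing = SimpleNamespace(text_config=SimpleNamespace())
patch_text_config_tie_word_embeddings(missing)
```

Compared with the earlier `getattr(..., False)` version, this variant avoids writing a guessed `False` onto `text_config` when the top-level config gives no information.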
    def _is_qwen3_vl_config(hf_config) -> bool:
        config_type = type(hf_config).__name__
        architectures = getattr(hf_config, "architectures", None) or ()
        if isinstance(architectures, str):
            architectures = (architectures,)

        return config_type == "Qwen3VLConfig" or any(
            architecture in _QWEN3_VL_ARCHITECTURES for architecture in architectures
        )
Fixed in 8a46ecb. I added targeted regression coverage for the string architectures path, the Qwen3VLConfig type-name path, and the missing top-level field case.
PR Description
FIX #306
Summary
Patch KunlunPlatform.check_and_update_config() to populate tie_word_embeddings on Qwen3-VL text_config when the field only exists on the top-level HuggingFace config.
Add regression tests covering:
- Inheriting tie_word_embeddings from the top-level config for Qwen3-VL
- Preserving an existing text_config.tie_word_embeddings
Checklist (Required)
- Passed pre-commit checks.
- Commits signed off with git commit -s.
Before
After
curl -sS http://127.0.0.1:8573/v1/models
{"object":"list","data":[{"id":"Qwen3-VL-32B-Instruct-INT8-Dynamic","object":"model","created":1776672092,"owned_by":"vllm","root":"/ssd1/models/Qwen3-VL-32B-Instruct-INT8-Dynamic","parent":null,"max_model_len":32768,"permission":[{"id":"modelperm-9a4cc8ca4c9a3311","object":"model_permission","created":1776672092,"allow_create_engine":false,"allow_sampling":true,"allow_logprobs":true,"allow_search_indices":false,"allow_view":true,"allow_fine_tuning":false,"organization":"*","group":null,"is_blocking":false}]}]}
curl -sS -X POST http://127.0.0.1:8573/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"Qwen3-VL-32B-Instruct-INT8-Dynamic","messages":[{"role":"user","content":"请用两句话介绍你自己,并说明你现在可以正常回答问题。"}],"temperature":0,"max_tokens":120}'
{"id":"chatcmpl-94c8c39d7cf04ef9","object":"chat.completion","created":1776672092,"model":"Qwen3-VL-32B-Instruct-INT8-Dynamic","choices":[{"index":0,"message":{"role":"assistant","content":"你好!我是一款超大的预训练的语言生成类智能助手(通称:通识智能助手),擅长理解与生成自然流畅的文本内容,在多个领域提供帮助与支持;我目前可以正常接收并解答各种问题,请尽情提问!","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning":null,"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":22,"total_tokens":76,"completion_tokens":54,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null}
Full Log Files
Full after log files are uploaded here:
https://gist.github.com/Lidang-Jiang/49b07508a5afdb41975a587551267c31
Files:
- pr330_output_qwen3_vl_32b_instruct_int8_dynamic.log
- pr330_models_response.json
- pr330_chat_response.json
Test plan
pre-commit run --files tests/ut/test.py vllm_kunlun/platforms/kunlun.py
source /root/miniconda/etc/profile.d/conda.sh
conda activate /ssd1/jianglidang/workspace/python310_torch25_cuda_main0151
cd /tmp/vllm-kunlun-pr330-mUEsRm
source ./setup_env.sh
export VLLM_USE_V1=1
export USE_ORI_ROPE=1
export XPU_VISIBLE_DEVICES=0
export CUDA_VISIBLE_DEVICES=0
export LD_LIBRARY_PATH="$CONDA_PREFIX/xcudart/lib:${LD_LIBRARY_PATH:-}"
export TORCHDYNAMO_SUPPRESS_ERRORS=1
python setup.py build_ext
python -m pytest tests/ut/test.py -q -k 'qwen3_vl_text_config or non_qwen3_vl'
python -u -m vllm.entrypoints.openai.api_server --host 127.0.0.1 --port 8573 --model /ssd1/models/Qwen3-VL-32B-Instruct-INT8-Dynamic --gpu-memory-utilization 0.9 --trust-remote-code --max-model-len 32768 --tensor-parallel-size 1 --dtype float16 --max_num_seqs 128 --max_num_batched_tokens 32768 --block-size 128 --no-enable-prefix-caching --no-enable-chunked-prefill --distributed-executor-backend mp --served-model-name Qwen3-VL-32B-Instruct-INT8-Dynamic
curl -sS http://127.0.0.1:8573/v1/models
curl -sS -X POST http://127.0.0.1:8573/v1/chat/completions -H "Content-Type: application/json" -d '{"model":"Qwen3-VL-32B-Instruct-INT8-Dynamic","messages":[{"role":"user","content":"请用两句话介绍你自己,并说明你现在可以正常回答问题。"}],"temperature":0,"max_tokens":120}'