Description
I copied the qwen3-32b startup script and tried to start Qwen-Audio-Chat with the following parameters.
`# cat Qwen-Audio-Chat-startup-script.sh
#!/bin/bash
currentTime=$(date "+%Y%m%d-%H%M%S")
model="/llm/models/Qwen-Audio-Chat"
served_model_name="Qwen-Audio-Chat"
export TORCH_LLM_ALLREDUCE=1
export VLLM_USE_V1=1
export CCL_ZE_IPC_EXCHANGE=pidfd
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
#mkdir LOG
#ps -ef|grep vllm |awk '{print $2}'|xargs kill -9
#ps -ef|grep multiprocessing |awk '{print $2}'|xargs kill -9
python3 -m vllm.entrypoints.openai.api_server \
  --model "$model" \
  --served-model-name "$served_model_name" \
  --enforce-eager \
  --port 8003 \
  --host 0.0.0.0 \
  --api-key intel123 \
  --trust-remote-code \
  --disable-sliding-window \
  --gpu-memory-util=0.5 \
  --disable-log-requests \
  -tp=2 `
Got the following exception:
(VllmWorker rank=0 pid=1111) INFO 10-30 15:56:16 [gpu_model_runner.py:1837] Starting to load model /llm/models/Qwen-Audio-Chat...
(VllmWorker rank=1 pid=1112) INFO 10-30 15:56:16 [gpu_model_runner.py:1837] Starting to load model /llm/models/Qwen-Audio-Chat...
(VllmWorker rank=0 pid=1111) INFO 10-30 15:56:17 [gpu_model_runner.py:1869] Loading model from scratch...
(VllmWorker rank=1 pid=1112) INFO 10-30 15:56:17 [gpu_model_runner.py:1869] Loading model from scratch...
(VllmWorker rank=0 pid=1111) INFO 10-30 15:56:17 [xpu.py:63] Using Flash Attention backend on V1 engine.
(VllmWorker rank=1 pid=1112) INFO 10-30 15:56:17 [xpu.py:63] Using Flash Attention backend on V1 engine.
Loading safetensors checkpoint shards: 0% Completed | 0/9 [00:00<?, ?it/s]
Loading safetensors checkpoint shards: 11% Completed | 1/9 [00:00<00:05, 1.37it/s]
Loading safetensors checkpoint shards: 22% Completed | 2/9 [00:01<00:03, 2.13it/s]
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] WorkerProc failed to start.
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] Traceback (most recent call last):
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/executor/multiproc_executor.py", line 485, in worker_main
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] worker = WorkerProc(*args, **kwargs)
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/executor/multiproc_executor.py", line 382, in __init__
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] self.worker.load_model()
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/worker/gpu_worker.py", line 201, in load_model
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] self.model_runner.load_model(eep_scale_up=eep_scale_up)
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/worker/gpu_model_runner.py", line 1870, in load_model
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] self.model = model_loader.load_model(
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] ^^^^^^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/model_executor/model_loader/base_loader.py", line 49, in load_model
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] self.load_weights(model, model_config)
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/model_executor/model_loader/default_loader.py", line 259, in load_weights
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] loaded_weights = model.load_weights(
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] ^^^^^^^^^^^^^^^^^^^
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/model_executor/models/qwen.py", line 322, in load_weights
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] param = params_dict[name]
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] ~~~~~~~~~~~^^^^^^
(VllmWorker rank=1 pid=1112) ERROR 10-30 15:56:21 [multiproc_executor.py:511] KeyError: 'transformer.audio.audio_bos_eos_token.weight'
Loading safetensors checkpoint shards: 22% Completed | 2/9 [00:02<00:07, 1.03s/it]
(VllmWorker rank=0 pid=1111)
ERROR 10-30 15:56:23 [core.py:638] EngineCore failed to start.
ERROR 10-30 15:56:23 [core.py:638] Traceback (most recent call last):
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 629, in run_engine_core
ERROR 10-30 15:56:23 [core.py:638] engine_core = EngineCoreProc(*args, **kwargs)
ERROR 10-30 15:56:23 [core.py:638] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 447, in __init__
ERROR 10-30 15:56:23 [core.py:638] super().__init__(vllm_config, executor_class, log_stats,
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/engine/core.py", line 77, in __init__
ERROR 10-30 15:56:23 [core.py:638] self.model_executor = executor_class(vllm_config)
ERROR 10-30 15:56:23 [core.py:638] ^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/executor/executor_base.py", line 53, in __init__
ERROR 10-30 15:56:23 [core.py:638] self._init_executor()
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/executor/multiproc_executor.py", line 94, in _init_executor
ERROR 10-30 15:56:23 [core.py:638] self.workers = WorkerProc.wait_for_ready(unready_workers)
ERROR 10-30 15:56:23 [core.py:638] ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ERROR 10-30 15:56:23 [core.py:638] File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250922.xpu-py3.12-linux-x86_64.egg/vllm/v1/executor/multiproc_executor.py", line 446, in wait_for_ready
ERROR 10-30 15:56:23 [core.py:638] raise e from None
ERROR 10-30 15:56:23 [core.py:638] Exception: WorkerProc initialization failed due to an exception in a background process. See stack trace for root cause.
Process EngineCore_0:
Traceback (most recent call last):
File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap