Description
System and versions
Linux 25.04
multi-arc-bmg-offline-installer-25.38.4.1
Docker command
sudo docker run -td \
  --privileged \
  --net=host \
  --device=/dev/dri \
  --name=Qwen3VL-2B \
  -v "/root/.cache/modelscope/hub/models/Qwen/":/llm/models/ \
  --shm-size="32g" \
  --entrypoint /bin/bash \
  intel/llm-scaler-vllm:1.1-preview
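
Before launching, a quick sanity check that both GPUs are visible inside the container (a minimal sketch; sycl-ls and a PyTorch XPU build are assumed to be present in this image):

sudo docker exec -it Qwen3VL-2B bash
# inside the container: list SYCL devices
sycl-ls
# or query PyTorch's XPU backend directly
python3 -c "import torch; print(torch.xpu.device_count())"

Both Arc cards should show up here before setting ZE_AFFINITY_MASK=0,1 below.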
Launch command
export ZE_AFFINITY_MASK=0,1
export TORCH_LLM_ALLREDUCE=1
export VLLM_USE_V1=1
export CCL_ZE_IPC_EXCHANGE=pidfd
export VLLM_ALLOW_LONG_MAX_MODEL_LEN=1
export VLLM_WORKER_MULTIPROC_METHOD=spawn
python3 -m vllm.entrypoints.openai.api_server \
  --model /llm/models/Qwen3-VL-2B-Instruct \
  --served-model-name Qwen3-VL-2B-Instruct \
  --dtype=float16 \
  --enforce-eager \
  --port 8000 \
  --host 0.0.0.0 \
  --trust-remote-code \
  --gpu-memory-util=0.9 \
  --no-enable-prefix-caching \
  --max-num-batched-tokens=8192 \
  --disable-log-requests \
  --max-model-len=8192 \
  --block-size 64 \
  -tp=2
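
For reference, once the server is up, a minimal smoke test of the OpenAI-compatible endpoint would look like this (the model name must match --served-model-name above):

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "Qwen3-VL-2B-Instruct", "messages": [{"role": "user", "content": "Hello"}]}'

In this case the server never reaches that point; startup fails during model config validation: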
Error output
[W1024 03:43:06.029903221 OperatorEntry.cpp:218] Warning: Warning only once for all operators, other operators may also be overridden.
Overriding a previously registered kernel for the same operator and the same dispatch key
operator: aten::geometric_(Tensor(a!) self, float p, *, Generator? generator=None) -> Tensor(a!)
registered at /pytorch/build/aten/src/ATen/RegisterSchema.cpp:6
dispatch key: XPU
previous kernel: registered at /pytorch/aten/src/ATen/VmapModeRegistrations.cpp:37
new kernel: registered at /root/workspace/frameworks.ai.pytorch.ipex-gpu/build/Release/csrc/gpu/csrc/gpu/xpu/ATen/RegisterXPU_0.cpp:172 (function operator())
INFO 10-24 03:43:07 [__init__.py:235] Automatically detected platform xpu.
INFO 10-24 03:43:08 [api_server.py:1755] vLLM API server version 0.10.1.dev0+g6d8d0a24c.d20250902
INFO 10-24 03:43:08 [cli_args.py:261] non-default args: {'host': '0.0.0.0', 'model': '/llm/models/Qwen3-VL-2B-Instruct', 'trust_remote_code': True, 'dtype': 'float16', 'max_model_len': 8192, 'enforce_eager': True, 'served_model_name': ['Qwen3-VL-2B-Instruct'], 'tensor_parallel_size': 2, 'block_size': 64, 'enable_prefix_caching': False, 'max_num_batched_tokens': 8192, 'disable_log_requests': True}
Traceback (most recent call last):
File "", line 198, in _run_module_as_main
File "", line 88, in _run_code
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line
uvloop.run(run_server(args))
File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 109, in run
return __asyncio.run(
^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 194, in run
return runner.run(main)
^^^^^^^^^^^^^^^^
File "/usr/lib/python3.12/asyncio/runners.py", line 118, in run
return self._loop.run_until_complete(task)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
File "/usr/local/lib/python3.12/dist-packages/uvloop/init.py", line 61, in wrapper
return await main
^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line
await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line
async with build_async_engine_client(args, client_config) as engine_client:
File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line _client
async with build_async_engine_client_from_engine_args(
File "/usr/lib/python3.12/contextlib.py", line 210, in aenter
return await anext(self.gen)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/entrypoints/openai/api_server.py", line _client_from_engine_args
vllm_config = engine_args.create_engine_config(usage_context=usage_context)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 1004, in crea
model_config = self.create_model_config()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/vllm-0.10.1.dev0+g6d8d0a24c.d20250902.xpu-py3.12-linux-x86_64.egg/vllm/engine/arg_utils.py", line 872, in creat
return ModelConfig(
^^^^^^^^^^^^
File "/usr/local/lib/python3.12/dist-packages/pydantic/_internal/_dataclasses.py", line 123, in init
s.pydantic_validator.validate_python(ArgsKwargs(args, kwargs), self_instance=s)
pydantic_core._pydantic_core.ValidationError: 1 validation error for ModelConfig
Value error, The checkpoint you are trying to load has model type qwen3_vl but Transformers does not recognize this architecture. This could be because of an issue with the checkpoint, or because your version of Transformers is out of date.
You can update Transformers with the command pip install --upgrade transformers. If this does not work, and the checkpoint is very new, then there may not be a release version that supports this model yet. In this case, you can get the most up-to-date code by installing Transformers from source with the command pip install git+https://github.com/huggingface/transformers.git [type=value_error, input_value=ArgsKwargs((), {'model': ...attention_dtype': None}), input_type=ArgsKwargs]
For further information visit https://errors.pydantic.dev/2.11/v/value_error
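
Per the message above, the root cause is that the Transformers version bundled in the image predates the qwen3_vl architecture. The check, and the workaround the error itself suggests, run inside the container (a sketch only; upgrading Transformers inside a vendored vLLM image may conflict with its pinned dependencies, and the offline installer may have no PyPI access):

# check the bundled version first
python3 -c "import transformers; print(transformers.__version__)"
pip install --upgrade transformers
# if the checkpoint is newer than the latest release, install from source instead:
pip install git+https://github.com/huggingface/transformers.git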