Deploying the ernie-4.5-21B-A3B GGUF model with vLLM fails with the following error:
(APIServer pid=162062) Traceback (most recent call last):
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/bin/vllm", line 8, in <module>
(APIServer pid=162062) sys.exit(main())
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/cli/main.py", line 73, in main
(APIServer pid=162062) args.dispatch_function(args)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/cli/serve.py", line 60, in cmd
(APIServer pid=162062) uvloop.run(run_server(args))
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/uvloop/__init__.py", line 82, in run
(APIServer pid=162062) return loop.run_until_complete(wrapper())
(APIServer pid=162062) File "uvloop/loop.pyx", line 1518, in uvloop.loop.Loop.run_until_complete
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/uvloop/__init__.py", line 61, in wrapper
(APIServer pid=162062) return await main
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 2024, in run_server
(APIServer pid=162062) await run_server_worker(listen_address, sock, args, **uvicorn_kwargs)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 2043, in run_server_worker
(APIServer pid=162062) async with build_async_engine_client(
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=162062) return await anext(self.gen)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 195, in build_async_engine_client
(APIServer pid=162062) async with build_async_engine_client_from_engine_args(
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/contextlib.py", line 199, in __aenter__
(APIServer pid=162062) return await anext(self.gen)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/entrypoints/openai/api_server.py", line 221, in build_async_engine_client_from_engine_args
(APIServer pid=162062) vllm_config = engine_args.create_engine_config(usage_context=usage_context)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/arg_utils.py", line 1351, in create_engine_config
(APIServer pid=162062) maybe_override_with_speculators(
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/vllm/transformers_utils/config.py", line 530, in maybe_override_with_speculators
(APIServer pid=162062) config_dict, _ = PretrainedConfig.get_config_dict(
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 662, in get_config_dict
(APIServer pid=162062) config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/configuration_utils.py", line 753, in _get_config_dict
(APIServer pid=162062) config_dict = load_gguf_checkpoint(resolved_config_file, return_tensors=False)["config"]
(APIServer pid=162062) File "/root/miniconda3/envs/vllm/lib/python3.10/site-packages/transformers/modeling_gguf_pytorch_utils.py", line 431, in load_gguf_checkpoint
(APIServer pid=162062) raise ValueError(f"GGUF model with architecture {architecture} is not supported yet.")
(APIServer pid=162062) ValueError: GGUF model with architecture ernie4_5-moe is not supported yet.
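The failure happens before any weights are loaded: transformers' `load_gguf_checkpoint` reads the architecture string stored in the GGUF file's metadata (the `general.architecture` key) and raises because `ernie4_5-moe` is not in its supported list. As a minimal sketch of where that string comes from, the following hand-rolled parser (not transformers' actual code) builds and reads a GGUF v3 header per the GGUF spec: 4-byte magic, uint32 version, uint64 tensor count, uint64 key/value count, then length-prefixed key/value pairs:

```python
import struct

GGUF_MAGIC = b"GGUF"
GGUF_TYPE_STRING = 8  # metadata value-type id for strings in the GGUF spec


def build_minimal_gguf_header(architecture: str) -> bytes:
    """Build an in-memory GGUF v3 header carrying only general.architecture."""
    key = b"general.architecture"
    val = architecture.encode()
    buf = GGUF_MAGIC
    buf += struct.pack("<I", 3)                 # version 3
    buf += struct.pack("<Q", 0)                 # tensor count
    buf += struct.pack("<Q", 1)                 # metadata key/value count
    buf += struct.pack("<Q", len(key)) + key    # key: uint64 length + bytes
    buf += struct.pack("<I", GGUF_TYPE_STRING)  # value type
    buf += struct.pack("<Q", len(val)) + val    # value: uint64 length + bytes
    return buf


def read_gguf_architecture(data: bytes) -> str:
    """Parse the general.architecture string out of a GGUF header."""
    if data[:4] != GGUF_MAGIC:
        raise ValueError("not a GGUF file")
    off = 4
    _version, = struct.unpack_from("<I", data, off); off += 4
    _n_tensors, = struct.unpack_from("<Q", data, off); off += 8
    n_kv, = struct.unpack_from("<Q", data, off); off += 8
    for _ in range(n_kv):
        klen, = struct.unpack_from("<Q", data, off); off += 8
        key = data[off:off + klen].decode(); off += klen
        vtype, = struct.unpack_from("<I", data, off); off += 4
        if vtype != GGUF_TYPE_STRING:
            raise ValueError("sketch only handles string-typed metadata")
        vlen, = struct.unpack_from("<Q", data, off); off += 8
        val = data[off:off + vlen].decode(); off += vlen
        if key == "general.architecture":
            return val
    raise ValueError("general.architecture not found")


header = build_minimal_gguf_header("ernie4_5-moe")
print(read_gguf_architecture(header))  # -> ernie4_5-moe
```

So the error in the traceback is purely a lookup on this metadata string: until the installed transformers/vLLM versions map `ernie4_5-moe` to a model class, any GGUF file with that architecture tag is rejected at config-loading time.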
With which environment versions can the 4-bit quantized model currently be run locally? I would like to deploy ernie-4.5-21B on a single local GPU with roughly 20 GB, 22 GB, or 24 GB of VRAM.
Inference environment:
transformers==4.57.3
torch==2.9.0
vllm==0.11.2
CUDA environment:
NVIDIA-SMI 570.86.10 Driver Version: 570.86.10 CUDA Version: 12.8