
Deploying the ERNIE-4.5-300B-A47B-FP8-Paddle model on a P800 Kunlunxin 8-card module fails #7484


Description

@b-birdy

Environment: P800 Kunlunxin 8-card module, Paddle 3.3.1, FastDeploy 2.5.0, XPU driver version 5.0.21.21.
Deployment with the official Docker image fails.
Image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.5.0
Launch command:
python -m fastdeploy.entrypoints.openai.api_server \
    --model /Work/Models/ernie/ERNIE-4.5-300B-A47B-FP8-Paddle \
    --port 28110 \
    --metrics-port 28111 \
    --engine-worker-queue-port 28112 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 4 \
    --enable-output-caching \
    --enable-chunked-prefill
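Before launching, it may help to confirm which device the installed Paddle build actually targets. A minimal sanity-check sketch (the only assumption is that `paddle` may or may not be importable in the current environment):

```python
# Pre-flight check: report the default Paddle device, or a clear message
# if Paddle is not installed. On a working Kunlunxin setup this should
# report an XPU device rather than "cpu".
def check_paddle_device():
    try:
        import paddle
    except ImportError:
        return "paddle not installed"
    # get_device() returns the current default device string,
    # e.g. "xpu:0", "gpu:0", or "cpu".
    return paddle.device.get_device()

print(check_paddle_device())
```

If this prints "cpu" inside the container, the Paddle build in use has no XPU support, which would explain device-API failures later in the worker.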

Error output from the engine:
Loading Weights: 0%| | 0/100 [00:05<?, ?it/s]
ERROR 2026-04-19 19:22:38,720 4358 engine.py[line:160] Failed to launch worker processes, check log/workerlog.* for more details.
ERROR 2026-04-19 19:22:46,347 4358 engine.py[line:447] Error extracting sub services: [Errno 3] No such process, Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/engine.py", line 444, in _exit_sub_services
pgid = os.getpgid(self.worker_proc.pid)
ProcessLookupError: [Errno 3] No such process

Worker log (log/workerlog.*) output:
==> log/workerlog.0 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.1 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.2 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.3 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.0 <==
[2026-04-19 19:28:01,678] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,131] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.

==> log/workerlog.1 <==
[2026-04-19 19:28:01,711] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,065] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.
[2026-04-19 19:28:02,235] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,242] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,242] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'

==> log/workerlog.2 <==
[2026-04-19 19:28:01,716] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,175] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.

==> log/workerlog.3 <==
[2026-04-19 19:28:01,661] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,019] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.
[2026-04-19 19:28:02,193] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'

==> log/workerlog.0 <==
[2026-04-19 19:28:02,368] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,374] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,374] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,375] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,375] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,376] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,376] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'
[2026-04-19 19:28:02,938] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='0', default_value='')
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)

==> log/workerlog.1 <==
[2026-04-19 19:28:02,773] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='1', default_value='')
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)

==> log/workerlog.2 <==
[2026-04-19 19:28:02,407] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,413] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,413] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,415] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'
[2026-04-19 19:28:02,977] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='2', default_value='')
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)

[2026-04-19 19:28:02,989] [ INFO] topology.py:526 - Total 4 pipe comm group(s) create successfully!
/usr/local/lib/python3.10/dist-packages/paddle/distributed/communication/group.py:145: UserWarning: Current global rank 2 is not in group _default_pg12
warnings.warn(
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 4 data comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 1 model comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 4 sharding comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:440 - HybridParallelInfo: rank_id: 2, mp_degree: 4, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1, 2, 3], sharding_group: [2], pp_group: [2], dp_group: [2], sep:group: None, check/clip group: [0, 1, 2, 3]
[2026-04-19 19:28:02,992] [ INFO] - Using download source: huggingface
[2026-04-19 19:28:02,992] [ INFO] - Loading configuration file /Work/Models/ernie/ERNIE-4.5-300B-A47B-FP8-Paddle/config.json
[2026-04-19 19:28:02,993] [ WARNING] - You are using a model of type ernie4_5_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2026-04-19 19:28:03,045] [ INFO] - Only support CUDA version flash attention.
/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/graph_optimization/utils.py:21: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml

==> log/workerlog.3 <==
[2026-04-19 19:28:02,733] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='3', default_value='')
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)

==> log/workerlog.2 <==
[2026-04-19 19:28:03,510] [ WARNING] - import noaux_tc Failed!
/usr/local/lib/python3.10/dist-packages/paddle/compat/proxy.py:415: UserWarning: Extending PyTorch compat scope, previous scope: {'triton'}, new scope: {'flashinfer'}.
warnings.warn(msg)
/usr/local/lib/python3.10/dist-packages/paddle/compat/proxy.py:415: UserWarning: Extending PyTorch compat scope, previous scope: {'flashinfer', 'triton'}, new scope: {'flashinfer'}.
warnings.warn(msg)
[2026-04-19 19:28:04,080] [ INFO] - GuidedDecoding max_num_seqs=64 fill_bitmask_parallel_batch_size=4 is_cuda_platform=False max_workers=16.0
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 1296, in <module>
run_worker_proc()
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 406, in _decorate_function
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 1279, in run_worker_proc
worker_proc.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 740, in load_model
self.worker.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/worker/xpu_worker.py", line 153, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/worker/xpu_model_runner.py", line 1201, in load_model
self.model = model_loader.load_model(fd_config=self.fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/model_loader/default_loader_v1.py", line 97, in load_model
model = model_cls(fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 532, in __init__
self.ernie = Ernie4_5_Model(fd_config=fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/graph_optimization/decorator.py", line 55, in __init__
origin_init(self, fd_config=fd_config, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 441, in __init__
[
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 442, in <listcomp>
Ernie4_5_DecoderLayer(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 329, in __init__
self.self_attn = Ernie4_5_Attention(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 278, in __init__
self.qkv_proj = QKVParallelLinear(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 667, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 444, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 175, in __init__
and fd_config.quant_config.get_quant_method(self)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/mix_quant.py", line 129, in get_quant_method
.from_config({"is_quantized": not self.is_checkpoint_bf16})
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/block_wise_fp8.py", line 79, in from_config
return cls(weight_block_size, is_checkpoint_bf16)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/block_wise_fp8.py", line 70, in __init__
self.deepgemm_scale_ue8m0 = True if get_sm_version() >= 100 else False
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/utils.py", line 556, in get_sm_version
prop = paddle.device.cuda.get_device_properties()
File "/usr/local/lib/python3.10/dist-packages/paddle/device/cuda/__init__.py", line 636, in get_device_properties
raise ValueError(
ValueError: The API paddle.device.cuda.get_device_properties is not supported in CPU-only PaddlePaddle. Please reinstall PaddlePaddle with GPU support to call this API.
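The traceback shows that the FP8 quantization path calls `get_sm_version()`, which unconditionally invokes the CUDA-only API `paddle.device.cuda.get_device_properties()`; on an XPU build of Paddle this raises the `ValueError` above and kills the worker. A hypothetical workaround sketch of the kind of guard involved (function name and return convention are my assumptions, not FastDeploy's actual fix):

```python
# Sketch: only query CUDA device properties on a CUDA-enabled Paddle
# build; return None on XPU/CPU builds instead of raising ValueError.
def get_sm_version_safe():
    try:
        import paddle
    except ImportError:
        return None  # paddle not installed in this environment
    if not paddle.device.is_compiled_with_cuda():
        return None  # non-CUDA build (e.g. Kunlunxin XPU): no SM version
    prop = paddle.device.cuda.get_device_properties()
    return prop.major * 10 + prop.minor  # e.g. 90 for SM 9.0

print(get_sm_version_safe())
```

Callers such as the `deepgemm_scale_ue8m0` check would then need to treat `None` as "not a CUDA device" rather than comparing it against 100.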
