
Deploying the ERNIE-4.5-300B-A47B-FP8-Paddle model on a P800 Kunlunxin 8-card module fails #7484


Description

@b-birdy

Environment: P800 Kunlunxin 8-card module, Paddle 3.3.1, FastDeploy 2.5.0, XPU driver version 5.0.21.21.
Deployment with the official Docker image fails.
Image: ccr-2vdh3abv-pub.cnc.bj.baidubce.com/paddlepaddle/fastdeploy-xpu:2.5.0
Launch command:
python -m fastdeploy.entrypoints.openai.api_server \
    --model /Work/Models/ernie/ERNIE-4.5-300B-A47B-FP8-Paddle \
    --port 28110 \
    --metrics-port 28111 \
    --engine-worker-queue-port 28112 \
    --max-model-len 32768 \
    --max-num-seqs 32 \
    --tensor-parallel-size 4 \
    --enable-output-caching \
    --enable-chunked-prefill
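Before launching, it may help to confirm which device the installed Paddle build actually targets. A minimal sanity-check sketch (the only assumption is that `paddle` may or may not be importable in the current environment):

```python
# Pre-flight check: report the default Paddle device, or a clear message
# if Paddle is not installed. On a working Kunlunxin setup this should
# report an XPU device rather than "cpu".
def check_paddle_device():
    try:
        import paddle
    except ImportError:
        return "paddle not installed"
    # get_device() returns the current default device string,
    # e.g. "xpu:0", "gpu:0", or "cpu".
    return paddle.device.get_device()

print(check_paddle_device())
```

If this prints "cpu" inside the container, the Paddle build in use has no XPU support, which would explain device-API failures later in the worker.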

Error output from the engine:
Loading Weights: 0%| | 0/100 [00:05<?, ?it/s]
ERROR 2026-04-19 19:22:38,720 4358 engine.py[line:160] Failed to launch worker processes, check log/workerlog.* for more details.
ERROR 2026-04-19 19:22:46,347 4358 engine.py[line:447] Error extracting sub services: [Errno 3] No such process, Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/engine.py", line 444, in _exit_sub_services
pgid = os.getpgid(self.worker_proc.pid)
ProcessLookupError: [Errno 3] No such process

Worker log (log/workerlog.*) output:
==> log/workerlog.0 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.1 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.2 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.3 <==
XCCL /usr/local/lib/python3.10/dist-packages/paddle/base/../libs/libbkcl.so loaded
WARNING: OMP_NUM_THREADS set to 3, not 1. The computation speed will not be optimized if you use data parallel. It will fail if this PaddlePaddle binary is compiled with OpenBlas since OpenBlas does not support multi-threads.
PLEASE USE OMP_NUM_THREADS WISELY.

==> log/workerlog.0 <==
[2026-04-19 19:28:01,678] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,131] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.

==> log/workerlog.1 <==
[2026-04-19 19:28:01,711] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,065] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.
[2026-04-19 19:28:02,235] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,241] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,242] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,242] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'

==> log/workerlog.2 <==
[2026-04-19 19:28:01,716] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,175] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.

==> log/workerlog.3 <==
[2026-04-19 19:28:01,661] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
None of PyTorch, TensorFlow >= 2.0, or Flax have been found. Models won't be available and only tokenizers, configuration and file/data utilities can be used.
[2026-04-19 19:28:02,019] [ WARNING] - Due to potential compatibility issues between PaddlePaddle and PyTorch in PaddleFormers, PaddleFormers defaults transformers.utils.import_utils.is_torch_available and transformers.utils.import_utils.is_torchvision_available to False. If you need to use PyTorch in transformers or torchvision, please add del sys.modules['transformers'] before using them.
[2026-04-19 19:28:02,193] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,199] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,200] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'

==> log/workerlog.0 <==
[2026-04-19 19:28:02,368] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,374] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,374] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,375] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,375] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,376] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,376] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'
[2026-04-19 19:28:02,938] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='0', default_value='')
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)

==> log/workerlog.1 <==
[2026-04-19 19:28:02,773] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='1', default_value='')
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)

==> log/workerlog.2 <==
[2026-04-19 19:28:02,407] [ WARNING] prometheus_multiprocess_setup.py:41 - Found PROMETHEUS_MULTIPROC_DIR:/tmp/prom_main_261f07c9-0600-4385-9fd8-0bf06cb0c50c was set by user. you will find inaccurate metrics. Unset the variable will properly handle cleanup.
[2026-04-19 19:28:02,413] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.cpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.cpu.fastdeploy_cpu_ops'
[2026-04-19 19:28:02,413] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gcu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gcu.fastdeploy_ops'
decide_module error, load custom_ops from .fastdeploy_ops: name '_get_device_properties' is not defined
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.gpu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.gpu.fastdeploy_ops'
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.iluvatar import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.iluvatar.fastdeploy_ops'
[2026-04-19 19:28:02,414] [ WARNING] import_ops.py:46 - Ops of paddle_custom_device import failed, it may be not compiled. No module named 'paddle_custom_device'
[2026-04-19 19:28:02,415] [ WARNING] import_ops.py:46 - Ops of fastdeploy.model_executor.ops.npu import failed, it may be not compiled. No module named 'fastdeploy.model_executor.ops.npu.fastdeploy_ops'
[2026-04-19 19:28:02,977] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='2', default_value='')
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)

[2026-04-19 19:28:02,989] [ INFO] topology.py:526 - Total 4 pipe comm group(s) create successfully!
/usr/local/lib/python3.10/dist-packages/paddle/distributed/communication/group.py:145: UserWarning: Current global rank 2 is not in group _default_pg12
warnings.warn(
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 4 data comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 1 model comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:526 - Total 4 sharding comm group(s) create successfully!
[2026-04-19 19:28:02,991] [ INFO] topology.py:440 - HybridParallelInfo: rank_id: 2, mp_degree: 4, sharding_degree: 1, pp_degree: 1, dp_degree: 1, sep_degree: 1, mp_group: [0, 1, 2, 3], sharding_group: [2], pp_group: [2], dp_group: [2], sep:group: None, check/clip group: [0, 1, 2, 3]
[2026-04-19 19:28:02,992] [ INFO] - Using download source: huggingface
[2026-04-19 19:28:02,992] [ INFO] - Loading configuration file /Work/Models/ernie/ERNIE-4.5-300B-A47B-FP8-Paddle/config.json
[2026-04-19 19:28:02,993] [ WARNING] - You are using a model of type ernie4_5_moe to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
/usr/local/lib/python3.10/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
warnings.warn("Setuptools is replacing distutils.")
[2026-04-19 19:28:03,045] [ INFO] - Only support CUDA version flash attention.
/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/graph_optimization/utils.py:21: FutureWarning: The pynvml package is deprecated. Please install nvidia-ml-py instead. If you did not install pynvml directly, please report this to the maintainers of the package that installed pynvml for you.
import pynvml

==> log/workerlog.3 <==
[2026-04-19 19:28:02,733] [ INFO] distributed_strategy.py:341 - distributed strategy initialized
======================= Modified FLAGS detected =======================
FLAGS(name='FLAGS_enable_pir_in_executor', current_value=True, default_value=False)
FLAGS(name='FLAGS_specialize_device_in_dy2st', current_value=True, default_value=False)
FLAGS(name='FLAGS_selected_xpus', current_value='3', default_value='')
FLAGS(name='FLAGS_pir_interpreter_record_stream_for_gc_cache', current_value=True, default_value=False)
FLAGS(name='FLAGS_parameters_persistent_mode_in_dy2st', current_value=True, default_value=False)

==> log/workerlog.2 <==
[2026-04-19 19:28:03,510] [ WARNING] - import noaux_tc Failed!
/usr/local/lib/python3.10/dist-packages/paddle/compat/proxy.py:415: UserWarning: Extending PyTorch compat scope, previous scope: {'triton'}, new scope: {'flashinfer'}.
warnings.warn(msg)
/usr/local/lib/python3.10/dist-packages/paddle/compat/proxy.py:415: UserWarning: Extending PyTorch compat scope, previous scope: {'flashinfer', 'triton'}, new scope: {'flashinfer'}.
warnings.warn(msg)
[2026-04-19 19:28:04,080] [ INFO] - GuidedDecoding max_num_seqs=64 fill_bitmask_parallel_batch_size=4 is_cuda_platform=False max_workers=16.0
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 1296, in <module>
run_worker_proc()
File "/usr/local/lib/python3.10/dist-packages/paddle/base/dygraph/base.py", line 406, in _decorate_function
return func(*args, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 1279, in run_worker_proc
worker_proc.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/engine/../worker/worker_process.py", line 740, in load_model
self.worker.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/worker/xpu_worker.py", line 153, in load_model
self.model_runner.load_model()
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/worker/xpu_model_runner.py", line 1201, in load_model
self.model = model_loader.load_model(fd_config=self.fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/model_loader/default_loader_v1.py", line 97, in load_model
model = model_cls(fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 532, in __init__
self.ernie = Ernie4_5_Model(fd_config=fd_config)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/graph_optimization/decorator.py", line 55, in __init__
origin_init(self, fd_config=fd_config, **kwargs)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 441, in __init__
[
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 442, in <listcomp>
Ernie4_5_DecoderLayer(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 329, in __init__
self.self_attn = Ernie4_5_Attention(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/models/ernie4_5_moe.py", line 278, in __init__
self.qkv_proj = QKVParallelLinear(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 667, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 444, in __init__
super().__init__(
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/linear.py", line 175, in __init__
and fd_config.quant_config.get_quant_method(self)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/mix_quant.py", line 129, in get_quant_method
.from_config({"is_quantized": not self.is_checkpoint_bf16})
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/block_wise_fp8.py", line 79, in from_config
return cls(weight_block_size, is_checkpoint_bf16)
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/quantization/block_wise_fp8.py", line 70, in __init__
self.deepgemm_scale_ue8m0 = True if get_sm_version() >= 100 else False
File "/usr/local/lib/python3.10/dist-packages/fastdeploy/model_executor/layers/utils.py", line 556, in get_sm_version
prop = paddle.device.cuda.get_device_properties()
File "/usr/local/lib/python3.10/dist-packages/paddle/device/cuda/__init__.py", line 636, in get_device_properties
raise ValueError(
ValueError: The API paddle.device.cuda.get_device_properties is not supported in CPU-only PaddlePaddle. Please reinstall PaddlePaddle with GPU support to call this API.
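The traceback shows that the FP8 quantization path calls `get_sm_version()`, which unconditionally invokes the CUDA-only API `paddle.device.cuda.get_device_properties()`; on an XPU build of Paddle this raises the `ValueError` above and kills the worker. A hypothetical workaround sketch of the kind of guard involved (function name and return convention are my assumptions, not FastDeploy's actual fix):

```python
# Sketch: only query CUDA device properties on a CUDA-enabled Paddle
# build; return None on XPU/CPU builds instead of raising ValueError.
def get_sm_version_safe():
    try:
        import paddle
    except ImportError:
        return None  # paddle not installed in this environment
    if not paddle.device.is_compiled_with_cuda():
        return None  # non-CUDA build (e.g. Kunlunxin XPU): no SM version
    prop = paddle.device.cuda.get_device_properties()
    return prop.major * 10 + prop.minor  # e.g. 90 for SM 9.0

print(get_sm_version_safe())
```

Callers such as the `deepgemm_scale_ue8m0` check would then need to treat `None` as "not a CUDA device" rather than comparing it against 100.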
