Hi,
I have been trying to use DeepSpeed on the Power Linux platform, in a virtual environment, with the CPU accelerator. When I tried to enable AutoTP, the following compilation error was observed while building the deepspeed_shm_comm extension:
(ptenv26) $ deepspeed --num_accelerators=2 --bind_cores_to_rank --bind_core_list 0-40 driver --device=cpu --reps=1 --model="~/granite-3b" --model_class=GPTBigCodeForCausalLM --input_size=32 --output_size=200 --batch_size=1
[2025-02-28 01:57:06,419] [WARNING] [real_accelerator.py:181:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2025-02-28 01:57:06,428] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2025-02-28 01:57:08,352] [WARNING] [runner.py:215:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2025-02-28 01:57:08,356] [INFO] [runner.py:607:main] cmd = /home/user/ptenv26/bin/python3.12 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None --bind_cores_to_rank --bind_core_list=0-40 driver-ds-fp32-v3 --device=cpu --reps=1 --model=/home/user/models/granite-3b --model_class=GPTBigCodeForCausalLM --input_size=32 --output_size=200 --batch_size=1
[2025-02-28 01:57:09,616] [WARNING] [real_accelerator.py:181:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2025-02-28 01:57:09,626] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2025-02-28 01:57:11,535] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2025-02-28 01:57:11,535] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2025-02-28 01:57:11,535] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2025-02-28 01:57:11,535] [INFO] [launch.py:164:main] dist_world_size=2
[2025-02-28 01:57:11,535] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2025-02-28 01:57:11,552] [INFO] [launch.py:256:main] process 2180496 spawned with command: ['numactl', '-m', '0', '-C', '0-19', '/home/user/ptenv26/bin/python3.12', '-u', 'driver-ds-fp32-v3', '--local_rank=0', '--device=cpu', '--reps=1', '--model=/home/user/models/granite-3b', '--model_class=GPTBigCodeForCausalLM', '--input_size=32', '--output_size=200', '--batch_size=1']
[2025-02-28 01:57:11,568] [INFO] [launch.py:256:main] process 2180499 spawned with command: ['numactl', '-m', '0', '-C', '20-39', '/home/user/ptenv26/bin/python3.12', '-u', 'driver-ds-fp32-v3', '--local_rank=1', '--device=cpu', '--reps=1', '--model=/home/user/models/granite-3b', '--model_class=GPTBigCodeForCausalLM', '--input_size=32', '--output_size=200', '--batch_size=1']
[2025-02-28 01:57:13,076] [WARNING] [real_accelerator.py:181:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2025-02-28 01:57:13,086] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cpu (auto detect)
[2025-02-28 01:57:13,087] [WARNING] [real_accelerator.py:181:get_accelerator] Setting accelerator to CPU. If you have GPU or other accelerator, we were unable to detect it.
[2025-02-28 01:57:13,097] [INFO] [real_accelerator.py:222:get_accelerator] Setting ds_accelerator to cpu (auto detect)
Loading checkpoint shards: 100%|
…
[2025-02-28 01:58:06,762] [INFO] [logging.py:128:log_dist] [Rank -1] DeepSpeed info: version=0.16.4, git-hash=unknown, git-branch=unknown
[2025-02-28 01:58:06,762] [INFO] [logging.py:128:log_dist] [Rank -1] DeepSpeed info: version=0.16.4, git-hash=unknown, git-branch=unknown
[2025-02-28 01:58:06,763] [INFO] [logging.py:128:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2025-02-28 01:58:06,763] [INFO] [logging.py:128:log_dist] [Rank -1] quantize_bits = 8 mlp_extra_grouping = False, quantize_groups = 1
[2025-02-28 01:58:06,766] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-02-28 01:58:06,766] [INFO] [comm.py:658:init_distributed] cdb=None
[2025-02-28 01:58:06,766] [INFO] [comm.py:689:init_distributed] Initializing TorchBackend in DeepSpeed with backend gloo
Using /home/user/.cache/torch_extensions/py312_cpu as PyTorch extensions root...
Using /home/user/.cache/torch_extensions/py312_cpu as PyTorch extensions root...
Creating extension directory /home/user/.cache/torch_extensions/py312_cpu/deepspeed_shm_comm...
Creating extension directory /home/user/.cache/torch_extensions/py312_cpu/deepspeed_shm_comm...
Emitting ninja build file /home/user/.cache/torch_extensions/py312_cpu/deepspeed_shm_comm/build.ninja...
Building extension module deepspeed_shm_comm...
…
c++ -MMD -MF shm.o.d -DTORCH_EXTENSION_NAME=deepspeed_shm_comm -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1016\" -I/home/user/ptenv26/lib64/python3.12/site-packages/deepspeed/ops/csrc/cpu/includes -isystem /home/user/ptenv26/lib64/python3.12/site-packages/torch/include -isystem /home/user/ptenv26/lib64/python3.12/site-packages/torch/include/torch/csrc/api/include -isystem /usr/include/python3.12 -D_GLIBCXX_USE_CXX11_ABI=1 -fPIC -std=c++17 -O2 -fopenmp -c /home/user/ptenv26/lib64/python3.12/site-packages/deepspeed/ops/csrc/cpu/comm/shm.cpp -o shm.o
/home/user/ptenv26/lib64/python3.12/site-packages/deepspeed/ops/csrc/cpu/comm/shm.cpp:10:10: fatal error: immintrin.h: No such file or directory
10 | #include <immintrin.h>
| ^~~~~~~~~~~~~
compilation terminated.
The file shm.cpp contains Intel-specific intrinsic code. I would like to extend it to support Power CPUs; a rough sketch of the direction I have in mind is below.
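For illustration only: one option is to guard the x86 include and provide a Power VSX (or plain scalar) path for the element-wise reduction. This is a minimal sketch, not DeepSpeed's actual code; the function name reduce_sum_f32 is hypothetical and just assumes the hot path is a float sum like the one an SHM all-reduce needs.

```cpp
// Hypothetical portability shim -- reduce_sum_f32 is my own name, not
// DeepSpeed's API. The idea is to keep the AVX path on x86 and add a
// VSX (or scalar) path so shm.cpp can compile on ppc64le.
#include <cstddef>

#if defined(__x86_64__) || defined(_M_X64)
#include <immintrin.h>   // x86 SIMD intrinsics
#elif defined(__powerpc64__) && defined(__VSX__)
#include <altivec.h>     // Power VSX intrinsics (POWER8 and later)
#endif

// out[i] += in[i] for num_elements floats -- the kind of element-wise
// reduction an SHM all-reduce performs.
static void reduce_sum_f32(float* out, const float* in, size_t num_elements) {
#if defined(__AVX512F__)
    size_t i = 0;
    for (; i + 16 <= num_elements; i += 16) {
        __m512 a = _mm512_loadu_ps(out + i);
        __m512 b = _mm512_loadu_ps(in + i);
        _mm512_storeu_ps(out + i, _mm512_add_ps(a, b));
    }
    for (; i < num_elements; ++i) out[i] += in[i];
#elif defined(__powerpc64__) && defined(__VSX__)
    size_t i = 0;
    for (; i + 4 <= num_elements; i += 4) {
        __vector float a = vec_xl(0, out + i);                    // unaligned load
        __vector float b = vec_xl(0, const_cast<float*>(in) + i); // unaligned load
        vec_xst(vec_add(a, b), 0, out + i);                       // unaligned store
    }
    for (; i < num_elements; ++i) out[i] += in[i];
#else
    // Portable scalar fallback so the extension at least builds on any CPU.
    for (size_t i = 0; i < num_elements; ++i) out[i] += in[i];
#endif
}
```

Whether the VSX path is actually faster than the scalar loop here would need measuring on Power hardware; the main point is that immintrin.h would only be pulled in on x86, so the extension can build elsewhere.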
Could anyone help with suggestions?
Regards,
Vinitha Vijayan