Description
I compiled DeepEP on A2 and ran the tests, but the program failed with a "method not found" error.
Command:
python tests/python/deepep/test_intranode.py --num-processes 8
ERROR Message:
Traceback (most recent call last):
File "/usr/local/lib64/python3.11/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
fn(i, *args)
File "/workspace/mnt/workspace/sgl-kernel-npu/tests/python/deepep/test_intranode.py", line 511, in test_loop
test_main(args, num_local_ranks, local_rank, num_ranks, rank, buffer, group)
File "/workspace/mnt/workspace/sgl-kernel-npu/tests/python/deepep/test_intranode.py", line 390, in test_main
) = buffer.dispatch(**dispatch_args)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.11/site-packages/deep_ep/utils.py", line 87, in wrapper
result = func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib64/python3.11/site-packages/deep_ep/buffer.py", line 349, in dispatch
) = self.runtime.intranode_dispatch(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: aclnnNotifyDispatch or aclnnNotifyDispatchGetWorkspaceSize not in libopapi.so, or libopapi.sonot found.
Exception raised from intranode_dispatch at /workspace/mnt/workspace/sgl-kernel-npu/csrc/deepep/deep_ep.cpp:256 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0xffffa0da48c0 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x68 (0xffffa0d4c140 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)
ENV:
A2 / CANN 8.3.RC1
Analysis:
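My guess is that the aclnnInner_notify_dispatch.h/cpp files are missing, so the aclnnNotifyDispatch and aclnnNotifyDispatchGetWorkspaceSize symbols never end up in the operator library.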
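
One way to narrow this down is to check whether the two symbols named in the error can be resolved at all. The sketch below is only a diagnostic under stated assumptions: it uses Python's standard `ctypes` module, the symbol names are copied from the error message, and `libopapi.so` is assumed to be findable through the dynamic loader's search path (for example `LD_LIBRARY_PATH`). The helper names `missing_symbols` and `check_library` are hypothetical and are not part of deep_ep or CANN. If the custom operators are packaged in a different shared library in your installation, pass that library's path instead.

```python
import ctypes

# Symbol names copied from the RuntimeError in this report.
REQUIRED_SYMBOLS = [
    "aclnnNotifyDispatch",
    "aclnnNotifyDispatchGetWorkspaceSize",
]


def missing_symbols(lib, names):
    """Return the names that cannot be resolved on an already loaded library.

    ctypes raises AttributeError when a symbol lookup fails, so hasattr()
    is enough to test for presence.
    """
    return [name for name in names if not hasattr(lib, name)]


def check_library(lib_name, names=REQUIRED_SYMBOLS):
    """Try to load lib_name and return (missing_names, load_error).

    missing_names is None when the library itself could not be loaded.
    """
    try:
        lib = ctypes.CDLL(lib_name)
    except OSError as exc:
        return None, str(exc)
    return missing_symbols(lib, names), None


if __name__ == "__main__":
    # Assumption: the library name from the error message is the one to test.
    missing, error = check_library("libopapi.so")
    if error is not None:
        print(f"could not load library: {error}")
    elif missing:
        print(f"library loaded, but these symbols are missing: {missing}")
    else:
        print("library loaded and both symbols were found")
```

If the library loads but the symbols are reported missing, that would be consistent with the operator sources not having been built or installed. If the library cannot be loaded at all, the problem is more likely in the environment setup than in the build.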