RuntimeError: aclnnNotifyDispatch or aclnnNotifyDispatchGetWorkspaceSize not in libopapi.so, or libopapi.so not found. #192

@huangmengwei-cmss

I compiled DeepEP on A2 and ran the tests, but the program failed with a "method not found" error.

Command:

python tests/python/deepep/test_intranode.py --num-processes 8

Error message:

Traceback (most recent call last):
  File "/usr/local/lib64/python3.11/site-packages/torch/multiprocessing/spawn.py", line 90, in _wrap
    fn(i, *args)
  File "/workspace/mnt/workspace/sgl-kernel-npu/tests/python/deepep/test_intranode.py", line 511, in test_loop
    test_main(args, num_local_ranks, local_rank, num_ranks, rank, buffer, group)
  File "/workspace/mnt/workspace/sgl-kernel-npu/tests/python/deepep/test_intranode.py", line 390, in test_main
    ) = buffer.dispatch(**dispatch_args)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/deep_ep/utils.py", line 87, in wrapper
    result = func(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib64/python3.11/site-packages/deep_ep/buffer.py", line 349, in dispatch
    ) = self.runtime.intranode_dispatch(
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: aclnnNotifyDispatch or aclnnNotifyDispatchGetWorkspaceSize not in libopapi.so, or libopapi.sonot found.
Exception raised from intranode_dispatch at /workspace/mnt/workspace/sgl-kernel-npu/csrc/deepep/deep_ep.cpp:256 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) + 0xb0 (0xffffa0da48c0 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) + 0x68 (0xffffa0d4c140 in /usr/local/lib64/python3.11/site-packages/torch/lib/libc10.so)

ENV:
A2 / CANN 8.3.RC1

Analysis:
I suspect that the aclnnInner_notify_dispatch.h/cpp files are missing, so the two symbols never get built into libopapi.so.
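
To narrow this down, it may help to check directly whether the two symbols named in the error can be resolved in the libopapi.so that the process loads. The following is only a minimal diagnostic sketch using Python's standard ctypes module. The helper name missing_symbols is hypothetical, and the path to libopapi.so is an assumption that has to be filled in for the local CANN / sgl-kernel-npu installation.

```python
import ctypes


def missing_symbols(lib_path, names):
    """Return the names that cannot be resolved in the shared library.

    Returns None if the library itself cannot be loaded (for example,
    the file does not exist or one of its dependencies is missing).
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        return None
    # ctypes raises AttributeError for an undefined symbol, so hasattr()
    # reports False for every name that the library does not export.
    return [name for name in names if not hasattr(lib, name)]


# Symbols taken from the RuntimeError message above.
REQUIRED = [
    "aclnnNotifyDispatch",
    "aclnnNotifyDispatchGetWorkspaceSize",
]

# Example call (the path is a placeholder, not a known install location):
#   print(missing_symbols("/path/to/libopapi.so", REQUIRED))
```

A result of None would point to the "libopapi.so not found" half of the message (wrong path or unresolved dependencies), while a non-empty list would suggest that the library was found but built without the notify-dispatch operator.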
