ms-swift 4.2.3 image is missing the tilelang package; after manual installation: libcudart_stub.so: undefined symbol: cudaDeviceReset #9494

@yszhli

Checklist

  • I have searched existing issues, and this is a new bug report.

Bug Description

After installing the packages from the official ms-swift 4.2.3 image, running the training program reports that tilelang is not installed. I installed tilelang manually, which gave version 0.1.10 (0.1.9 shows a similar problem). The installation succeeds, but running the training program again fails with the following error:

[rank1]:     import flashinfer.comm as _flashinfer_comm
[rank1]:   File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
[rank1]:     from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
[rank1]:   File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
[rank1]:     cudart = CudaRTLibrary()
[rank1]:              ^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
[rank1]:     f = getattr(self.lib, func.name)
[rank1]:         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
[rank1]:     func = self.__getitem__(name)
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^
[rank1]:   File "/usr/local/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
[rank1]:     func = self._FuncPtr((name_or_ordinal, self))
[rank1]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: /usr/local/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset

My current temporary workaround is to replace tilelang's libcudart_stub.so with the corresponding .so file from the CUDA installation in the same environment. For details, see this article: https://zhuanlan.zhihu.com/p/2045914838971455114
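
To check whether a given library file is affected before swapping anything, a small diagnostic along these lines may help. It is only a sketch: the helper names `exports_symbol` and `loaded_libraries` are made up for illustration, the stub path is the one from the traceback above, and the path `/usr/local/cuda/lib64/libcudart.so` is an assumption about where the image keeps the real CUDA runtime and may differ in your environment. It relies on `/proc/self/maps`, so the second helper only returns results on Linux.

```python
import ctypes
import os


def exports_symbol(lib_path, symbol):
    """Return True if the shared library at lib_path exports `symbol`.

    Returns False when the library cannot be loaded or the symbol is missing.
    """
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # File missing, wrong architecture, unresolved dependencies, etc.
        return False
    # ctypes raises AttributeError for an unknown symbol, which is the
    # same failure shown in the traceback; hasattr turns it into False.
    return hasattr(lib, symbol)


def loaded_libraries(substring):
    """List shared objects mapped into this process whose file name
    contains `substring` (Linux only; empty list elsewhere)."""
    maps_file = "/proc/self/maps"
    if not os.path.exists(maps_file):
        return []
    found = set()
    with open(maps_file) as fh:
        for line in fh:
            # Columns: address perms offset dev inode [pathname]
            parts = line.split(None, 5)
            if len(parts) == 6:
                path = parts[5].strip()
                if substring in os.path.basename(path):
                    found.add(path)
    return sorted(found)


if __name__ == "__main__":
    # Path taken from the traceback in this issue.
    stub = "/usr/local/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so"
    # ASSUMPTION: location of the real CUDA runtime; adjust for your image.
    real = "/usr/local/cuda/lib64/libcudart.so"

    for path in (stub, real):
        print(path, "exports cudaDeviceReset:", exports_symbol(path, "cudaDeviceReset"))

    # Shows which libcudart-like files are mapped into this process.
    print("loaded:", loaded_libraries("libcudart"))
```

If the stub reports False and the real runtime reports True, that matches the symptom described here. Calling `loaded_libraries("libcudart")` after `import tilelang` in the training environment should show whether the stub is the copy that ends up mapped into the process.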

How to Reproduce

Install swift 4.2.3 from the official image source, then run the training program.

Additional Information

No response
