Checklist / 检查清单
Bug Description / Bug 描述
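After installing the packages for ms-swift 4.2.3 from the official mirror source, running the training program reports that tilelang is not installed. Installing tilelang manually pulls in version 0.1.10 by default; version 0.1.9 shows a similar problem. Once tilelang is installed, running the training program again fails with the following error:
[rank1]: import flashinfer.comm as _flashinfer_comm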
[rank1]: File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/__init__.py", line 1, in <module>
[rank1]: from .cuda_ipc import CudaRTLibrary, create_shared_buffer, free_shared_buffer
[rank1]: File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 194, in <module>
[rank1]: cudart = CudaRTLibrary()
[rank1]: ^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/site-packages/flashinfer/comm/cuda_ipc.py", line 134, in __init__
[rank1]: f = getattr(self.lib, func.name)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/ctypes/__init__.py", line 392, in __getattr__
[rank1]: func = self.__getitem__(name)
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^
[rank1]: File "/usr/local/lib/python3.12/ctypes/__init__.py", line 397, in __getitem__
[rank1]: func = self._FuncPtr((name_or_ordinal, self))
[rank1]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank1]: AttributeError: /usr/local/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so: undefined symbol: cudaDeviceReset
The current temporary workaround is to replace tilelang's libcudart_stub.so with the corresponding .so file from the CUDA installation in the same environment. For details, see https://zhuanlan.zhihu.com/p/2045914838971455114
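The workaround could be scripted roughly as follows. This is only a sketch under stated assumptions, not the procedure from the linked article: `exports_symbol` and `replace_stub` are hypothetical helper names, the stub path is copied from the traceback, and `/usr/local/cuda/lib64/libcudart.so` is an assumed location for the real CUDA runtime that may differ in your environment.

```python
import ctypes
import os
import shutil
from pathlib import Path


def exports_symbol(lib_path: str, symbol: str) -> bool:
    """Return True if the library at lib_path loads and exports `symbol`."""
    try:
        lib = ctypes.CDLL(lib_path)
    except OSError:
        # The file is missing or is not a loadable shared library.
        return False
    # ctypes raises AttributeError for a missing symbol, so hasattr is enough.
    return hasattr(lib, symbol)


def replace_stub(stub_path: str, real_path: str) -> Path:
    """Back up the stub, then swap in a copy of the real library.

    Returns the path of the backup file.
    """
    stub = Path(stub_path)
    backup = stub.with_name(stub.name + ".bak")
    if not backup.exists():
        shutil.copy2(stub, backup)  # keep the original so the change can be undone
    # Copy to a temporary name and rename it into place, so that a process
    # which already mapped the old stub does not see it rewritten underneath.
    tmp = stub.with_name(stub.name + ".new")
    shutil.copy2(real_path, tmp)
    os.replace(tmp, stub)
    return backup


if __name__ == "__main__":
    # Path taken from the traceback in this report.
    STUB = "/usr/local/lib/python3.12/site-packages/tilelang/lib/libcudart_stub.so"
    # Assumption: adjust to wherever libcudart lives in your environment.
    REAL = "/usr/local/cuda/lib64/libcudart.so"

    if not Path(STUB).exists():
        print("stub not found at", STUB)
    elif exports_symbol(STUB, "cudaDeviceReset"):
        print("stub already exports cudaDeviceReset; nothing to do")
    elif exports_symbol(REAL, "cudaDeviceReset"):
        print("stub replaced; backup saved to", replace_stub(STUB, REAL))
    else:
        print("no usable libcudart found at", REAL)
```

Run it with the same Python interpreter used for training, before launching the job. Restoring the `.bak` file undoes the change.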
How to Reproduce / 如何复现
Install ms-swift 4.2.3 from the official mirror source, then run the training program.
Additional Information / 补充信息
No response