Hi,
Thanks for providing this implementation. When we were trying to install this on A800 GPUs, we encountered this error:
[61/61] /home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin 
/home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
FAILED: /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o
/home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin 
/home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
sh: line 1: 29645 Killed ptxas -arch=sm_90 -m64 --generate-line-info "/tmp/tmpxft_00005048_00000000-6_flash_fwd_split_hdim64_fp16_sm80.compute_90.ptx" -o "/tmp/tmpxft_00005048_00000000-11_flash_fwd_split_hdim64_fp16_sm80.compute_90.cubin" > /tmp/tmpxft_00005048_00000000-13_2d74fb0_stdout 2> /tmp/tmpxft_00005048_00000000-13_2d74fb0_stderr
ninja: build stopped: subcommand failed.
The compilation was stuck at [61/61] for a very long time before the process was killed by the OS. What could be the potential problem?
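In case it is relevant: the killed process is ptxas compiling the sm_90 pass, and our GPU is an A800 (sm_80). If the cause is ptxas exhausting memory under parallel compilation, a possible workaround (a sketch, assuming the setup script honors PyTorch's standard cpp_extension build variables `MAX_JOBS` and `TORCH_CUDA_ARCH_LIST`) would be:

```shell
# Cap ninja parallelism so each nvcc/ptxas process gets enough RAM
export MAX_JOBS=2
# A800 is compute capability 8.0 (sm_80); build only that arch and skip the sm_90 pass
export TORCH_CUDA_ARCH_LIST="8.0"
# then rebuild the extension, e.g.: pip install -e .
echo "MAX_JOBS=$MAX_JOBS TORCH_CUDA_ARCH_LIST=$TORCH_CUDA_ARCH_LIST"
```

Would limiting the build to sm_80 this way be expected to work for this project, or is the sm_90 target required?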
Thanks.