Compile Issue #5

@huangyuxiang03

Hi,
Thanks for providing this implementation. When we were trying to install this on A800 GPUs, we encountered this error:

[61/61] /home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin 
/home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
FAILED: /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o 
/home/test/test01/cuda-12.1/bin/nvcc --generate-dependencies-with-compile --dependency-output /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o.d -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src -I/home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/cutlass/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/torch/csrc/api/include -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/TH -I/home/test/test01/anaconda3/envs/duo/lib/python3.10/site-packages/torch/include/THC -I/home/test/test01/cuda-12.1/include -I/home/test/test01/anaconda3/envs/duo/include/python3.10 -c -c /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.cu -o /home/test/test01/hyx/duo-attention/Block-Sparse-Attention/build/temp.linux-x86_64-cpython-310/csrc/block_sparse_attn/src/flash_fwd_split_hdim64_fp16_sm80.o -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr --compiler-options ''"'"'-fPIC'"'"'' -O3 -std=c++17 -U__CUDA_NO_HALF_OPERATORS__ -U__CUDA_NO_HALF_CONVERSIONS__ -U__CUDA_NO_HALF2_OPERATORS__ -U__CUDA_NO_BFLOAT16_CONVERSIONS__ --expt-relaxed-constexpr --expt-extended-lambda --use_fast_math -lineinfo -gencode arch=compute_80,code=sm_80 -gencode arch=compute_90,code=sm_90 --threads 4 -DTORCH_API_INCLUDE_EXTENSION_H '-DPYBIND11_COMPILER_TYPE="_gcc"' '-DPYBIND11_STDLIB="_libstdcpp"' '-DPYBIND11_BUILD_ABI="_cxxabi1011"' -DTORCH_EXTENSION_NAME=block_sparse_attn_cuda -D_GLIBCXX_USE_CXX11_ABI=0 -ccbin 
/home/test/test01/anaconda3/envs/duo/bin/x86_64-conda-linux-gnu-cc
sh: line 1: 29645 Killed                  ptxas -arch=sm_90 -m64 --generate-line-info "/tmp/tmpxft_00005048_00000000-6_flash_fwd_split_hdim64_fp16_sm80.compute_90.ptx" -o "/tmp/tmpxft_00005048_00000000-11_flash_fwd_split_hdim64_fp16_sm80.compute_90.cubin" > /tmp/tmpxft_00005048_00000000-13_2d74fb0_stdout 2> /tmp/tmpxft_00005048_00000000-13_2d74fb0_stderr
ninja: build stopped: subcommand failed.

The compilation was stuck at [61/61] for a very long time before it was killed by the OS. What could be causing this?
Thanks.
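
For readers triaging a similar log: the killed `ptxas` step above is the `-arch=sm_90` one, while the A800 is an sm_80 (Ampere) device, and the command also passes `--threads 4` to nvcc. The snippet below is only a minimal sketch for pulling those two settings out of a long nvcc command line so they are easier to see. The function name `summarize_nvcc_command` and the shortened example command are hypothetical and not part of this repository.

```python
import shlex


def summarize_nvcc_command(command: str) -> dict:
    """Extract the -gencode targets and the --threads value from an nvcc command line.

    Hypothetical helper for reading build logs; it does not invoke nvcc.
    """
    tokens = shlex.split(command)
    archs = []
    threads = None
    for i, tok in enumerate(tokens):
        if tok == "-gencode" and i + 1 < len(tokens):
            # The value looks like "arch=compute_80,code=sm_80".
            fields = dict(part.split("=", 1) for part in tokens[i + 1].split(","))
            archs.append(fields.get("code"))
        elif tok == "--threads" and i + 1 < len(tokens):
            threads = int(tokens[i + 1])
    return {"archs": archs, "threads": threads}


# Shortened stand-in for the command in the log (paths and most flags omitted).
example = (
    "nvcc -O3 -std=c++17 "
    "-gencode arch=compute_80,code=sm_80 "
    "-gencode arch=compute_90,code=sm_90 "
    "--threads 4 -c kernel.cu -o kernel.o"
)
summary = summarize_nvcc_command(example)
print(summary)
```

Applied to the command in the log, this would report two code targets (sm_80 and sm_90) and four nvcc threads. Whether the kill was caused by memory pressure cannot be confirmed from this log alone; the kernel log on the build machine would be the place to check.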
