[BUG] attention/prefill.cuh(138): error: expression must have a constant value #806

Open
@haohaibo

Description

Hi, when I compile the FlashInfer C++ API from scratch, I get the following compile error. Could anyone take a look?
Thanks!

Environment:
CUDA version: cuda12.6
Host compiler: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
OS: Ubuntu 24.04
GPU: NVIDIA GeForce RTX 4050 Laptop

Error Message:

/data/work/flashinfer/include/flashinfer/attention/prefill.cuh(138): error: expression must have a constant value
        AttentionVariant::use_softmax ? DTypeQKAccum(-math::inf) : DTypeQKAccum(0.f);
        ^
/data/work/flashinfer/include/flashinfer/attention/prefill.cuh(138): note #2703-D: cannot call non-constexpr function "__half::__half(float)" (declared at line 4608 of /data/cuda-12.6/bin/../targets/x86_64-linux/include/cuda_fp16.h)
        AttentionVariant::use_softmax ? DTypeQKAccum(-math::inf) : DTypeQKAccum(0.f);
                                        ^
          detected during:
            instantiation of class "flashinfer::KernelTraits<MASK_MODE_, CTA_TILE_Q_, NUM_MMA_Q_, NUM_MMA_KV_, NUM_MMA_D_QK_, NUM_MMA_D_VO_, NUM_WARPS_Q_, NUM_WARPS_KV_, POS_ENCODING_MODE_, DTypeQ_, DTypeKV_, DTypeO_, DTypeQKAccum_, IdType_, AttentionVariant_> [with MASK_MODE_=flashinfer::MaskMode::kNone, CTA_TILE_Q_=16U, NUM_MMA_Q_=1U, NUM_MMA_KV_=1U, NUM_MMA_D_QK_=8U, NUM_MMA_D_VO_=8U, NUM_WARPS_Q_=1U, NUM_WARPS_KV_=4U, POS_ENCODING_MODE_=flashinfer::PosEncodingMode::kNone, DTypeQ_=half, DTypeKV_=__nv_fp8_e5m2, DTypeO_=half, DTypeQKAccum_=half, IdType_=int32_t, AttentionVariant_=flashinfer::AttentionVariant4]" at line 735
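
To make the diagnostic easier to reason about, here is a minimal host-only sketch of what seems to be happening; it is not FlashInfer's actual code. The compiler note says `__half::__half(float)` is not `constexpr` in this CUDA toolkit, so an expression that constructs `DTypeQKAccum` from a float cannot be a constant expression when `DTypeQKAccum_=half`. In the sketch, `FakeHalf`, `KernelTraitsSketch`, `mask_fill_value`, and the two variant structs are hypothetical names, and `std::numeric_limits<float>::infinity()` stands in for `math::inf`. One possible workaround, shown as an assumption rather than the project's fix, is to compute the value in a function instead of a `constexpr` initializer.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Hypothetical stand-in for __half: like the constructor named in the
// compiler note, its float constructor is deliberately NOT constexpr.
struct FakeHalf {
  float v;
  FakeHalf(float f) : v(f) {}
  explicit operator float() const { return v; }
};

// Hypothetical attention variants that expose only the flag used on line 138.
struct SoftmaxVariant { static constexpr bool use_softmax = true; };
struct PlainVariant { static constexpr bool use_softmax = false; };

template <typename DTypeQKAccum, typename AttentionVariant>
struct KernelTraitsSketch {
  // A constexpr initializer of this shape is rejected for FakeHalf, because
  // it must call the non-constexpr FakeHalf(float) in a constant expression:
  //
  //   static constexpr DTypeQKAccum fill =
  //       AttentionVariant::use_softmax ? DTypeQKAccum(-inf) : DTypeQKAccum(0.f);

  // Workaround sketch: build the value at run time inside a function.
  // In device code this would presumably also need __device__ qualifiers.
  static DTypeQKAccum mask_fill_value() {
    return AttentionVariant::use_softmax
               ? DTypeQKAccum(-std::numeric_limits<float>::infinity())
               : DTypeQKAccum(0.f);
  }
};

using HalfSoftmaxTraits = KernelTraitsSketch<FakeHalf, SoftmaxVariant>;
using HalfPlainTraits = KernelTraitsSketch<FakeHalf, PlainVariant>;

// Converts the fill value to float so it can be inspected on the host.
template <typename Traits>
float mask_fill_as_float() {
  return static_cast<float>(Traits::mask_fill_value());
}

// Small self-check: softmax variants fill with -inf, others with zero.
inline void self_check() {
  assert(std::isinf(mask_fill_as_float<HalfSoftmaxTraits>()));
  assert(mask_fill_as_float<HalfSoftmaxTraits>() < 0.0f);
  assert(mask_fill_as_float<HalfPlainTraits>() == 0.0f);
}
```

The trade-off is that the value is no longer a compile-time constant, so any code that relies on it being one would need a different approach, such as storing the fill value as a `float` and converting at the point of use.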

Reproduce:

/data/cuda-12.6/bin/nvcc -forward-unknown-to-host-compiler -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -I/data/work/flashinfer/include -O3 -DNDEBUG -arch=native -Xcompiler=-fPIC --fatbin-options -compress-all -MD -MT \
CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o \
-MF CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o.d -x cu \
-c /data/work/flashinfer/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu \
-o CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o --std=c++17
