Hi, when I compile the FlashInfer C++ API from scratch, I hit the following compile error. Could anyone take a look?
Thanks!
Environment:
CUDA version: cuda12.6
Host compiler: g++ (Ubuntu 13.3.0-6ubuntu2~24.04) 13.3.0
OS: Ubuntu 24.04
GPU: NVIDIA GeForce RTX 4050 Laptop
Error Message:
/data/work/flashinfer/include/flashinfer/attention/prefill.cuh(138): error: expression must have a constant value
AttentionVariant::use_softmax ? DTypeQKAccum(-math::inf) : DTypeQKAccum(0.f);
^
/data/work/flashinfer/include/flashinfer/attention/prefill.cuh(138): note #2703-D: cannot call non-constexpr function "__half::__half(float)" (declared at line 4608 of /data/cuda-12.6/bin/../targets/x86_64-linux/include/cuda_fp16.h)
AttentionVariant::use_softmax ? DTypeQKAccum(-math::inf) : DTypeQKAccum(0.f);
^
detected during:
instantiation of class "flashinfer::KernelTraits<MASK_MODE_, CTA_TILE_Q_, NUM_MMA_Q_, NUM_MMA_KV_, NUM_MMA_D_QK_, NUM_MMA_D_VO_, NUM_WARPS_Q_, NUM_WARPS_KV_, POS_ENCODING_MODE_, DTypeQ_, DTypeKV_, DTypeO_, DTypeQKAccum_, IdType_, AttentionVariant_> [with MASK_MODE_=flashinfer::MaskMode::kNone, CTA_TILE_Q_=16U, NUM_MMA_Q_=1U, NUM_MMA_KV_=1U, NUM_MMA_D_QK_=8U, NUM_MMA_D_VO_=8U, NUM_WARPS_Q_=1U, NUM_WARPS_KV_=4U, POS_ENCODING_MODE_=flashinfer::PosEncodingMode::kNone, DTypeQ_=half, DTypeKV_=__nv_fp8_e5m2, DTypeO_=half, DTypeQKAccum_=half, IdType_=int32_t, AttentionVariant_=flashinfer::AttentionVariant4]" at line 735
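To make the failing pattern easier to discuss, here is a minimal plain-C++ sketch of what the error seems to describe. It is not FlashInfer code: `FakeHalf`, `SoftmaxVariant`, `PlainVariant`, `TraitsSketch` and `mask_fill_value` are hypothetical names, and `FakeHalf` only imitates a `__half` whose `float` constructor is not `constexpr`, as the note reports for this `cuda_fp16.h`. The log shows only the right-hand side of line 138, so the sketch assumes the expression initializes a compile-time constant. Moving the computation into an ordinary function is one possible workaround, not necessarily the fix the maintainers would choose.

```cpp
#include <cmath>
#include <limits>

// Hypothetical stand-in for __half: the float constructor is deliberately
// NOT constexpr, mirroring what note #2703-D reports.
struct FakeHalf {
  float v;
  FakeHalf(float x) : v(x) {}
  explicit operator float() const { return v; }
};

// Hypothetical attention variants carrying the compile-time flag.
struct SoftmaxVariant { static constexpr bool use_softmax = true; };
struct PlainVariant   { static constexpr bool use_softmax = false; };

template <typename DTypeQKAccum, typename AttentionVariant>
struct TraitsSketch {
  // Assumed shape of the rejected code: a constant-initialized member.
  // With DTypeQKAccum = FakeHalf this would not compile, because the
  // constructor cannot be called in a constant expression:
  //
  //   static constexpr DTypeQKAccum fill =
  //       AttentionVariant::use_softmax ? DTypeQKAccum(-inf) : DTypeQKAccum(0.f);

  // Possible workaround: build the value at run time in a function.
  // The branch still depends only on a compile-time flag.
  static DTypeQKAccum mask_fill_value() {
    return AttentionVariant::use_softmax
               ? DTypeQKAccum(-std::numeric_limits<float>::infinity())
               : DTypeQKAccum(0.f);
  }
};
```

In real CUDA code the function would presumably also need the appropriate `__device__` qualifiers; that part is left out here so the sketch compiles with a host compiler alone.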
Reproduce:
/data/cuda-12.6/bin/nvcc -forward-unknown-to-host-compiler -DFLASHINFER_ENABLE_BF16 -DFLASHINFER_ENABLE_FP8_E4M3 -DFLASHINFER_ENABLE_FP8_E5M2 -I/data/work/flashinfer/include -O3 -DNDEBUG -arch=native -Xcompiler=-fPIC --fatbin-options -compress-all -MD -MT
CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o
-MF CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o.d -x cu
-c /data/work/flashinfer/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu
-o CMakeFiles/prefill_kernels.dir/src/generated/batch_paged_prefill_head_qk_128_head_vo_128_posenc_0_fp16qkred_1_mask_0_dtypeq_f16_dtypekv_e5m2_dtypeout_f16_idtype_i32.cu.o --std=c++17