Description
I get a compile error when building from source, in src/nn/quant/fp8/fp8_util.cu. The file uses an inline PTX instruction (cvt.rn.satfinite.e4m3x2.f16x2) that requires a GPU architecture of at least sm_89 (the guard in the code checks __CUDA_ARCH__ >= 890), and my GPU does not support that architecture.
The problematic code section is as follows:
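    __device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a) {
      uint16_t val = 0;  // initialized so the fallback branch never returns garbage
    #if __CUDA_ARCH__ >= 890
      // Converts two packed fp16 lanes into two FP8 E4M3 bytes (PTX, sm_89+)
      asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
    #else
      assert(false);  // no fallback: conversion is unsupported on older architectures
    #endif
      return val;
    }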
Question:
Is there any way to modify the code to support GPUs with a lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative quantization path that doesn't rely on e4m3x2, or a way to skip this operation on unsupported GPUs?
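One possible direction, sketched here under stated assumptions and not taken from the project: give the #else branch a software conversion instead of assert(false). The helpers below, fp16_bits_to_e4m3 and half2_bits_to_e4m3x2 (both hypothetical names), work on raw bit patterns with plain integer math, so they compile for host and device alike. They assume the OCP E4M3 format (bias 7, max finite 448, no infinities, S.1111.111 is NaN), round-to-nearest-even with saturation to ±448 to mimic .rn.satfinite, and that the hardware instruction maps the low fp16 lane to the low output byte. That lane order is my reading of the PTX docs and worth verifying. It will be slower than the single hardware instruction. If your toolkit ships cuda_fp8.h (CUDA 11.8 and later), its __nv_cvt_halfraw2_to_fp8x2 conversion may be worth checking first, since as far as I know it already includes a software path for older GPUs.

```cpp
#include <cstdint>

// Same helpers compile as __host__ __device__ under nvcc, plain C++ otherwise.
#if defined(__CUDACC__)
#define E4M3_HD __host__ __device__
#else
#define E4M3_HD
#endif

// Hypothetical software fp16 -> FP8 E4M3 conversion on raw bit patterns.
// Round-to-nearest-even, finite results saturate to +-448 (0x7E), NaN stays NaN.
E4M3_HD inline uint8_t fp16_bits_to_e4m3(uint16_t h) {
  const uint32_t sign = (h >> 8) & 0x80u;
  const uint32_t exp = (h >> 10) & 0x1Fu;
  const uint32_t man = h & 0x3FFu;

  if (exp == 0x1Fu) {
    // fp16 NaN -> E4M3 NaN; +-Inf saturates to +-448 (E4M3 has no Inf).
    return static_cast<uint8_t>(sign | (man != 0 ? 0x7Fu : 0x7Eu));
  }

  // Express the fp16 value as sig * 2^e with sig an integer.
  uint32_t sig;
  int e;
  if (exp == 0) { sig = man; e = -24; }              // fp16 subnormal
  else          { sig = man | 0x400u; e = static_cast<int>(exp) - 25; }
  if (sig == 0) return static_cast<uint8_t>(sign);  // +-0

  // p = position of the leading 1 in sig, so the value's exponent is p + e.
  int p = 0;
  while ((sig >> (p + 1)) != 0) ++p;
  const int E = p + e;

  // E4M3 grid spacing at this magnitude is 2^q: 3 mantissa bits,
  // and subnormals (E < -6) all share the spacing 2^-9.
  const int q = (E > -6 ? E : -6) - 3;
  const int shift = q - e;  // always >= 7 for fp16 inputs

  // Round sig / 2^shift to an integer, ties to even.
  uint32_t units = sig >> shift;
  const uint32_t rem = sig & ((1u << shift) - 1u);
  const uint32_t half = 1u << (shift - 1);
  if (rem > half || (rem == half && (units & 1u))) ++units;

  // (q + 9) << 3 is the exponent-field base; adding units lets a rounding
  // carry (units == 16, or 8 in the subnormal range) bump the exponent.
  uint32_t bits = (static_cast<uint32_t>(q + 9) << 3) + units;
  if (bits > 0x7Eu) bits = 0x7Eu;  // satfinite: clamp to 448, never emit NaN
  return static_cast<uint8_t>(sign | bits);
}

// Packed version with the same argument layout as half2_to_e4m3.
// ASSUMPTION: low fp16 lane -> low byte, high fp16 lane -> high byte.
E4M3_HD inline uint16_t half2_bits_to_e4m3x2(uint32_t a) {
  const uint32_t lo = fp16_bits_to_e4m3(static_cast<uint16_t>(a & 0xFFFFu));
  const uint32_t hi = fp16_bits_to_e4m3(static_cast<uint16_t>(a >> 16));
  return static_cast<uint16_t>((hi << 8) | lo);
}
```

With something like this available, the #else branch of half2_to_e4m3 could set val = half2_bits_to_e4m3x2(a); instead of asserting. It is worth comparing its output against the sm_89 instruction on a supported GPU before relying on it.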