
Feature 'cvt with .e4m3x2/.e5m2x2' requires .target sm_90 or higher #12

@divedb

Description


I encountered an error while compiling from source, in the file src/nn/quant/fp8/fp8_util.cu. The code uses an inline PTX instruction (cvt.rn.satfinite.e4m3x2.f16x2) that requires a minimum GPU architecture of sm_90, and my current GPU does not support this architecture.


The problematic code section is as follows:
__device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a) {
    uint16_t val;
#if __CUDA_ARCH__ >= 890
    asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
#else
    assert(false);
#endif
    return val;
}
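As a possible alternative for GPUs without the hardware instruction, the conversion can be emulated in software. This is a minimal sketch, not code from this repository: the helper names fp16_bits_to_float, float_to_e4m3 and half2_to_e4m3_sw are hypothetical. It assumes the E4M3 format used for FP8 on NVIDIA GPUs (exponent bias 7, 3 mantissa bits, largest finite value 448, no infinities, NaN encoded as S.1111.111) and that the low fp16 lane maps to the low output byte. Mapping NaN to 0x7F and clamping infinities and out-of-range values to ±448 is our reading of .satfinite and should be checked against the PTX ISA before relying on it. The __host__/__device__ macros let the same code build with nvcc and with a plain C++ compiler.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// With nvcc these are real qualifiers; with a plain C++ compiler they expand
// to nothing so the same helpers can be tested on the host.
#ifndef __CUDACC__
#define __host__
#define __device__
#endif

// Decode raw IEEE-754 binary16 bits to float.
__host__ __device__ inline float fp16_bits_to_float(uint16_t h) {
    uint32_t sign = static_cast<uint32_t>(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t mant = h & 0x3FFu;
    if (exp == 0) {  // zero or subnormal: mant * 2^-24, exact in float
        float v = static_cast<float>(mant) * (1.0f / 16777216.0f);
        return sign ? -v : v;
    }
    uint32_t bits = (exp == 0x1Fu)
        ? (sign | 0x7F800000u | (mant << 13))            // Inf or NaN
        : (sign | ((exp + 112u) << 23) | (mant << 13));  // rebias 15 -> 127
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

// float -> FP8 E4M3 with round-to-nearest-even. ASSUMED saturation rules:
// NaN -> 0x7F, infinities and values above 448 -> +/-448 (0x7E / 0xFE).
__host__ __device__ inline uint8_t float_to_e4m3(float x) {
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);
    uint8_t sign = static_cast<uint8_t>((bits >> 24) & 0x80u);
    uint32_t a = bits & 0x7FFFFFFFu;  // bits of |x|
    if (a > 0x7F800000u) return static_cast<uint8_t>(0x7Fu | sign);  // NaN
    if (a >= 0x3C800000u) {  // |x| >= 2^-6: E4M3 normal range
        // Keep 3 of the 23 mantissa bits, rounding to nearest even; a carry
        // out of the mantissa bumps the exponent as it should.
        uint32_t r = a + 0x7FFFFu + ((a >> 20) & 1u);
        uint32_t v = (r >> 20) - (120u << 3);  // rebias exponent 127 -> 7
        if (v > 0x7Eu) v = 0x7Eu;             // saturate; 0x7F would be NaN
        return static_cast<uint8_t>(v | sign);
    }
    // Subnormal range: encode round(|x| * 2^9), ties to even. A result of 8
    // is exactly the encoding of the smallest normal value, 2^-6.
    int e = static_cast<int>(a >> 23) - 127;
    if (e < -10) return sign;  // |x| < 2^-10 rounds to (signed) zero
    uint32_t sig = (a & 0x7FFFFFu) | 0x800000u;
    int shift = 14 - e;        // 21..24 in this branch
    uint32_t m = sig >> shift;
    uint32_t rem = sig & ((1u << shift) - 1u);
    uint32_t half = 1u << (shift - 1);
    if (rem > half || (rem == half && (m & 1u))) ++m;
    return static_cast<uint8_t>(m | sign);
}

// Hypothetical software stand-in for cvt.rn.satfinite.e4m3x2.f16x2.
// ASSUMPTION: the low fp16 lane of `a` goes to the low byte of the result.
__host__ __device__ inline uint16_t half2_to_e4m3_sw(uint32_t a) {
    uint8_t lo = float_to_e4m3(fp16_bits_to_float(static_cast<uint16_t>(a & 0xFFFFu)));
    uint8_t hi = float_to_e4m3(fp16_bits_to_float(static_cast<uint16_t>(a >> 16)));
    return static_cast<uint16_t>((hi << 8) | lo);
}
```

In the quoted function, the #else branch could then return half2_to_e4m3_sw(a) instead of reaching assert(false), at the cost of several scalar operations per value instead of one instruction.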

Question:

Is there a way to modify the code to support GPUs with lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative quantization approach that does not rely on the e4m3x2 instruction, or a way to skip this operation on unsupported GPUs?
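To skip the FP8 path on unsupported GPUs instead, the host could check the device's compute capability before choosing it. A minimal sketch with hypothetical helper names: device_has_fp8_cvt is a pure check, and current_device_has_fp8_cvt queries the CUDA runtime (cudaGetDevice, cudaDeviceGetAttribute) and is only compiled under nvcc. The default threshold of 89 mirrors the __CUDA_ARCH__ >= 890 guard in the quoted code; the compiler message in the title asks for sm_90, so the threshold is left as a parameter.

```cpp
#include <cassert>

// Hypothetical helper: true if a GPU with compute capability
// cc_major.cc_minor meets the minimum SM version min_sm (e.g. 89 = sm_89).
inline bool device_has_fp8_cvt(int cc_major, int cc_minor, int min_sm = 89) {
    assert(min_sm > 0);  // thresholds are SM numbers such as 89 or 90
    return cc_major * 10 + cc_minor >= min_sm;
}

#ifdef __CUDACC__
#include <cuda_runtime.h>

// Queries the current device; returns false if any runtime call fails,
// so callers fall back to a non-FP8 path.
inline bool current_device_has_fp8_cvt(int min_sm = 89) {
    int dev = 0, major = 0, minor = 0;
    if (cudaGetDevice(&dev) != cudaSuccess) return false;
    if (cudaDeviceGetAttribute(&major, cudaDevAttrComputeCapabilityMajor, dev) != cudaSuccess) return false;
    if (cudaDeviceGetAttribute(&minor, cudaDevAttrComputeCapabilityMinor, dev) != cudaSuccess) return false;
    return device_has_fp8_cvt(major, minor, min_sm);
}
#endif
```

For example, the quantization step would only select FP8 kernels when current_device_has_fp8_cvt() returns true; how that choice would be wired into this project is not shown here.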
