Feature 'cvt with .e4m3x2/.e5m2x2' requires .target sm_90 or higher

### Description
I hit the error in the title while building from source. It originates in `src/nn/quant/fp8/fp8_util.cu`, which uses the PTX `cvt` instruction with the `.e4m3x2` type. The assembler reports that this instruction requires a target of sm_90 or higher, and my current GPU does not support that architecture.

![error](https://github.com/user-attachments/assets/f6174ce3-9ad3-43f9-a53e-0e0bb1358dca)

The problematic code section is as follows:
```cpp
__device__ __forceinline__ uint16_t half2_to_e4m3(const uint32_t a)
{
    uint16_t val;
#if __CUDA_ARCH__ >= 890
    asm volatile("{ cvt.rn.satfinite.e4m3x2.f16x2 %0, %1;}\n" : "=h"(val) : "r"(a));
#else
    assert(false);
#endif
    return val;
}
```

### Question
Is there any way to modify the code to support GPUs with a lower compute capability (e.g., sm_80 or sm_70)? Specifically, is there an alternative approach for quantization that doesn't rely on e4m3x2 or a way to skip this operation for unsupported GPUs?
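
To make the question concrete, here is a rough sketch of the kind of software fallback I have in mind. It is not code from this repository. The names `FP8_HD`, `half_bits_to_e4m3_sw` and `half2_to_e4m3_sw` are hypothetical. It assumes the e4m3 layout with bias 7, largest finite value 448 (`0x7E`) and NaN encoded as `0x7F`, and it assumes the low fp16 lane maps to the low output byte. I have not checked it bit-for-bit against the hardware instruction.

```cpp
#include <cstdint>

// Expands to plain `inline` when compiled by a host-only C++ compiler.
#ifdef __CUDACC__
#define FP8_HD __host__ __device__ __forceinline__
#else
#define FP8_HD inline
#endif

// Hypothetical helper: converts one IEEE fp16 bit pattern to an e4m3 byte
// using round-to-nearest-even and saturation to the largest finite value.
// Assumed e4m3 layout: 1 sign, 4 exponent (bias 7), 3 mantissa bits.
FP8_HD uint8_t half_bits_to_e4m3_sw(uint16_t h)
{
    const uint8_t sign = static_cast<uint8_t>((h >> 8) & 0x80u);
    const uint32_t exp = (h >> 10) & 0x1Fu;
    const uint32_t man = h & 0x3FFu;

    if (exp == 0x1Fu) {
        // NaN stays NaN; infinity saturates to the largest finite magnitude.
        return static_cast<uint8_t>(sign | (man ? 0x7Fu : 0x7Eu));
    }

    uint32_t sig, shift, code;
    if (exp >= 9u) {
        // Normal e4m3 result: rebias the exponent (15 -> 7), drop 7 mantissa bits.
        sig = man;
        shift = 7u;
        code = (exp - 8u) << 3;
    } else {
        // Subnormal e4m3 result, expressed in units of 2^-9.
        sig = exp ? (man | 0x400u) : man;
        shift = exp ? (16u - exp) : 15u;
        code = 0u;
    }

    const uint32_t rem = sig & ((1u << shift) - 1u);
    const uint32_t halfway = 1u << (shift - 1u);
    code += sig >> shift;
    if (rem > halfway || (rem == halfway && (code & 1u))) {
        ++code;  // ties go to even; a mantissa carry bumps the exponent field
    }
    if (code > 0x7Eu) {
        code = 0x7Eu;  // saturate instead of producing the NaN encoding
    }
    return static_cast<uint8_t>(sign | code);
}

// Hypothetical stand-in with the same signature as half2_to_e4m3.
// Assumption: low fp16 lane -> low byte, high fp16 lane -> high byte.
FP8_HD uint16_t half2_to_e4m3_sw(const uint32_t a)
{
    const uint16_t lo = half_bits_to_e4m3_sw(static_cast<uint16_t>(a & 0xFFFFu));
    const uint16_t hi = half_bits_to_e4m3_sw(static_cast<uint16_t>(a >> 16));
    return static_cast<uint16_t>((hi << 8) | lo);
}
```

If something like this is acceptable, the `#else` branch of `half2_to_e4m3` could presumably call it instead of `assert(false)`. It would be slower than the native instruction, so it is only meant for older GPUs.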
