Update MSLK Triton FP8 row quantization kernel to match CUDA arithmetic and delete the C++ quantize_fp8_per_row kernel (#224) #854
| Job | Run time |
|---|---|
| | 4s |
| | 6s |
| | 48m 49s |
| | 48m 54s |
| | 52m 45s |
| | 52m 43s |
| | 47m 47s |
| | 47m 47s |
| | 14s |
| | 15s |
| | 17s |
| | 12s |
| | 14s |
| | 16s |
| Total | 5h 0m 23s |