Add dummy dtypes #3195

EricLBuehler · 2025-11-17T19:02:25Z

Adds support for:

i32
i16
f6e2m3
f6e3m2
f4
f8e8m0
These are "dummy" dtypes, which essentially means each one is just a typed bit bucket.

CPU compiles
CUDA compiles
Metal compiles
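
To make the "typed bit bucket" idea concrete, here is a minimal sketch. It is not candle's implementation: `DummyDType`, `TypedBits`, and `packed_len` are hypothetical names, the integer dtypes are left out for brevity, and tight bit packing is an assumption made only for illustration. The bit widths are read off the format names (for example, F6E2M3 is 1 sign + 2 exponent + 3 mantissa bits).

```rust
/// Hypothetical tag for the sub-byte and scale dtypes named in this PR.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum DummyDType {
    F6E2M3,
    F6E3M2,
    F4,
    F8E8M0,
}

impl DummyDType {
    /// Width of one element in bits, read off the format names.
    pub fn bits(self) -> usize {
        match self {
            DummyDType::F6E2M3 | DummyDType::F6E3M2 => 6,
            DummyDType::F4 => 4,
            DummyDType::F8E8M0 => 8,
        }
    }
}

/// A "typed bit bucket": raw bytes plus a tag, with no arithmetic defined.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TypedBits {
    pub dtype: DummyDType,
    pub bytes: Vec<u8>,
}

impl TypedBits {
    /// Bytes needed for `elem_count` elements, ASSUMING tight bit packing.
    pub fn packed_len(dtype: DummyDType, elem_count: usize) -> usize {
        (elem_count * dtype.bits() + 7) / 8
    }

    /// Wrap raw bytes, rejecting buffers whose length does not match.
    pub fn new(dtype: DummyDType, elem_count: usize, bytes: Vec<u8>) -> Result<Self, String> {
        let expected = Self::packed_len(dtype, elem_count);
        if bytes.len() != expected {
            return Err(format!("expected {expected} bytes, got {}", bytes.len()));
        }
        Ok(Self { dtype, bytes })
    }
}
```

The point of the sketch is that the type carries a tag and a payload and nothing else; any cast or kernel support would have to be added separately.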

zackangelo · 2025-11-18T16:27:09Z

Signed dtypes are nice 👌 I've been having to pass u32s as i32s in CUDA launch code and have been worried that it would blow up in my face at some point.
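
For context on why passing a u32 where an i32 is expected can go wrong, here is a small sketch in plain Rust. The function names are hypothetical and nothing here touches the CUDA launch code; it only contrasts an `as` cast, which reinterprets the bits, with a checked conversion.

```rust
/// Reinterpret the bits, as an `as` cast does.
/// Values above `i32::MAX` come out negative.
pub fn cast_wrapping(v: u32) -> i32 {
    v as i32
}

/// Checked conversion: returns `None` instead of silently wrapping.
pub fn cast_checked(v: u32) -> Option<i32> {
    i32::try_from(v).ok()
}
```

Small values survive either path unchanged. The difference only shows up once a value exceeds `i32::MAX`, which is what makes this kind of bug easy to miss.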

ivarflakstad

This is gonna be a good one! 🙌

candle-core/src/cuda_backend/device.rs

candle-core/src/cuda_backend/mod.rs

candle-core/src/metal_backend/mod.rs

candle-core/src/lib.rs

ivarflakstad · 2025-11-19T09:53:53Z

candle-core/src/safetensors.rs

            DType::F64 => convert_slice::<f64>(data, shape, device),
-            DType::F8E4M3 => convert_slice::<F8E4M3>(data, shape, device),
+            DType::F8E4M3 => convert_slice::<float8::F8E4M3>(data, shape, device),
+            DType::F6E2M3 | DType::F6E3M2 | DType::F4 | DType::F8E8M0 => {


Doesn't have to be in this PR, but I'd prefer to hoist this out into a helper fn.
Perhaps use convert_slice::<u8>(data, shape, device) and manually change the storage dtype? Might not even need a dedicated fn now that I think about it 🤔
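
One way to read this suggestion, sketched with stand-in types rather than candle's real storage types: load the payload as bytes once, then change only the tag. `Tag`, `RawStorage`, `load_as_u8`, `retag`, and `load_dummy` are all hypothetical names, and `load_as_u8` merely stands in for the `convert_slice::<u8>` call mentioned above.

```rust
/// Hypothetical dtype tag; U8 is the generic byte-level representation.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Tag {
    U8,
    F6E2M3,
    F6E3M2,
    F4,
    F8E8M0,
}

/// Hypothetical storage: a tag plus the untouched payload bytes.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RawStorage {
    pub tag: Tag,
    pub bytes: Vec<u8>,
}

/// Stand-in for a byte-level load: copies the payload and tags it as U8.
pub fn load_as_u8(data: &[u8]) -> RawStorage {
    RawStorage { tag: Tag::U8, bytes: data.to_vec() }
}

/// Retag a U8 buffer as another dtype without touching the bytes.
pub fn retag(storage: RawStorage, tag: Tag) -> Result<RawStorage, String> {
    if storage.tag != Tag::U8 {
        return Err(format!("can only retag U8 storage, got {:?}", storage.tag));
    }
    Ok(RawStorage { tag, bytes: storage.bytes })
}

/// One code path for every dummy dtype instead of one match arm each.
pub fn load_dummy(data: &[u8], tag: Tag) -> Result<RawStorage, String> {
    retag(load_as_u8(data), tag)
}
```

With this shape, the match arm for the four dummy dtypes would reduce to a single call, which is why a dedicated helper might not be needed at all.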

ivarflakstad · 2025-11-19T09:54:42Z

candle-core/src/safetensors.rs

+    let shape = view.shape();
+
+    // Create storage with the appropriate dummy type variant
+    let storage = match device {


Déjà vu helper fn 👀

candle-core/src/safetensors.rs

ivarflakstad · 2025-11-19T09:56:01Z

candle-core/src/safetensors.rs

-    #[test]
-    fn load_i8() {
-        let bytes = b"8\0\0\0\0\0\0\0{\"x\":{\"dtype\":\"I8\",\"shape\":[2],\"data_offsets\":[0,2]}}   \x01\x03";
-        std::fs::write("test_i8.safetensors", bytes).unwrap();


Not related to this PR, just noting down while I'm here: we should use temp files for these kinds of tests.

Absolutely 👍
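
A minimal, std-only sketch of what a temp-file version of such a test could look like. `temp_path`, `roundtrip`, and `header_len` are hypothetical helpers, not functions from candle; the fixture bytes are the ones from the removed test. A crate such as `tempfile` is a common alternative that also handles cleanup when a test panics.

```rust
use std::fs;
use std::io;
use std::path::PathBuf;

/// The I8 fixture from the test above: 8-byte header length, JSON header, 2 data bytes.
pub const FIXTURE: &[u8] =
    b"8\0\0\0\0\0\0\0{\"x\":{\"dtype\":\"I8\",\"shape\":[2],\"data_offsets\":[0,2]}}   \x01\x03";

/// Build a path under the OS temp directory; the process id keeps separate runs apart.
pub fn temp_path(name: &str) -> PathBuf {
    std::env::temp_dir().join(format!("{}_{}", std::process::id(), name))
}

/// Write the fixture, read it back, and always try to remove the file afterwards.
pub fn roundtrip(name: &str, bytes: &[u8]) -> io::Result<Vec<u8>> {
    let path = temp_path(name);
    fs::write(&path, bytes)?;
    let read = fs::read(&path);
    let _ = fs::remove_file(&path);
    read
}

/// Read the little-endian u64 header length at the start of a safetensors buffer.
pub fn header_len(buf: &[u8]) -> Option<u64> {
    let mut first = [0u8; 8];
    first.copy_from_slice(buf.get(..8)?);
    Some(u64::from_le_bytes(first))
}
```

Writing under the temp directory keeps the working tree clean, and removing the file inside the helper means a failed assertion later in the test does not leave the fixture behind.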

candle-core/src/sort.rs

EricLBuehler · 2025-11-20T00:58:19Z

Addressed most of the review comments; left some as unresolved for posterity.

candle-core/src/cpu_backend/mod.rs

ivarflakstad · 2025-11-20T09:33:30Z

candle-core/src/cpu_backend/mod.rs

+                let data = unary_map(storage, layout, |v| v as f64);
+                Ok(Self::F64(data))
+            }
+            (Self::I32(storage), DType::F8E4M3) => {


I have an idea for how to reduce the massive size of this match.
Adding it to the ever growing list of things to do :)

candle-core/src/cpu_backend/mod.rs

ivarflakstad · 2025-11-20T09:41:51Z

candle-core/src/cuda_backend/utils.rs

            S::F16(s) => self.f(s, d, l, S::F16)?,
            S::F32(s) => self.f(s, d, l, S::F32)?,
            S::F64(s) => self.f(s, d, l, S::F64)?,
-            S::F8E4M3(s) => self.f(s, d, l, S::F8E4M3)?,


You resolved this, but it looks the same to me?

ivarflakstad · 2025-11-20T09:42:04Z

candle-core/src/cuda_backend/utils.rs

            (S::F16(s1), S::F16(s2)) => self.f(s1, l1, s2, l2, d)?,
            (S::F32(s1), S::F32(s2)) => self.f(s1, l1, s2, l2, d)?,
            (S::F64(s1), S::F64(s2)) => self.f(s1, l1, s2, l2, d)?,
-            (S::F8E4M3(s1), S::F8E4M3(s2)) => self.f(s1, l1, s2, l2, d)?,


ivarflakstad · 2025-11-20T09:47:10Z

candle-core/src/op.rs

-    #[inline(always)]
    fn f32(v: f32) -> f32 {
-        (crate::cpu::erf::erf_f32(v * std::f32::consts::FRAC_1_SQRT_2) + 1.) * 0.5 * v
+        Self::f64(v as f64) as f32


Still revert! ;)

candle-core/src/op.rs

candle-core/src/sort.rs

ivarflakstad · 2025-11-21T12:17:59Z

candle-core/src/cpu_backend/mod.rs

    }

+    fn get_current_seed(&self) -> Result<u64> {
+        crate::bail!("cannot get the CPU rng seed with get_current_seed")


I'll have a look into this later

ivarflakstad · 2025-11-21T12:18:48Z

candle-core/src/cuda_backend/utils.rs

-            S::F8E4M3(s) => S::F8E4M3(self.f(s, d, l)?),
+            S::F8E4M3(s) => self.f(s, d, l, S::F8E4M3)?,


Looks slightly off to me

ivarflakstad · 2025-11-21T12:19:23Z

candle-core/src/cuda_backend/utils.rs

-            (S::F8E4M3(s1), S::F8E4M3(s2)) => S::F8E4M3(self.f(s1, l1, s2, l2, d)?),
+            (S::F8E4M3(s1), S::F8E4M3(s2)) => self.f(s1, l1, s2, l2, d)?,


Not wrapping in storage?

ivarflakstad · 2025-11-21T12:19:58Z

candle-core/src/cuda_backend/utils.rs

-            (S::F8E4M3(s1), S::F8E4M3(s2), S::F8E4M3(s3)) => {
-                S::F8E4M3(self.f(s1, l1, s2, l2, s3, l3, d)?)
-            }


Is this not supported?

ivarflakstad · 2025-11-21T12:20:53Z

candle-core/src/cuda_backend/utils.rs

-            (S::F8E4M3(dst), S::F8E4M3(src)) => self.f(dst, dst_l, src, src_l, d),
+            (S::F8E4M3(_), S::F8E4M3(_)) => Err(CudaError::InternalError(
+                "Map2InPlace not supported for F8E4M3",
+            ))?,


Is this correct?

ivarflakstad · 2025-11-21T12:21:55Z

candle-core/src/cuda_backend/utils.rs

            S::F16(s) => self.f(s, d, l, S::F16)?,
            S::F32(s) => self.f(s, d, l, S::F32)?,
            S::F64(s) => self.f(s, d, l, S::F64)?,
-            S::F8E4M3(s) => self.f(s, d, l, S::F8E4M3)?,


You resolved this, but it looks the same to me?

ivarflakstad · 2025-11-21T12:22:04Z

candle-core/src/cuda_backend/utils.rs

            (S::F16(s1), S::F16(s2)) => self.f(s1, l1, s2, l2, d)?,
            (S::F32(s1), S::F32(s2)) => self.f(s1, l1, s2, l2, d)?,
            (S::F64(s1), S::F64(s2)) => self.f(s1, l1, s2, l2, d)?,
-            (S::F8E4M3(s1), S::F8E4M3(s2)) => self.f(s1, l1, s2, l2, d)?,


ivarflakstad · 2025-11-21T12:24:59Z

candle-core/src/sort.rs

-                    DType::F8E4M3 => crate::bail!("Metal device does not yet support F8E4M3."),
+                    DType::F8E4M3 => "asort_desc_f8e4m3",


Should use the same logic as above, no?

EricLBuehler added 2 commits November 17, 2025 14:00

Add dummy i32/i16/f6e2m3/f6e3m2/f4/f8e8m0 dtypes

35614f4

Metal fixes

8dd85fe

EricLBuehler marked this pull request as ready for review November 17, 2025 19:07

ivarflakstad and others added 2 commits November 17, 2025 23:13

Merge branch 'main' into dummy_dtypes

94d38d4

Fix candle-onnx build

f389c1f

EricLBuehler requested a review from ivarflakstad November 18, 2025 11:51

ivarflakstad reviewed Nov 19, 2025

View reviewed changes

Apply review comments

ccaa447

EricLBuehler requested a review from ivarflakstad November 20, 2025 00:57

Residual fixes

68d17ab

ivarflakstad reviewed Nov 20, 2025

View reviewed changes

Apply review comments

9f16dbe

EricLBuehler requested a review from ivarflakstad November 21, 2025 11:14

ivarflakstad reviewed Nov 21, 2025

View reviewed changes

