Conversation

@nyo16 nyo16 commented Jan 8, 2026

Summary

Add FP8 (8-bit floating point) dtype support for the E4M3FN and E5M2 formats. This enables reading and writing FP8-quantized model
weights from HuggingFace models such as Qwen3-FP8.

Changes

  • Add FP8 dtype constants (F8_E4M3FN, F8_E5M2) and type mappings
  • Handle the 3-tuple FP8 types {:f, 8, :e4m3fn} and {:f, 8, :e5m2} in:
    • dtype_from_string/1 - Parse "F8_E4M3" and "F8_E5M2" from safetensors headers
    • tensor_byte_size/1 - Calculate the byte size of FP8 tensors
    • tensor_to_iodata/1 - Serialize FP8 tensors
    • build_tensor/2 - Deserialize FP8 tensors
  • Support reading FP8 model files (e.g., Qwen/Qwen3-0.6B-FP8)
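For context on what the two formats encode, here is a minimal sketch, not part of the PR and written in Python rather than Elixir since the bit layouts are language-agnostic: E4M3FN uses 1 sign, 4 exponent, and 3 mantissa bits with bias 7 and no infinities (the all-ones pattern is NaN), while E5M2 uses 1/5/2 bits with bias 15 and IEEE-style infinities and NaNs.

```python
import math

def decode_e4m3fn(byte: int) -> float:
    """Decode one F8_E4M3FN byte: 1 sign, 4 exponent (bias 7), 3 mantissa bits.
    The 'FN' (finite) variant has no infinities; S.1111.111 is the only NaN."""
    s = -1.0 if byte & 0x80 else 1.0
    e = (byte >> 3) & 0x0F
    m = byte & 0x07
    if e == 0x0F and m == 0x07:          # all-ones -> NaN (no inf in E4M3FN)
        return math.nan
    if e == 0:                           # subnormal: 2^-6 * (m/8)
        return s * 2.0 ** -6 * (m / 8.0)
    return s * 2.0 ** (e - 7) * (1.0 + m / 8.0)

def decode_e5m2(byte: int) -> float:
    """Decode one F8_E5M2 byte: 1 sign, 5 exponent (bias 15), 2 mantissa bits."""
    s = -1.0 if byte & 0x80 else 1.0
    e = (byte >> 2) & 0x1F
    m = byte & 0x03
    if e == 0x1F:                        # IEEE-style: inf when m == 0, else NaN
        return s * math.inf if m == 0 else math.nan
    if e == 0:                           # subnormal: 2^-14 * (m/4)
        return s * 2.0 ** -14 * (m / 4.0)
    return s * 2.0 ** (e - 15) * (1.0 + m / 4.0)

print(decode_e4m3fn(0x38))  # 1.0
print(decode_e4m3fn(0x7E))  # 448.0, the E4M3FN maximum finite value
print(decode_e5m2(0x3C))    # 1.0
```

The trade-off this illustrates: E4M3FN spends bits on precision (more mantissa) at the cost of range, which is why it is the common choice for weights, while E5M2 keeps the wider dynamic range.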

Test plan

  • Unit tests for FP8 type encoding/decoding
  • Integration test reading real FP8 model files
  • Verified with Qwen3-0.6B-FP8 model inference

Notes

This is the first PR in a series to enable native FP8 model inference:

  1. safetensors (this PR) - FP8 file I/O
  2. nx/exla - FP8 type system support
  3. bumblebee - FP8 model loading and inference

nyo16 added 5 commits January 6, 2026 00:13
Adds support for the F8_E4M3 and F8_E5M2 dtypes in the SafeTensors format,
enabling the loading of FP8-quantized models from HuggingFace.

Changes:
- Add {:f, 8, :e4m3fn} → "F8_E4M3" mapping
- Add {:f, 8, :e5m2} → "F8_E5M2" mapping
- Add {:f, 8} → "F8_E5M2" for backward compatibility
- Update dtype_to_type reverse mappings for fp8 formats

Enables loading models such as Qwen3-4B-Instruct-2507-FP8, which uses the
F8_E4M3 format for weights with fine-grained quantization.
- Test write/read for E4M3FN and E5M2 tensors
- Test type preservation in round-trip
- Test lazy loading with fp8 types
- Test byte size calculation
- Test dtype strings in SafeTensors header
- Add NX_PATH environment variable support for local development
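The commit bullets above describe both directions of the string/type mapping plus the byte-size rule. A hedged illustration follows; the real code is Elixir pattern-match clauses in the safetensors library, and this Python sketch only mirrors the mappings named in the commit messages:

```python
# Illustrative only: mirrors the commit's mappings between Nx-style type
# tuples and safetensors header dtype strings (actual library is Elixir).
TYPE_TO_DTYPE = {
    ("f", 8, "e4m3fn"): "F8_E4M3",
    ("f", 8, "e5m2"):   "F8_E5M2",
    ("f", 8):           "F8_E5M2",   # backward-compatible 2-tuple form
}
DTYPE_TO_TYPE = {
    "F8_E4M3": ("f", 8, "e4m3fn"),
    "F8_E5M2": ("f", 8, "e5m2"),
}

def tensor_byte_size(type_tuple, shape):
    """FP8 is one byte per element, so byte size is just the element count."""
    bits = type_tuple[1]
    n = 1
    for dim in shape:
        n *= dim
    return n * (bits // 8)

print(tensor_byte_size(("f", 8, "e4m3fn"), (1024, 768)))  # 786432
```

Note the asymmetry the mapping implies: two Elixir tuples can serialize to "F8_E5M2", but each header string deserializes to exactly one canonical 3-tuple.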
@josevalim
Contributor

Please remove the convo.txt :)

My suggestion is to break this into two. The first PR adds FP8 support, meaning E5M2. No need for additional tuples and steps.

Then a separate PR adds handling for unknown types. For now, the user should pass a separate function that receives the type and the value and builds the tensors.

Which types does Qwen use?

@nyo16
Author

nyo16 commented Jan 8, 2026

Qwen3 uses F8_E4M3.

@nyo16
Author

nyo16 commented Jan 8, 2026

OK, I will work on breaking this down. For Bumblebee and Nx I will open the PRs as drafts to open up the discussion.

@josevalim
Contributor

josevalim commented Jan 8, 2026

@nyo16 to get everyone on the same page: elixir-nx/nx#1657 (comment)

I think this PR will be straightforward once we add e4m3fn to Nx, no need for custom functions :)

