feat: Add FP8 dtype support (E4M3FN and E5M2) #15
Summary
Add FP8 (8-bit floating point) dtype support for E4M3FN and E5M2 formats. This enables reading and writing FP8 quantized model
weights from HuggingFace models like Qwen3-FP8.
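As a rough usage sketch of what this enables — assuming the library's top-level module exposes a `read!/1` file reader (written here as `Safetensors.read!/1`, an assumption about the API) and using an illustrative checkpoint path:

```elixir
# Usage sketch; Safetensors.read!/1 and the checkpoint path are assumptions.
tensors = Safetensors.read!("model-00001-of-00002.safetensors")

# With FP8 support, E4M3FN/E5M2 entries deserialize like any other dtype
# instead of failing on an unrecognized header dtype string.
tensors |> Map.keys() |> Enum.take(3)
```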
Changes
- Add dtype constants (`F8_E4M3FN`, `F8_E5M2`) and the type mappings `{:f, 8, :e4m3fn}` and `{:f, 8, :e5m2}` in:
  - `dtype_from_string/1` - Parse "F8_E4M3" from safetensors headers
  - `tensor_byte_size/1` - Calculate byte size for FP8 tensors
  - `tensor_to_iodata/1` - Serialize FP8 tensors
  - `build_tensor/2` - Deserialize FP8 tensors
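A minimal sketch of the mapping and sizing logic these functions cover, in a standalone module for illustration — the module name, clause shapes, and the two-argument `fp8_byte_size/2` helper are assumptions, not the PR's actual code (the PR's `tensor_byte_size/1` takes the tensor itself):

```elixir
# Illustrative sketch only; names and signatures below are assumptions.
defmodule FP8Sketch do
  # Map safetensors header dtype strings onto the type tuples this PR introduces.
  def dtype_from_string("F8_E4M3"), do: {:f, 8, :e4m3fn}
  def dtype_from_string("F8_E5M2"), do: {:f, 8, :e5m2}

  # Both FP8 variants store one byte per element, so a tensor's byte size is
  # simply the product of its shape dimensions.
  def fp8_byte_size({:f, 8, _variant}, shape) when is_tuple(shape) do
    shape |> Tuple.to_list() |> Enum.product()
  end
end

FP8Sketch.dtype_from_string("F8_E5M2")
#=> {:f, 8, :e5m2}

FP8Sketch.fp8_byte_size({:f, 8, :e4m3fn}, {4096, 4096})
#=> 16777216
```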
Test plan

Notes
This is the first PR in a series to enable native FP8 model inference: