Feature Description
Support taking models whose weights are already quantized to FP8 with per-tensor granularity as input, and re-quantizing them to other dtypes such as W4A16.
Motivation and Use Case
Input:
Models whose weights have already been quantized to FP8 with per‑tensor granularity, such as: https://huggingface.co/mistralai/Devstral-2-123B-Instruct-2512
Output:
Re-quantized models in other dtypes, e.g. W4A16.
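Roughly, something like the sketch below is the intended flow. The AutoRound call follows the currently documented API; whether it can accept an already-FP8 checkpoint as input is exactly what this request asks for, and the output directory name is only illustrative:

    # Hypothetical end-to-end flow once FP8 per-tensor inputs are supported.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from auto_round import AutoRound

    model_name = "mistralai/Devstral-2-123B-Instruct-2512"  # FP8 per-tensor checkpoint
    model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype="auto")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Re-quantize to W4A16 (4-bit weights, 16-bit activations).
    autoround = AutoRound(model, tokenizer, bits=4, group_size=128, sym=True)
    autoround.quantize_and_save("./Devstral-2-123B-W4A16", format="auto_round")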
The relevant implementation starting point is here:
auto-round/auto_round/utils/model.py
Lines 1070 to 1075 in 6190869
    if layer.__class__.__name__ == "CompressedLinear":
        dq_weight = layer.compressor.decompress_module(layer)
    else:
        weight_scale = layer.weight_scale if hasattr(layer, "weight_scale") else layer.weight_scale_inv
        data_type = getattr(layer, "data_type", None)
        dq_weight = dequant_block_fp8_weight(
            layer.weight, weight_scale, layer.block_size, data_type=data_type
        )
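The path above handles block-wise FP8. For per-tensor FP8 the scale is a single scalar, so dequantization collapses to one multiply. A minimal sketch of what the missing branch could look like, assuming the layer exposes the usual weight / weight_scale attributes; the helper name and the scalar-scale check are illustrative, not existing auto-round code:

    import torch

    def dequant_per_tensor_fp8_weight(weight: torch.Tensor, weight_scale: torch.Tensor) -> torch.Tensor:
        # Per-tensor granularity: weight_scale is a 0-dim (or 1-element) tensor,
        # so the whole FP8 weight is rescaled by one value.
        return weight.to(torch.bfloat16) * weight_scale.to(torch.bfloat16)

    # Possible dispatch next to the existing block-wise path:
    # if getattr(layer, "block_size", None) is None and weight_scale.numel() == 1:
    #     dq_weight = dequant_per_tensor_fp8_weight(layer.weight, weight_scale)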
Alternatives Considered
No response
Definition of Done
- Allow taking mistralai/Devstral-2-123B-Instruct-2512 as input, and W4A16 as output.
- UT (see the test sketch below)
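For the UT, something self-contained along these lines could work; the inline per-tensor quant/dequant is only a stand-in for whichever helper the implementation ends up exposing:

    import torch

    def test_per_tensor_fp8_dequant_roundtrip():
        torch.manual_seed(0)
        ref = torch.randn(128, 256)

        # Per-tensor FP8 quantization: a single scale for the whole tensor.
        # 448 is the largest finite value of float8_e4m3fn; clamp for safety.
        fp8_max = 448.0
        scale = ref.abs().max() / fp8_max
        q = (ref / scale).clamp(-fp8_max, fp8_max).to(torch.float8_e4m3fn)

        # Stand-in for the future per-tensor dequant path in auto-round.
        dq = q.to(torch.float32) * scale

        # e4m3 has 3 mantissa bits -> worst-case relative error ~2^-4 per element.
        max_err = (dq - ref).abs().max()
        assert max_err <= 0.07 * ref.abs().max()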
Additional Context
No response
Metadata
Assignees
Labels
good first issue (Good for newcomers)