how to quant fp16 to fp8? #20198

dongyibo · 2026-03-09T14:37:00Z

I used llm-compressor, but the quantization_config in the generated config.json looks like this:

"quantization_config": {
...
"format": "float-quantized",
"global_compression_ratio": null,
"ignore": [
"lm_head"
],
"quant_method": "compressed-tensors",
"quantization_status": "compressed",
},

I notice that quant_method is compressed-tensors. In many of my tests in sglang, its performance is worse than that of models whose quant_method is fp8.

At the same time, I found that quant_method in DeepSeek-V3's config.json is fp8. So how should we quantize a model to obtain a checkpoint whose quant_method is fp8?
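For comparing checkpoints, a minimal sketch of reading the quant_method field from a config.json may help. It uses only the Python standard library and assumes the layout shown above (a top-level quantization_config object with a quant_method key). The helper names read_quant_method and write_demo_config are hypothetical, and the demo file is a stand-in, not a real model config.

```python
import json
import os
import tempfile


def read_quant_method(config_path):
    """Return quantization_config["quant_method"] from a config.json, or None if absent."""
    with open(config_path, "r", encoding="utf-8") as f:
        config = json.load(f)
    # Unquantized checkpoints may have no quantization_config at all.
    quant_config = config.get("quantization_config") or {}
    return quant_config.get("quant_method")


def write_demo_config(directory, quant_method):
    """Write a minimal, hypothetical config.json used only for this demonstration."""
    path = os.path.join(directory, "config.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"quantization_config": {"quant_method": quant_method}}, f)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the config.json produced by the quantization run above.
        path = write_demo_config(tmp, "compressed-tensors")
        print(read_quant_method(path))
```

This only reports which value a checkpoint declares. It does not convert a checkpoint from one format to the other, and editing the field by hand would not change how the weights are stored.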

Replies: 0 comments
