Commit bde162a
feat(deepseek): add --cast_mxfp4_to_nvfp4 to deepseek_v4 quantize step (#1653)
### What does this PR do?
Type of change: new feature
Brings the GPT-OSS lossless MXFP4 → NVFP4 cast (#1372) to DeepSeek V4's
routed-expert export by adding a `--cast_mxfp4_to_nvfp4` flag to
`examples/deepseek/deepseek_v4/quantize_to_nvfp4.py`.
To avoid duplicating the closed-form math, the shared numerics —
`mxfp4_to_nvfp4_global_amax`, `mxfp4_to_nvfp4_per_block_amax`, and the
E2M1/E4M3/E8M0 constants — are **hoisted out of the GPT-OSS example cast
into the library** at
`modelopt/torch/quantization/utils/numeric_utils.py`. Both the GPT-OSS
cast (`examples/llm_ptq/cast_mxfp4_to_nvfp4.py`) and the new DeepSeek
path now import them from there.
DeepSeek V4's routed experts ship as MXFP4 (E2M1 nibbles + a
power-of-two E8M0 scale per 32-element block). By default the export
dequantizes them to BF16 and re-quantizes to NVFP4 using the calibrated
per-tensor weight amax, which re-derives per-block scales from the data
and is therefore lossy. With the flag, the cast pins `scale_2 = 2^m` with
`m = k_max-8` and sets each per-block E4M3 scale to `2^(k_j-m)`, both read
straight from the source E8M0 scales (`k_j` is the E8M0 exponent of block
`j`, `k_max` the largest such exponent). Then
`per_block_scale * scale_2 = 2^k_j`, and the NVFP4 nibbles equal the source
MXFP4 nibbles bit-for-bit for every block whose `k_j` lands in E4M3's
representable window; rare out-of-range blocks clamp. The one V4-specific
addition is that w1/w3 share a single `scale_2` for the fused GEMM1, so
`k_max` is taken over both projections. The flag only affects routed-expert
**weights**; activation `input_scale` still comes from `--amax_path`
calibration.
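The scale arithmetic above can be sketched in plain Python. This is an illustration only, not the code in `numeric_utils.py`: the helper name `cast_block_scales` is hypothetical, the E4M3 window assumes the `float8_e4m3fn` variant (powers of two from `2^-9` to `2^8`), and the clamp is one plausible reading of "out-of-range blocks clamp".

```python
E8M0_BIAS = 127      # E8M0 stores a biased exponent: scale = 2**(byte - 127)
E4M3_MAX_POW2 = 8    # largest power of two in float8_e4m3fn (max finite 448 = 1.75 * 2**8)
E4M3_MIN_POW2 = -9   # smallest power of two in float8_e4m3fn (minimum subnormal)


def cast_block_scales(e8m0_bytes):
    """Hypothetical helper: derive scale_2 and per-block scales from E8M0 bytes."""
    ks = [b - E8M0_BIAS for b in e8m0_bytes]  # unbiased exponents k_j
    k_max = max(ks)
    m = k_max - E4M3_MAX_POW2                 # scale_2 = 2**m = 2**(k_max - 8)
    scale_2 = 2.0 ** m
    per_block, lossless = [], []
    for k in ks:
        # Assumed handling of out-of-range blocks: clamp into the E4M3 window.
        p = min(max(k - m, E4M3_MIN_POW2), E4M3_MAX_POW2)
        per_block.append(2.0 ** p)
        lossless.append(p == k - m)           # exact only when no clamp occurred
    return scale_2, per_block, lossless


# w1 and w3 share one scale_2, so k_max is taken over both projections.
w1_bytes = [124, 127, 129]   # k_j = -3, 0, 2
w3_bytes = [126, 107]        # k_j = -1, -20 (the last one falls outside the window)
scale_2, per_block, lossless = cast_block_scales(w1_bytes + w3_bytes)

for k, s, ok in zip([-3, 0, 2, -1, -20], per_block, lossless):
    print(k, s * scale_2 == 2.0 ** k, ok)
```

In this sketch every block within 17 binades of `k_max` reproduces `2^k_j` exactly; only the `k_j = -20` block is clamped and flagged as not lossless.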
### Usage
```bash
python deepseek_v4/quantize_to_nvfp4.py \
    --amax_path "${AMAX}" \
    --source_ckpt "${DS_V4}" \
    --output_ckpt "${HF_NVFP4_PATH}" \
    --cast_mxfp4_to_nvfp4
```
### Testing
- The hoisted numerics get unit tests in
`tests/unit/torch/quantization/test_numeric_utils.py` (10 cases:
per-tensor global_amax, per-block amax incl. out-of-range,
magnitude-table cache) — 10/10 pass. The example test
`tests/examples/llm_ptq/test_cast_mxfp4_to_nvfp4.py` keeps the
cast-specific cases (quantizer naming, `build_amax_map`,
`apply_to_model`).
- Validated on real DeepSeek-V4-Flash expert tensors (incl. the on-disk
`float8_e8m0fnu` scale dtype): 23.5M blocks, 100% lossless, 0 error.
- Generated a full NVFP4 checkpoint for DeepSeek-V4-Flash (43 layers,
256 routed experts) end-to-end: `[cast] lossless MXFP4->NVFP4 blocks:
8,657,043,456/8,657,043,456 (100.0000%)`. Output weights match an
independently-produced reference cast byte-for-byte (`weight_scale`,
`weight_scale_2`, packed nibbles modulo the harmless sign-of-zero).
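
A minimal sketch of the kind of per-block check behind the lossless counts, under stated assumptions: the helper names are hypothetical, the E2M1 magnitude table is the standard FP4 one, and `summary_line` only mimics the log format quoted above rather than reproducing the script's code.

```python
# E2M1 magnitudes indexed by the low 3 bits of a nibble; bit 3 is the sign.
E2M1_MAGNITUDES = (0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0)


def decode_e2m1(nibble):
    """Decode one 4-bit E2M1 value to a Python float."""
    sign = -1.0 if nibble & 0x8 else 1.0
    return sign * E2M1_MAGNITUDES[nibble & 0x7]


def block_is_lossless(nibbles, k_j, per_block_scale, scale_2):
    """True if the same nibbles dequantize identically under both scale encodings."""
    src = [decode_e2m1(n) * 2.0 ** k_j for n in nibbles]                 # MXFP4 view
    dst = [decode_e2m1(n) * per_block_scale * scale_2 for n in nibbles]  # NVFP4 view
    return src == dst


def summary_line(lossless, total):
    """Format a count in the style of the log line quoted above (assumed format)."""
    pct = 100 * lossless / total
    return f"[cast] lossless MXFP4->NVFP4 blocks: {lossless:,}/{total:,} ({pct:.4f}%)"


nibbles = [0x0, 0x8, 0x3, 0xF]  # +0, -0, +1.5, -6
ok = block_is_lossless(nibbles, k_j=-3, per_block_scale=2.0 ** 3, scale_2=2.0 ** -6)
bad = block_is_lossless(nibbles, k_j=-20, per_block_scale=2.0 ** -9, scale_2=2.0 ** -6)
print(ok, bad)
print(summary_line(8_657_043_456, 8_657_043_456))
```

The nibbles `0x0` and `0x8` decode to `+0.0` and `-0.0`, which compare equal, which is why a sign-of-zero difference in the packed nibbles does not change any dequantized value.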
### Before your PR is "*Ready for review*"
- Is this change backward compatible?: ✅ (new opt-in flag; default
export behavior unchanged; hoist re-exports through the existing example
module)
- If you copied code from any other sources or added a new PIP
dependency, did you follow guidance in `CONTRIBUTING.md`: ✅ N/A (no new
deps; shared numerics moved into the library rather than duplicated)
- Did you write any new necessary tests?: ✅ (library numerics covered by
`tests/unit/torch/quantization/test_numeric_utils.py`; end-to-end
validated on a real DeepSeek-V4 checkpoint)
- Did you update
[Changelog](https://github.com/NVIDIA/Model-Optimizer/blob/main/CHANGELOG.rst)?:
✅
- Did you get Claude approval on this PR?: ❌ (will run `/claude review`)
### Additional Information
Mirrors and reuses #1372 (GPT-OSS MXFP4 → NVFP4 cast); the closed-form
numerics are now shared via
`modelopt.torch.quantization.utils.numeric_utils`.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
<!-- This is an auto-generated comment: release notes by coderabbit.ai
-->
## Summary by CodeRabbit
* **New Features**
  * Added `--cast_mxfp4_to_nvfp4` flag to perform a closed-form, mostly
    lossless MXFP4→NVFP4 conversion for routed-expert weights with
    aggregated lossless/block statistics.
* **Documentation**
  * Updated DeepSeek V4 export instructions and README to document the new
    flag and clarify calibration behavior for activation scales.
* **Chores**
  * Exposed shared numeric quantization utilities for MXFP4→NVFP4 casting.
* **Tests**
  * Added and updated tests to validate the new numeric helpers and
    conversion behavior.
<!-- end of auto-generated comment: release notes by coderabbit.ai -->
Signed-off-by: Chenjie Luo <chenjiel@nvidia.com>
Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>

1 parent 111b7eb commit bde162a
7 files changed
Lines changed: 532 additions & 294 deletions
File tree
- examples
  - deepseek
    - deepseek_v4
  - llm_ptq
- modelopt/torch/quantization/utils
- tests
  - examples/llm_ptq
  - unit/torch/quantization