Commit b577be3
feat(model): thread weight_dtype through HF export for plain-dtype output

Export has two consumers: online weight sync for RL rollout
(export_hf_weights) and on-disk checkpoints (save_hf_pretrained). Each
gains an optional weight_dtype argument that flows through
WeightConversionTask into the export stream.
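A minimal sketch of the threading described above, not the code from this commit. `Tensor`, `ConversionTask`, `build_tasks`, and `stream_export` are hypothetical, simplified stand-ins (the real task type is WeightConversionTask and the real tensors are framework tensors). The rule used to decide when a caller-supplied task gets stamped is also an assumption.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class Tensor:
    """Toy tensor: a dtype label plus values (stand-in for a real tensor)."""
    dtype: str
    values: Tuple[float, ...]

    def to(self, dtype: str) -> "Tensor":
        # Only the dtype label changes in this toy model.
        return Tensor(dtype=dtype, values=self.values)


@dataclass(frozen=True)
class ConversionTask:
    """Hypothetical, reduced stand-in for WeightConversionTask."""
    name: str
    tensor: Tensor
    weight_dtype: Optional[str] = None


def build_tasks(weights: Dict[str, Tensor],
                weight_dtype: Optional[str] = None) -> list:
    # Stamp weight_dtype at construction time, so no later rewrite is needed.
    return [ConversionTask(n, t, weight_dtype) for n, t in weights.items()]


def stream_export(tasks: Iterable[ConversionTask],
                  weight_dtype: Optional[str] = None
                  ) -> Iterator[Tuple[str, Tensor]]:
    for task in tasks:
        # Assumed rule: tasks built elsewhere (caller-supplied) may lack the
        # dtype, so they are the only ones patched with dataclasses.replace.
        if weight_dtype is not None and task.weight_dtype is None:
            task = replace(task, weight_dtype=weight_dtype)
        tensor = task.tensor
        if task.weight_dtype is not None:
            tensor = tensor.to(task.weight_dtype)  # the plain-dtype cast
        yield task.name, tensor


weights = {"layer.weight": Tensor("fp32", (1.0, 2.0))}
for name, tensor in stream_export(build_tasks(weights, weight_dtype="bf16")):
    print(name, tensor.dtype)
```

Leaving `weight_dtype` as `None` keeps the source dtype, which matches the "optional" wording above.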
Per review (HollowMan6): the plain-dtype cast is now generic, not
DSv4-only. build_conversion_tasks stamps weight_dtype onto each task
(no post-hoc dataclasses.replace except for caller-supplied tasks), and
the cast lives in the shared stream path covering both the standard and
grouped-export branches. The DSv4 hook skips requantization when
weight_dtype is set and returns the converted weights unchanged, leaving
the dtype cast to the generic path, so plain-dtype export stays identical
across bridges. Adds --export-weight-dtype to the multi-gpu convert
example.
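The split between the model-specific hook and the generic cast can be sketched as follows. This is an illustration under stated assumptions, not the bridge's code: `export_hook`, `cast_generic`, the toy `quantize`, and the `.scale` key suffix are all hypothetical, and tensors are modelled as `(dtype, values)` tuples.

```python
from typing import Dict, Optional, Tuple

ToyTensor = Tuple[str, Tuple[float, ...]]  # (dtype label, values)


def quantize(name: str, values: Tuple[float, ...]) -> Dict[str, ToyTensor]:
    """Toy requantizer: emits a scaled weight plus a companion scale entry."""
    scale = max(abs(v) for v in values) or 1.0
    return {
        name: ("fp8", tuple(v / scale for v in values)),
        name + ".scale": ("fp32", (scale,)),  # hypothetical key naming
    }


def export_hook(name: str, tensor: ToyTensor,
                weight_dtype: Optional[str]) -> Dict[str, ToyTensor]:
    """Model-specific hook: requantize only when no plain dtype is requested."""
    if weight_dtype is not None:
        # Skip requantization and return the converted weight unchanged.
        return {name: tensor}
    return quantize(name, tensor[1])


def cast_generic(tensors: Dict[str, ToyTensor],
                 weight_dtype: Optional[str]) -> Dict[str, ToyTensor]:
    """Shared path: one cast for every bridge, regardless of the hook."""
    if weight_dtype is None:
        return tensors
    return {k: (weight_dtype, v) for k, (_, v) in tensors.items()}


w = ("fp32", (2.0, -4.0))
plain = cast_generic(export_hook("w", w, "bf16"), "bf16")
quant = cast_generic(export_hook("w", w, None), None)
print(sorted(plain), sorted(quant))
```

Because the hook never casts, any bridge without a hook goes through the same `cast_generic` step and produces the same plain-dtype output.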
Validated end-to-end on 32x GB300: bf16 export = 35020 tensors / 0
scales; quantized export = 69187 tensors / 34167 scales.
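The reported counts can be cross-checked with simple arithmetic. The sketch below assumes each scale is emitted as its own tensor and is included in the quantized total. The commit message does not state this, but the numbers are consistent with it.

```python
# Counts as reported in the commit message.
bf16_tensors, bf16_scales = 35020, 0
quant_tensors, quant_scales = 69187, 34167

# Assumption: the quantized total counts weight tensors and scale tensors.
quant_weights = quant_tensors - quant_scales
print(quant_weights)                  # weight tensors in the quantized export
print(quant_weights == bf16_tensors)  # same weight set in both exports?

# Weights that carry no scale, i.e. presumably left unquantized.
unscaled = quant_weights - quant_scales
print(unscaled)
```

Under that assumption, both exports contain the same 35020 weight tensors, and 853 of them have no accompanying scale in the quantized export.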
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
1 parent fc05b1d commit b577be3
4 files changed
Lines changed: 53 additions & 19 deletions
File tree
- examples/conversion
- src/megatron/bridge/models
  - conversion
  - deepseek
- tests/unit_tests/models/deepseek
Lines changed: 19 additions & 7 deletions