feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output #4301
Meirtz wants to merge 5 commits into
Reworked per reviewer feedback (offline discussion): the hook serves two export consumers (online weight streaming to rollout engines and on-disk checkpoints), so the bridge-level boolean is gone. Now …
Full-model E2E validation (DeepSeek-V4-Flash, 43 layers, real weights, TP1/PP4/EP8 on 8×GB300; same imported Megatron checkpoint for both runs):
35,020 + 34,167 = 69,187: the bf16 artifact contains exactly every weight with no scale companions (I32 = …
Two notes from the run: (1) the smoke caught a real bug in the first version of this PR: …
DSv4 HF export unconditionally re-creates the source repo's quantized weight/scale layout (FP8 attention / MXFP4 experts), so bf16-SFT'd weights get post-hoc quantized and the artifact carries *.scale tensors the training never saw.

Add DeepSeekV4Bridge.export_quantized (default True, behavior unchanged) and a --no-quantized-export flag on the export CLI so SFT products can be exported as plain bf16 with exact train/inference parity.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
…epSeek-V4 output

Rework after review: the requantize hook runs on BOTH export consumers, online weight streaming to rollout engines (export_hf_weights) and on-disk checkpoints (save_hf_pretrained), so a bridge-level boolean cannot configure them independently.

Add weight_dtype: Optional[torch.dtype] to export_hf_weights / save_hf_pretrained / stream_weights_megatron_to_hf, carried per-task via a new optional WeightConversionTask.weight_dtype field (hook signatures unchanged; other bridges unaffected). The DeepSeek-V4 bridge emits plain weights in that dtype (no *.scale) when set, and keeps re-creating the source repo's quantized layout by default.

CLI: --export-weight-dtype on the export subcommand.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
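A minimal sketch of the per-task threading described in this commit, under stated assumptions: `ConversionTask`, `export_hook`, and `stream_weights` are hypothetical stand-ins (not the real `WeightConversionTask`, bridge hook, or `stream_weights_megatron_to_hf`), and strings stand in for `torch.dtype` values and tensors so the example runs with the standard library only.

```python
from dataclasses import dataclass, replace
from typing import Dict, Iterable, Iterator, Optional, Tuple


@dataclass(frozen=True)
class ConversionTask:
    """Hypothetical stand-in for WeightConversionTask."""
    param_name: str
    weight_dtype: Optional[str] = None  # the real field holds a torch.dtype


def export_hook(task: ConversionTask, converted: Dict[str, str]) -> Dict[str, str]:
    """Hypothetical bridge hook: the (task, converted) signature never changes."""
    if task.weight_dtype is not None:
        # Plain export: one entry per weight, no scale companions.
        return {name: f"{value}->{task.weight_dtype}" for name, value in converted.items()}
    # Default: re-create a quantized weight/scale pair for every weight.
    out: Dict[str, str] = {}
    for name, value in converted.items():
        out[name] = f"{value}->quantized"
        out[name.rsplit(".", 1)[0] + ".scale"] = "scale"
    return out


def stream_weights(
    tasks: Iterable[ConversionTask], weight_dtype: Optional[str] = None
) -> Iterator[Tuple[str, str]]:
    """Hypothetical public API: the caller's dtype rides on each task."""
    for task in tasks:
        task = replace(task, weight_dtype=weight_dtype)  # frozen, so copy
        yield from export_hook(task, {task.param_name: "w"}).items()


tasks = [ConversionTask("layers.0.attn.wq.weight")]
plain = dict(stream_weights(tasks, weight_dtype="bfloat16"))
quantized = dict(stream_weights(tasks))
```

Because the dtype is an argument of each call rather than bridge state, two consumers can make different choices in the same process, which is the point the commit message makes about a bridge-level boolean.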
…s.replace

Caught by a full-model export smoke (the unit tests used MagicMock tasks, which do not enforce frozen). Tests now use real WeightConversionTask instances.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
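The general failure mode behind this commit can be sketched as follows. The commit title is truncated, so the exact bug is not reproduced here; `FrozenTask` and both helper functions are hypothetical, and `dataclasses.replace` is shown only as one standard way to derive a modified copy of a frozen instance.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional
from unittest.mock import MagicMock


@dataclass(frozen=True)
class FrozenTask:
    """Hypothetical frozen task, standing in for WeightConversionTask."""
    param_name: str
    weight_dtype: Optional[str] = None


def set_dtype_in_place(task, dtype):
    # Attribute assignment: accepted by a MagicMock, rejected by a frozen dataclass.
    task.weight_dtype = dtype
    return task


def set_dtype_by_copy(task, dtype):
    # Builds a new instance with one field changed; the original is untouched.
    return replace(task, weight_dtype=dtype)


def raises_frozen(fn, *args) -> bool:
    """Return True if calling fn(*args) raises FrozenInstanceError."""
    try:
        fn(*args)
    except FrozenInstanceError:
        return True
    return False


mock_hides_bug = not raises_frozen(set_dtype_in_place, MagicMock(), "bfloat16")
real_task_fails = raises_frozen(set_dtype_in_place, FrozenTask("w"), "bfloat16")
original = FrozenTask("w")
copied = set_dtype_by_copy(original, "bfloat16")
```

A test that passes a `MagicMock` exercises only the first case, so it stays green while the same code fails on a real frozen instance.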
/ok to test 1b93e3c
The SimpleNamespace stand-in lacks the new weight_dtype field, so the export hook's task.weight_dtype access raises AttributeError in the pre-existing quantized-export tests. Constructing the real (frozen) dataclass keeps the helper in sync with future field additions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
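A small illustration of the AttributeError described here, assuming a hypothetical `Task` dataclass and a hypothetical `wants_plain_export` check in place of the real hook. A `SimpleNamespace` only has the attributes it was built with, while a dataclass instance always carries every declared field, including defaults added later.

```python
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass(frozen=True)
class Task:
    """Hypothetical stand-in for the real task dataclass."""
    param_name: str
    weight_dtype: Optional[str] = None  # newly added optional field


def wants_plain_export(task) -> bool:
    # Hypothetical hook logic: reads the new field unconditionally.
    return task.weight_dtype is not None


def probe(task):
    """Return the hook's decision, or the name of the error it raised."""
    try:
        return wants_plain_export(task)
    except AttributeError:
        return "AttributeError"


stale_stand_in = SimpleNamespace(param_name="w")  # written before the field existed
stand_in_result = probe(stale_stand_in)
dataclass_result = probe(Task("w"))
```

The stand-in has to be updated by hand each time a field is added; the real dataclass picks up the default automatically.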
/ok to test 979e77d
export_hf_weights/save_hf_pretrained/save_hf_weights now forward the new weight_dtype kwarg, so the exact-call mock assertions need it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
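Why the exact-call assertions break can be sketched with `unittest.mock` alone. The `save_pretrained` wrapper and its signature are hypothetical; only the behavior of `assert_called_once_with`, which compares positional and keyword arguments exactly, is relied on.

```python
from unittest.mock import MagicMock


def save_pretrained(bridge, model, path, weight_dtype=None):
    # Hypothetical wrapper: always forwards the new kwarg, even when it is None.
    bridge.save_hf_weights(model, path, weight_dtype=weight_dtype)


bridge = MagicMock()
save_pretrained(bridge, "model", "/tmp/out")


def assertion_passes(*args, **kwargs) -> bool:
    """Return True if the mock was called exactly once with these arguments."""
    try:
        bridge.save_hf_weights.assert_called_once_with(*args, **kwargs)
    except AssertionError:
        return False
    return True


old_style = assertion_passes("model", "/tmp/out")
new_style = assertion_passes("model", "/tmp/out", weight_dtype=None)
```

An explicitly forwarded `weight_dtype=None` is part of the recorded call, so an assertion written before the kwarg existed no longer matches even though behavior is unchanged.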
/ok to test 5aeb8d1
What
Thread weight_dtype: Optional[torch.dtype] = None through the HF export path (export_hf_weights / save_hf_pretrained / stream_weights_megatron_to_hf), carried per-task via a new optional WeightConversionTask.weight_dtype field. When set, the DeepSeek-V4 bridge emits plain weights in that dtype (no *.scale companions) instead of re-creating the source repo's quantized layout. Default (None) keeps today's behavior. CLI: --export-weight-dtype on the export subcommand.
Why
DSv4 HF export unconditionally re-creates the source repo's quantized weight/scale layout (maybe_modify_converted_hf_weight → requantize_hf_weight_scale_pairs, from #3969). That's right for checkpoint conversion, but bf16-SFT'd weights get silently post-hoc quantized: a user found *.scale tensors in their SFT export and asked about train/inference parity.
Design (revised after reviewer feedback): the requantize hook runs on both export consumers, online weight streaming to rollout engines (export_hf_weights, e.g. verl RL weight sync) and on-disk checkpoints (save_hf_pretrained), so a bridge-level boolean cannot configure them independently. A dtype-typed parameter on each public API lets callers choose per path (e.g. bf16 to rollout for RL parity, quantized to disk for serving-format artifacts, or vice versa). Hook signatures are unchanged (the dtype rides on the task), so the other bridges overriding this hook (dsv3, gemma4, kimi, mimo, flux) are unaffected; DSv3 can adopt the same field later.
Verified
… .scale keys is safe); exported config.json is built fresh (torch_dtype: bfloat16, no quantization fields); safetensors index regenerated from written tensors;
Notes
🤖 Generated with Claude Code