Skip to content

feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output#4301

Open
Meirtz wants to merge 5 commits into
NVIDIA-NeMo:mainfrom
Meirtz:fix/dsv4-bf16-export
Open

feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output#4301
Meirtz wants to merge 5 commits into
NVIDIA-NeMo:mainfrom
Meirtz:fix/dsv4-bf16-export

Conversation

@Meirtz

@Meirtz Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member

What

Thread weight_dtype: Optional[torch.dtype] = None through the HF export path — export_hf_weights / save_hf_pretrained / stream_weights_megatron_to_hf — carried per-task via a new optional WeightConversionTask.weight_dtype field. When set, the DeepSeek-V4 bridge emits plain weights in that dtype (no *.scale companions) instead of re-creating the source repo's quantized layout. Default (None) keeps today's behavior. CLI: --export-weight-dtype on the export subcommand.

Why

DSv4 HF export unconditionally re-creates the source repo's quantized weight/scale layout (maybe_modify_converted_hf_weightrequantize_hf_weight_scale_pairs, from #3969). That's right for checkpoint conversion, but bf16-SFT'd weights get silently post-hoc quantized — a user found *.scale tensors in their SFT export and asked about train/inference parity.

Design (revised after reviewer feedback): the requantize hook runs on both export consumers — online weight streaming to rollout engines (export_hf_weights, e.g. verl RL weight sync) and on-disk checkpoints (save_hf_pretrained) — so a bridge-level boolean cannot configure them independently. A dtype-typed parameter on each public API lets callers choose per path (e.g. bf16 to rollout for RL parity, quantized to disk for serving-format artifacts, or vice versa). Hook signatures are unchanged (the dtype rides on the task), so the other bridges overriding this hook (dsv3, gemma4, kimi, mimo, flux) are unaffected; DSv3 can adopt the same field later.

Verified

  • the saver streams only yielded tensors (omitting .scale keys is safe); exported config.json is built fresh (torch_dtype: bfloat16, no quantization fields); safetensors index regenerated from written tensors;
  • unit tests cover dtype-set (plain weights, non-float tensors untouched) and default (requantize) paths.

Notes

  • AI-assisted (Claude); analysis, validation and review by the human author.

🤖 Generated with Claude Code

@Meirtz Meirtz added feature New capabilities, enhancements, or enablement work area:model Model implementations and HF bridge logic labels Jun 11, 2026
@yaoyu-33 yaoyu-33 added the needs-review PR is ready for code review and waiting on a reviewer label Jun 11, 2026
@Meirtz Meirtz changed the title feat(model): optional bf16 (non-quantized) HF export for DeepSeek-V4 feat(model): thread weight_dtype through HF export for plain-dtype DeepSeek-V4 output Jun 11, 2026
@Meirtz

Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Reworked per reviewer feedback (offline discussion): the hook serves two export consumers — online weight streaming to rollout engines and on-disk checkpoints — so the bridge-level boolean is gone. Now weight_dtype: Optional[torch.dtype] on export_hf_weights / save_hf_pretrained, carried per-task (WeightConversionTask.weight_dtype), hook signatures unchanged. d10a8e7e.

@Meirtz

Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

Full-model E2E validation (DeepSeek-V4-Flash, 43 layers, real weights, TP1/PP4/EP8 on 8×GB300; same imported Megatron checkpoint for both runs):

export tensors .scale tensors dtypes
default (quantized) 69,187 34,167 F8_E4M3, F8_E8M0, I8 (MXFP4), BF16, F32, I32
--export-weight-dtype bfloat16 35,020 0 BF16, I32

35,020 + 34,167 = 69,187 — the bf16 artifact contains exactly every weight with no scale companions (I32 = tid2eid routing buffers, correctly left untouched).

Two notes from the run: (1) the smoke caught a real bug in the first version of this PR — WeightConversionTask is a frozen dataclass, so the dtype is now applied via dataclasses.replace (ce66e82), and the unit tests now use real task instances instead of mocks; (2) with weight_dtype set, the saver's completeness check counts the source's .scale entries as "not written" (export proceeds with --not-strict) — cosmetic, can be polished in a follow-up by excluding scale keys from the expected set.

Meirtz and others added 3 commits June 12, 2026 02:22
DSv4 HF export unconditionally re-creates the source repo's quantized weight/scale layout (FP8 attention / MXFP4 experts), so bf16-SFT'd weights get post-hoc quantized and the artifact carries *.scale tensors the training never saw. Add DeepSeekV4Bridge.export_quantized (default True, behavior unchanged) and a --no-quantized-export flag on the export CLI so SFT products can be exported as plain bf16 with exact train/inference parity.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
…epSeek-V4 output

Rework after review: the requantize hook runs on BOTH export consumers — online weight streaming to rollout engines (export_hf_weights) and on-disk checkpoints (save_hf_pretrained) — so a bridge-level boolean cannot configure them independently. Add weight_dtype: Optional[torch.dtype] to export_hf_weights / save_hf_pretrained / stream_weights_megatron_to_hf, carried per-task via a new optional WeightConversionTask.weight_dtype field (hook signatures unchanged; other bridges unaffected). The DeepSeek-V4 bridge emits plain weights in that dtype (no *.scale) when set, and keeps re-creating the source repo's quantized layout by default. CLI: --export-weight-dtype on the export subcommand.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
…s.replace

Caught by a full-model export smoke (the unit tests used MagicMock tasks, which do not enforce frozen). Tests now use real WeightConversionTask instances.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
@Meirtz

Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

/ok to test 1b93e3c

The SimpleNamespace stand-in lacks the new weight_dtype field, so the
export hook's task.weight_dtype access raises AttributeError in the
pre-existing quantized-export tests. Constructing the real (frozen)
dataclass keeps the helper in sync with future field additions.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
@Meirtz

Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

/ok to test 979e77d

export_hf_weights/save_hf_pretrained/save_hf_weights now forward the new
weight_dtype kwarg, so the exact-call mock assertions need it.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Lingrui Mei <lmei@nvidia.com>
@Meirtz

Meirtz commented Jun 11, 2026

Copy link
Copy Markdown
Member Author

/ok to test 5aeb8d1

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:model Model implementations and HF bridge logic feature New capabilities, enhancements, or enablement work needs-review PR is ready for code review and waiting on a reviewer

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants