Commit d05312b

Fridah-nv and claude committed
docs(changelog): scope layerwise entries across #1571/#1592
#1571 ships the two foundation knobs (get_qdq_activations_from_prev_layer, save_every); this PR adds the two optimizations on top: calib_mutates_weights (skip weight checkpoint + writeback for amax-only algorithms) and meta-device skip-layer placeholders.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>
1 parent 39312dd commit d05312b

1 file changed

Lines changed: 2 additions & 1 deletion

CHANGELOG.rst

@@ -40,7 +40,8 @@ Changelog
 - Add mixed-precision FP8 + NVFP4 export for Megatron-Core: per-layer ``quant_algo`` recorded under ``quantized_layers`` in ``hf_quant_config.json``, PP-aware ``kv_cache_dtype`` gather, fused-QKV exclude split into per-HF-name ``q/k/v_proj`` entries.
 - Add Nemotron-3-Super-120B-A12B PTQ recipes ``modelopt_recipes/models/Nemotron-3-Super-120B-A12B/super-nvfp4.yaml`` (MSE-mixed) and ``super-nvfp4-max-calib.yaml`` (max-calib mixed): NVFP4 W4A4 routed experts + FP8 per-tensor shared experts / Mamba in/out_proj + FP8 KV cache.
 - Add quantized ``nn.Embedding`` support. ``nn.Embedding`` is now registered in ``QuantModuleRegistry`` and exposes ``weight_quantizer`` (embedding table), ``output_quantizer`` (lookup activations), and a permanently disabled ``input_quantizer`` placeholder — embedding inputs are integer indices and cannot be fake-quantized, so direct ``enable*()`` calls raise. ``export_hf_checkpoint`` packs quantized embedding weights alongside Linear layers. Embedding quantizers are opt-in (``parent_class: nn.Embedding`` disabled by default).
-- Group layerwise calibration options under a nested ``LayerwiseConfig`` and add three knobs: ``get_qdq_activations_from_prev_layer`` (correct GPTQ-Hessian vs max-calib activation semantics — defaults to True for GPTQ, False for max/mse/local_hessian), ``save_every`` (gate per-window ``next_inputs.pt`` activation-cache writes), and ``calib_mutates_weights`` (set False to skip layer-weight checkpointing/writeback when calibration only updates quantizer state). Legacy bool ``layerwise`` and flat ``layerwise_checkpoint_dir`` keys still work; the bool form emits a ``DeprecationWarning``.
+- Group layerwise calibration options under a nested ``LayerwiseConfig`` and add two knobs: ``get_qdq_activations_from_prev_layer`` (correct GPTQ-Hessian vs max-calib activation semantics — defaults to True for GPTQ, False for max/mse/local_hessian) and ``save_every`` (gate per-window ``next_inputs.pt`` activation-cache writes). Legacy bool ``layerwise`` and flat ``layerwise_checkpoint_dir`` keys still work; the bool form emits a ``DeprecationWarning``.
+- Add two layerwise-calibration memory optimizations: ``calib_mutates_weights`` (set False for amax-only algorithms — max/mse/local_hessian — to skip the per-layer weight checkpoint blob and in-memory writeback, persisting only quantizer state), and meta-device skip-layer placeholders (already-calibrated layers emit zero-filled ``meta`` tensors instead of real-device buffers, eliminating their activation memory — models with real-device inter-layer ops on the hidden state are unsupported).
 
 **Bug Fixes**

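The layerwise entries in this hunk describe two behaviours that a small sketch may make easier to picture: the legacy bool ``layerwise`` form warning on use, and the algorithm-dependent default for ``get_qdq_activations_from_prev_layer``. The snippet below is only an illustration, not ModelOpt's implementation. The function name ``normalize_layerwise``, the nested key names ``enabled`` and ``checkpoint_dir``, the algorithm string ``"gptq"``, and the plain-dict representation are all assumptions made for the example; only ``layerwise``, ``layerwise_checkpoint_dir``, ``get_qdq_activations_from_prev_layer``, and the stated defaults come from the changelog text.

```python
import warnings


def normalize_layerwise(algorithm, layerwise=None, layerwise_checkpoint_dir=None):
    """Hypothetical helper: fold legacy keys into a nested layerwise dict.

    Not ModelOpt's code; only the option names come from the changelog.
    """
    if isinstance(layerwise, bool):
        # The changelog says the legacy bool form still works but warns.
        warnings.warn(
            "bool `layerwise` is deprecated; pass a nested config instead",
            DeprecationWarning,
            stacklevel=2,
        )
        cfg = {"enabled": layerwise}  # "enabled" is an assumed key name
    else:
        cfg = dict(layerwise or {})  # copy so the caller's dict is not mutated

    if layerwise_checkpoint_dir is not None:
        # Legacy flat key; "checkpoint_dir" is an assumed nested name.
        cfg.setdefault("checkpoint_dir", layerwise_checkpoint_dir)

    # Changelog default: True for GPTQ, False for max/mse/local_hessian.
    # Treating every other algorithm as False is an assumption of this sketch.
    cfg.setdefault("get_qdq_activations_from_prev_layer", algorithm == "gptq")
    return cfg
```

An explicit value passed in the nested dict wins over the algorithm-derived default, since ``setdefault`` only fills keys that are missing.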