feat: Layerwise calibration: nested config + QDQ-from-prev-layer flag + checkpoint I/O knobs by Fridah-nv · Pull Request #1571 · NVIDIA/Model-Optimizer

Fridah-nv · 2026-05-29T23:02:28Z

What does this PR do?

Type of change: new feature

Groups all layerwise-calibration options under a nested LayerwiseConfig and adds three new behavior knobs to it. All changes are backward compatible.

1. Nested `layerwise` config

QuantizeAlgorithmConfig.layerwise changes from bool to a Pydantic submodel:

class LayerwiseConfig(ModeloptBaseConfig):
    enable: bool = False
    get_qdq_activations_from_prev_layer: bool = False
    checkpoint_dir: str | None = None
    save_every: int = 1
    save_quantizers_only: bool = False

Backward compatibility:

layerwise: True/False still accepted (emits DeprecationWarning).
Flat layerwise_checkpoint_dir silently migrated into layerwise.checkpoint_dir.
Legacy use_sequential alias preserved (and resolved during flat-key migration so it can't be dropped).
Conflicting flat+nested checkpoint_dir values raise.
All 7 shipped PTQ recipes (modelopt_recipes/general/ptq/*.yaml, huggingface/qwen3_5*/ptq/*.yaml) migrated to the
canonical nested shape — no semantic change.
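
The backward-compatibility rules above can be sketched in plain Python. This is a minimal illustration, not the PR's validator: the function name `normalize_layerwise` and the error wording are assumptions, the `use_sequential` alias is not modeled, and the flat-key migration is kept silent as this description states.

```python
import warnings


def normalize_layerwise(raw: dict) -> dict:
    """Return a copy of ``raw`` with ``layerwise`` in the nested shape (illustrative only)."""
    cfg = dict(raw)
    layerwise = cfg.get("layerwise", {})

    # Legacy bool form: still accepted, but flagged as deprecated.
    if isinstance(layerwise, bool):
        warnings.warn(
            "layerwise: bool is deprecated; use layerwise: {enable: ...}",
            DeprecationWarning,
            stacklevel=2,
        )
        layerwise = {"enable": layerwise}
    else:
        layerwise = dict(layerwise)

    # Legacy flat key: fold into the nested dict, rejecting conflicting values.
    flat_dir = cfg.pop("layerwise_checkpoint_dir", None)
    if flat_dir is not None:
        nested_dir = layerwise.get("checkpoint_dir")
        if nested_dir is not None and nested_dir != flat_dir:
            raise ValueError(
                f"Conflicting checkpoint_dir values: flat={flat_dir!r}, nested={nested_dir!r}"
            )
        layerwise["checkpoint_dir"] = flat_dir

    cfg["layerwise"] = layerwise
    return cfg
```

Passing the same path in both the flat and nested positions is treated as consistent here; only differing values raise.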

2. `get_qdq_activations_from_prev_layer` — correct GPTQ vs max-calib semantics

Controls what layer N's calibration sees:

True (GPTQ default): activations carry the quantize-dequantize error of layers 0..N-1 — GPTQ's Hessian-compensation
goal.
False (max/mse/local_hessian default): full-precision activations, matching the non-layerwise pass exactly.

The False branch wraps the next-layer input capture forward with the existing set_quantizer_by_cfg_context deny-all idiom
({"quantizer_name": "*", "enable": False}).
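
As a rough picture of what a deny-all scope does, the sketch below temporarily disables every quantizer and restores the previous flags on exit. `ToyQuantizer` and `all_quantizers_disabled` are hypothetical stand-ins written for this note; they are not the `set_quantizer_by_cfg_context` API.

```python
from contextlib import contextmanager


class ToyQuantizer:
    """Hypothetical stand-in for a quantizer module: only tracks an enabled flag."""

    def __init__(self, name: str, enabled: bool = True):
        self.name = name
        self.enabled = enabled


@contextmanager
def all_quantizers_disabled(quantizers):
    """Temporarily disable every quantizer, restoring prior flags on exit."""
    saved = [q.enabled for q in quantizers]
    for q in quantizers:
        q.enabled = False
    try:
        yield
    finally:
        # Restore even if the forward pass inside the scope raises.
        for q, was_enabled in zip(quantizers, saved):
            q.enabled = was_enabled
```

A capture forward run inside the `with` block would see full-precision activations, and quantizers that were already disabled stay disabled afterwards.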

GPTQ's per-algorithm default is enforced via @model_validator(mode="after") that reads LayerwiseConfig.model_fields_set —
works for every input shape (empty constructor, bool, partial dict, full dict) and lets explicit user values override.
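
A minimal Pydantic v2 sketch of this pattern follows. The class names (`LayerwiseSketch`, `AlgoSketch`, `GPTQSketch`) are illustrative, and the real configs carry more fields and the legacy-input coercion, which is omitted here.

```python
from pydantic import BaseModel, Field, model_validator


class LayerwiseSketch(BaseModel):
    enable: bool = False
    get_qdq_activations_from_prev_layer: bool = False


class AlgoSketch(BaseModel):
    layerwise: LayerwiseSketch = Field(default_factory=LayerwiseSketch)


class GPTQSketch(AlgoSketch):
    @model_validator(mode="after")
    def _default_qdq_to_true(self) -> "GPTQSketch":
        field = "get_qdq_activations_from_prev_layer"
        # model_fields_set holds only the fields the caller passed explicitly.
        if field not in self.layerwise.model_fields_set:
            # Replace the submodel with an updated copy instead of mutating it.
            self.layerwise = self.layerwise.model_copy(update={field: True})
        return self
```

With this sketch, `GPTQSketch()` and `GPTQSketch(layerwise={"enable": True})` both end up with the flag set to True, while an explicit `False` from the user is left alone.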

3. `save_every` — gate the large activation-cache writes

save_every: int = 1 (ge=1). With N > 1, the per-layer next_inputs.pt (cached activation tensors, the largest checkpoint
artifact for most models) is only written for the boundary layer of each N-layer window. Per-layer
weight/quantizer/output_meta files are still written every layer (resume needs them to replay skip layers correctly).
Interrupting mid-window re-calibrates that window on resume.
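
The window gating can be illustrated with a small predicate. It assumes 0-indexed layers and that the boundary is the last layer of each N-layer window, plus the final layer; the exact boundary convention in the implementation may differ, and `writes_next_inputs` is a name invented for this sketch.

```python
def writes_next_inputs(layer_idx: int, num_layers: int, save_every: int) -> bool:
    """Return True if this layer would write next_inputs.pt under the assumed convention."""
    if save_every < 1:
        raise ValueError("save_every must be >= 1")
    is_window_end = (layer_idx + 1) % save_every == 0
    is_final_layer = layer_idx == num_layers - 1
    return is_window_end or is_final_layer


# With 10 layers and save_every=4, only layers 3, 7 and 9 write the activation cache.
boundaries = [i for i in range(10) if writes_next_inputs(i, 10, 4)]
print(boundaries)  # [3, 7, 9]
```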

UPDATE: save_every has been dropped from this PR and moved to #1640.

4. `save_quantizers_only` — algorithm-aware weight-blob skipping

save_quantizers_only: bool = False. When True, skip weights.pt entirely and persist just the per-quantizer state_dict
slice (carries _amax) to a new quantizer_buffers.pt. On resume, full_restore reloads only the quantizer slice and
trusts that algorithm semantics didn't mutate layer.weight.

Safety is enforced by a whitelist: _supports_save_quantizers_only: ClassVar[bool] = False on QuantizeAlgorithmConfig,
overridden to True only on MaxCalibConfig, MseCalibConfig, LocalHessianCalibConfig (audited — these only touch
_amax). Weight-mutating algorithms (GPTQ folds Hessian updates, AWQ/SmoothQuant fold pre-quant scales) reject the flag at
config-construction time so in-place weight updates can't be silently lost on resume.
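
The whitelist idea can be sketched with plain dataclasses. The class names below are illustrative, and the real check lives in the Pydantic configs rather than in `__post_init__`.

```python
from dataclasses import dataclass
from typing import ClassVar


@dataclass
class AlgoConfigSketch:
    # Safe by default: a subclass must opt in explicitly.
    supports_save_quantizers_only: ClassVar[bool] = False
    save_quantizers_only: bool = False

    def __post_init__(self):
        if self.save_quantizers_only and not self.supports_save_quantizers_only:
            raise ValueError(
                f"{type(self).__name__} may mutate weights; save_quantizers_only is not allowed"
            )


@dataclass
class MaxLikeSketch(AlgoConfigSketch):
    # Opt in: this algorithm is assumed to update only quantizer state.
    supports_save_quantizers_only: ClassVar[bool] = True


@dataclass
class GPTQLikeSketch(AlgoConfigSketch):
    pass  # inherits False, so the flag is rejected at construction time
```

Because the default is False, a newly added algorithm that forgets to declare anything is rejected rather than silently losing weight updates on resume.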

Usage

import modelopt.torch.quantization as mtq

# GPTQ — `get_qdq_activations_from_prev_layer` defaults to True (Hessian semantics).
# save_every reduces activation-cache I/O.
mtq.quantize(
    model,
    {
        "quant_cfg": [...],
        "algorithm": {
            "method": "gptq",
            "layerwise": {
                "enable": True,
                "checkpoint_dir": "/path/to/ckpts",
                "save_every": 4,
            },
        },
    },
    forward_loop=forward_loop,
)

# Max-calibration — `get_qdq_activations_from_prev_layer` defaults to False (FP from prior layers).
# save_quantizers_only skips the weights blob since max only updates _amax.
mtq.quantize(
    model,
    {
        "quant_cfg": [...],
        "algorithm": {
            "method": "max",
            "layerwise": {
                "enable": True,
                "checkpoint_dir": "/path/to/ckpts",
                "save_quantizers_only": True,
            },
        },
    },
    forward_loop=forward_loop,
)

Testing

New / updated unit tests in tests/unit/torch/quantization/:

test_config_validation.py — TestLayerwiseNestedConfig covers nested-form acceptance, bool-form
DeprecationWarning, flat layerwise_checkpoint_dir migration, conflicting flat+nested checkpoint_dir, use_sequential
alias survival under migration, per-algorithm qdq defaults (parametrized Max/GPTQ), save_every ge=1 validation, and the
save_quantizers_only whitelist — parametrized rejection on [GPTQ, AWQLite, SmoothQuant] and acceptance on [Max, Mse, LocalHessian].
test_layerwise_calibrate.py —
- test_layerwise_no_qdq_matches_sequential_amax — behavioral equivalence: layerwise + qdq=False produces the same
  per-quantizer _amax as the non-layerwise (sequential) max-calibration flow (verified via torch.testing.assert_close).
- test_layerwise_save_every_writes_next_inputs_only_at_window_boundaries — window-save layout (all layer dirs present,
  next_inputs.pt only at boundaries).
- test_layerwise_save_quantizers_only_resume_matches_one_shot_amax — end-to-end resume: full run → manifest rewound →
  fresh model resumes → final _amax matches the one-shot baseline; also pins the on-disk shape (no weights.pt,
  quantizer_buffers.pt present).

End-to-end correctness verification

Ran 4 PTQ jobs on Qwen3-8B with an NVFP4 W4A16 quant_cfg and --calib_size 16, one job per GPU across 4 GPUs:

Run	Algorithm	Layerwise config	Purpose
A	GPTQ	`enable=true, get_qdq_activations_from_prev_layer=true, save_every=5`	New nested form + the new `save_every` knob
B	MSE	`enable=true, get_qdq_activations_from_prev_layer=false, save_quantizers_only=true`	New nested form + the new `save_quantizers_only` knob
C	GPTQ	Legacy flat form: `layerwise: true, layerwise_checkpoint_dir: ...`	Backward-compat baseline for A (exercises the migration validator)
D	MSE	`enable=false`	Non-layerwise baseline for B

Across the two pairwise comparisons (A vs. C and B vs. D), all 905 tensors in the exported HF checkpoints are bit-identical and hf_quant_config.json matches exactly, confirming that the new layerwise knobs preserve correctness and that the flat-form backward-compat path is intact.

Before your PR is "Ready for review"

Make sure you read and follow Contributor guidelines
and your commits are signed (git commit -s -S).

Make sure you read and follow the Security Best
Practices (e.g.
avoiding hardcoded trust_remote_code=True, torch.load(..., weights_only=False), pickle, etc.).

Is this change backward compatible?: ✅ — layerwise: True/False still accepted with a DeprecationWarning; flat
layerwise_checkpoint_dir silently migrated; use_sequential alias preserved. The two new knobs default to no-op behavior
(save_every=1, save_quantizers_only=False).
If you copied code from any other sources or added a new PIP dependency, did you follow guidance in CONTRIBUTING.md: N/A
— no new dependencies.
Did you write any new necessary tests?: ✅ — see Testing section.
Did you update Changelog?: ✅
ready for review.
Did you get Claude approval on this PR?: ✅ — /claude review consulted iteratively; review findings (GPTQ default
survival, save_quantizers_only whitelist scope, docstring accuracy, dead layer param) addressed in-PR.

Additional Information

Notes on design choices that came out of internal review:

GPTQ's qdq=True default uses a model_validator(mode="after") + model_fields_set check rather than a default_factory
— a default_factory is only fired when the field is absent, so any user-supplied dict (the natural way to enable
layerwise) would silently lose the GPTQ default.
save_quantizers_only is enforced as a whitelist (_supports_save_quantizers_only) rather than a per-algorithm blacklist,
which keeps future weight-mutating algorithms safe by default.
set_quantizer_by_cfg_context is reused for the qdq-disable scope instead of a bespoke helper, keeping model_calib.py
aligned with the existing "deny-all" idiom documented at conversion.py:240.
Pre-validation recipe helpers in examples/llm_ptq/example_utils.py (needs_checkpoint_path_update /
resolve_checkpoint_dir) accept both flat and nested shapes since they run before Pydantic validation;
resolve_checkpoint_dir now also returns the resolved path so the caller doesn't re-derive it.

Summary by CodeRabbit

New Features
- Layerwise calibration now uses a nested config (enable, get_qdq_activations_from_prev_layer, checkpoint_dir, save_every, save_quantizers_only); checkpointing supports quantizer-only saves and windowed commits with resume support.
Behavior
- GPTQ calibration defaults get_qdq_activations_from_prev_layer=True when unspecified; save_every controls when next-inputs/manifests are written and must be positive.
Deprecations
- Legacy flat-style layerwise keys are still accepted but emit DeprecationWarning and are auto-migrated (conflicts detected).
Examples/UX
- Tools auto-detect legacy vs nested checkpoint layouts and report the resolved path.
Tests
- Expanded coverage for nested configs, validation, checkpoint/resume semantics, and windowed saves.

coderabbitai · 2026-05-29T23:02:42Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

📝 Walkthrough

This PR restructures layerwise quantization calibration config from flat boolean and separate checkpoint-dir into a nested LayerwiseConfig with options enable, get_qdq_activations_from_prev_layer, checkpoint_dir, save_every, and save_quantizers_only. It adds migration/validation, wires the options through calibration and checkpointing, updates examples/recipes, and expands tests.

Changes

Layerwise Configuration Restructuring

Layer / File(s)	Summary
LayerwiseConfig schema, validation, and legacy migration `modelopt/torch/quantization/config.py`, `tests/unit/torch/quantization/test_config_validation.py`	Introduces `LayerwiseConfig` and normalization/migration of legacy `layerwise` boolean/alias and `layerwise_checkpoint_dir` into the nested shape (emits DeprecationWarning). Enforces validation (checkpoint_dir requires enable=True) and adds GPTQ defaulting for `get_qdq_activations_from_prev_layer`. Changelog updated.
Calibration pipeline consumption `modelopt/torch/quantization/mode.py`, `modelopt/torch/quantization/model_calib.py`	Entry points parse nested `layerwise` options (enable, checkpoint_dir, get_qdq_activations_from_prev_layer, save_every), forward them into `layerwise_calibrate`, adjust inter-layer capture timing (`qdq_from_prev`), and initialize `_CheckpointState` with `save_every`. Error/doc wording updated to `layerwise.enable`.
Checkpoint persistence with windowing `modelopt/torch/quantization/utils/layerwise_calib.py`, `tests/unit/torch/quantization/test_layerwise_calibrate.py`	Manifest now stores `save_every`. Adds `_save_layer_files` for per-layer artifacts; `_CheckpointState` accepts `save_every`, validates resumes against manifest `num_layers` and `save_every`, and only commits `next_inputs.pt`/manifest at window boundaries or final layer. Restore applies quantizer state before weights.
User-facing integration: examples, HF script, and recipes `examples/llm_ptq/example_utils.py`, `examples/llm_ptq/hf_ptq.py`, `modelopt_recipes/general/ptq/*.yaml`, `modelopt_recipes/huggingface//*.yaml`	`_layerwise_checkpoint_dir_location()` detects flat vs nested checkpoint-dir shapes; `resolve_checkpoint_dir()` returns `(quant_cfg, resolved_path)` and writes the resolved path back in the original recipe shape. HF PTQ script checks `layerwise.enable` and prints the resolved dir. Multiple recipes updated to nested `layerwise: { enable: ... }`.
Tests: validation and layerwise calibration behavior `tests/unit/torch/quantization/test_config_validation.py`, `tests/unit/torch/quantization/test_layerwise_calibrate.py`	Adds focused tests for `use_sequential` deprecation alias, nested `layerwise` acceptance and migration, conflicting flat vs nested checkpoint_dir inputs, GPTQ defaults/overrides, qdq-from-prev behavior, `save_every` windowing and resume/crash semantics, and save_every mismatch detection.
Changelog documentation `CHANGELOG.rst`	Documents introduction of nested `LayerwiseConfig` with new checkpoint and QDQ activation options, and deprecation of legacy boolean and flat checkpoint_dir fields.

Sequence Diagram(s)

sequenceDiagram
  participant User
  participant Config as QuantizeAlgorithmConfig
  participant Pipeline as CalibPipeline
  participant Checkpoint as _CheckpointState

  User->>Config: provide legacy bool or nested layerwise config
  Config->>Config: coerce legacy boolean/alias → nested LayerwiseConfig (emit DeprecationWarning)
  Config->>Config: migrate layerwise_checkpoint_dir → layerwise.checkpoint_dir (if present)
  Config->>Pipeline: return normalized LayerwiseConfig

  Pipeline->>Pipeline: extract enable, checkpoint_dir, save_every, get_qdq_activations_from_prev_layer
  alt layerwise.enable == True
    Pipeline->>Checkpoint: _CheckpointState.from_folder(checkpoint_dir, num_layers, save_every)
    loop per layer
      Pipeline->>Pipeline: calibrate layer
      alt get_qdq_activations_from_prev_layer == False
        Pipeline->>Pipeline: capture next_inputs before calib_func (disable quantizers on current layer)
      else get_qdq_activations_from_prev_layer == True
        Pipeline->>Pipeline: capture next_inputs after calib_func (allow QDQ effects)
      end
      alt layer_idx % save_every == 0 OR final layer
        Pipeline->>Checkpoint: save(layer_idx, model, layers, next_inputs?)
        Checkpoint->>Checkpoint: write qstate, (weights.pt OR quantizer_buffers.pt), output_meta
        Checkpoint->>Checkpoint: commit manifest.json and next_inputs.pt
      else
        Pipeline->>Checkpoint: write per-layer files without committing manifest
      end
    end
  else
    Pipeline->>Pipeline: run sequential calibration
  end

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

NVIDIA/Model-Optimizer#1251: Prior layerwise + checkpoint resume work that this change builds upon.

Suggested reviewers

cjluo-nv
Edwardf0t1
kinjalpatel27
meenchen

Caution

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name	Status	Explanation	Resolution
Security Anti-Patterns	❌ Error	Line 671-672 of layerwise_calib.py has torch.load(..., weights_only=False) without an inline comment justifying why it is safe, violating SECURITY.md.	Add inline comment before weights torch.load(): `# weights_only=False is safe: file is internally generated by _save_layer`
Docstring Coverage	⚠️ Warning	Docstring coverage is 54.84% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title clearly and specifically describes the main changes: introducing nested LayerwiseConfig, adding the QDQ-from-prev-layer flag, and new checkpoint I/O controls.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

Fridah-nv · 2026-05-29T23:07:25Z

/claude review

github-actions · 2026-05-29T23:08:58Z

PR Preview Action v1.8.1
🚀 View preview at https://NVIDIA.github.io/Model-Optimizer/pr-preview/pr-1571/
Built to branch `gh-pages` at 2026-06-05 23:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.

cjluo-nv

Bot review — DM the bot to share feedback.

The nested LayerwiseConfig refactor, get_qdq_activations_from_prev_layer flag, and save_quantizers_only whitelist look well-designed and the back-compat shims (bool coercion + flat→nested migration with the use_sequential rescue) are thorough. Tests around config validation and the qdq=False-matches-sequential-amax behavioral check are good.

However, save_every > 1 looks broken: by the time _CheckpointState.save() is invoked for the boundary layer, the earlier layers in the window have already been replaced by _SkipLayer instances in transformer_layers (via _set_layer_states(layer_idx+1) swapping layer_idx-1 to a dummy during cache_outputs_for_next_layer_calib). Reading layers[i].state_dict() / quantizer_state(layers[i]) / layers[i]._layerwise_calib.output_meta on a _SkipLayer returns empty/None state because _original is attached via object.__setattr__ and is not a registered submodule. The new test_layerwise_save_every_writes_next_inputs_only_at_window_boundaries only checks file existence, so the silent corruption isn't caught — there is no resume test for save_every > 1. See inline comment for details and the suggested fix (snapshot to memory immediately after each layer is calibrated, then flush at boundaries; or save real layers eagerly and only window-gate next_inputs.pt).
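
The registration behavior behind this finding can be reproduced in isolation. The sketch below assumes PyTorch is available; `SkipLayerSketch` is a hypothetical stand-in for the placeholder described above, not the repository's `_SkipLayer`.

```python
import torch.nn as nn


class SkipLayerSketch(nn.Module):
    """Hypothetical placeholder that hides the real layer from module registration."""

    def __init__(self, original: nn.Module):
        super().__init__()
        # object.__setattr__ bypasses nn.Module.__setattr__, so `original`
        # is stored as a plain attribute and is NOT a registered submodule.
        object.__setattr__(self, "_original", original)


real = nn.Linear(4, 4)
skip = SkipLayerSketch(real)

print(len(real.state_dict()))            # weight and bias
print(len(skip.state_dict()))            # nothing registered on the placeholder
print(len(skip._original.state_dict()))  # real state is reachable only explicitly
```

Saving `skip.state_dict()` therefore succeeds without error while writing no tensors, which is why a file-existence check alone would not catch the problem.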

Minor: changelog says "TODO before marking ready for review" in the PR body but the changelog entry is already added — please confirm the description is up to date before merge.

coderabbitai

Warning

CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.

Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@modelopt/torch/quantization/config.py`:
- Around line 153-160: Add an explicit __all__ declaration near the top of the
module that lists the new public API symbols (e.g. "LayerwiseConfig" and any
other public config classes, functions or constants defined in this file and the
other block referenced around 636-691); place it after the imports and ensure it
enumerates every symbol intended for export so star-imports are safe and stable,
and update corresponding package __init__.py re-exports if needed to re-export
these names.
- Around line 706-708: _coerce_layerwise_input currently returns
value.model_dump() for LayerwiseConfig which expands defaults and loses the
original model_fields_set causing GPTQ's _gptq_qdq_default to miss injecting
defaults; change _coerce_layerwise_input to return the LayerwiseConfig instance
(value) unchanged when isinstance(value, LayerwiseConfig) so downstream parsing
preserves model_fields_set (e.g., self.layerwise.model_fields_set) and
GPTQCalibConfig._gptq_qdq_default can correctly detect which fields the user
actually set.

In `@modelopt/torch/quantization/utils/layerwise_calib.py`:
- Around line 729-761: The loop that saves windowed checkpoints is capturing
_SkipLayer placeholders rather than the underlying calibrated layer because
cache_outputs_for_next_layer_calib() can replace entries in layers with
_SkipLayer before save; update the save loop in the function that iterates
layers[i] to detect instances of _SkipLayer and use layer = layer._original (or
otherwise snapshot the original layer list before cache outputs mutate it) so
_save_layer receives the real calibrated layer, its quantizer state
(quantizer_state(layer)), and correct output_meta rather than the placeholder.

📥 Commits

Reviewing files that changed from the base of the PR and between 40a4dd3 and dbca1a5.

📒 Files selected for processing (16)

CHANGELOG.rst
examples/llm_ptq/example_utils.py
examples/llm_ptq/hf_ptq.py
modelopt/torch/quantization/config.py
modelopt/torch/quantization/mode.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/utils/layerwise_calib.py
modelopt_recipes/general/ptq/nvfp4_default-kv_none-gptq.yaml
modelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8.yaml
modelopt_recipes/general/ptq/nvfp4_experts_only-kv_fp8_layerwise.yaml
modelopt_recipes/general/ptq/nvfp4_experts_only_mse-kv_fp8_cast.yaml
modelopt_recipes/general/ptq/nvfp4_mlp_only_mse-kv_fp8_cast.yaml
modelopt_recipes/huggingface/qwen3_5/ptq/w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml
modelopt_recipes/huggingface/qwen3_5_moe/ptq/w4a16_nvfp4-fp8_attn-kv_fp8_cast.yaml
tests/unit/torch/quantization/test_config_validation.py
tests/unit/torch/quantization/test_layerwise_calibrate.py

claude

Claude review — 1 CRITICAL, 0 IMPORTANT, 2 SUGGESTIONs.

Most impactful finding (inline on modelopt/torch/quantization/utils/layerwise_calib.py):

_save_layer calls _write_manifest(...) at the end of every individual call, but during a save_every > 1 window save the non-boundary layers in the window are written without a next_inputs.pt. So if the process is killed (or torch.save raises) between the first and last _save_layer call within a window, the manifest advances to a layer that has no next_inputs.pt. On resume, _CheckpointState.setup_resume then raises FileNotFoundError("Cannot resume: next_inputs.pt missing for layer N") and the checkpoint is permanently broken — there's no fallback that re-calibrates the partial window. The PR description's claim that "interrupting mid-window re-calibrates that window on resume" is the intended invariant, but the implementation doesn't enforce it.

The fix is small: only advance the manifest after the entire window has been flushed (drop the _write_manifest call from _save_layer and call it once at the end of _CheckpointState.save() with last_completed_layer = layer_idx). A regression test that monkeypatches _write_manifest to raise mid-window would lock this in. With save_every=1 (the default) the bug isn't reachable, so the existing tests don't catch it — this only bites users who opt into the new knob.
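
The proposed ordering can be sketched with the standard library only. Everything here is illustrative: the file names, the `crash_at` hook, the 0-indexed end-of-window boundary, and the temp-file-plus-rename commit are assumptions of this sketch, not the PR's code.

```python
import json
import os


def commit_manifest(ckpt_dir: str, last_completed_layer: int) -> None:
    """Publish the resume point by writing a temp file and renaming it into place."""
    tmp_path = os.path.join(ckpt_dir, "manifest.json.tmp")
    with open(tmp_path, "w") as f:
        json.dump({"last_completed_layer": last_completed_layer}, f)
    os.replace(tmp_path, os.path.join(ckpt_dir, "manifest.json"))


def save_layer(ckpt_dir, layer_idx, num_layers, save_every, crash_at=None):
    """Write per-layer files eagerly; advance the manifest only at a boundary."""
    layer_dir = os.path.join(ckpt_dir, f"layer_{layer_idx}")
    os.makedirs(layer_dir, exist_ok=True)
    if layer_idx == crash_at:
        raise RuntimeError("simulated crash while writing layer files")
    open(os.path.join(layer_dir, "weights.bin"), "wb").close()

    at_boundary = (layer_idx + 1) % save_every == 0 or layer_idx == num_layers - 1
    if at_boundary:
        open(os.path.join(layer_dir, "next_inputs.bin"), "wb").close()
        # Last step: everything the manifest implies is already on disk.
        commit_manifest(ckpt_dir, layer_idx)


def read_resume_point(ckpt_dir: str) -> int:
    """Return the last committed layer, or -1 if nothing was committed."""
    path = os.path.join(ckpt_dir, "manifest.json")
    if not os.path.exists(path):
        return -1
    with open(path) as f:
        return json.load(f)["last_completed_layer"]
```

With `save_every=2`, a crash while saving layer 2 leaves the resume point at layer 1, so the partial window is re-calibrated instead of pointing at a layer without cached inputs.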

Other:

SUGGESTION: legacy flat layerwise_checkpoint_dir is migrated silently in _migrate_layerwise_checkpoint_dir. The bool-form layerwise emits a DeprecationWarning; consider symmetric treatment so the migration window for the flat key is signaled.
SUGGESTION: resolve_checkpoint_dir in examples/llm_ptq/example_utils.py only updates whichever shape it picks — if the user happens to specify both flat and nested with the same value, the rewrite produces a Pydantic conflict whose error message points at the resolver's output, not at the user's redundancy. Edge case but worth a guard.

Risk: Low–medium. The default path (save_every=1, no save_quantizers_only) is well-tested and the algorithm-correctness pieces (qdq-from-prev semantics, GPTQ default via model_fields_set, save_quantizers_only whitelist) look right and are covered by the new tests. The save_every>1 path has the resume-corruption bug above, which only opt-in users will hit.

cjluo-nv

Re-review: the previously flagged critical save_every > 1 corruption bug is still present in modelopt/torch/quantization/utils/layerwise_calib.py. The new test only checks file existence, not file contents, so the corruption remains silent. Other previous comments (manifest update during partial-window crash, _coerce_layerwise_input model_dump() losing model_fields_set) are also unaddressed. Marking comment rather than nudge because the critical bug from the previous review is unresolved.

Status of previous comments:

(critical, UNRESOLVED) save_every > 1 reads from _SkipLayer stand-ins → silent on-disk corruption. Still no resume test for save_every > 1.
(critical, UNRESOLVED) Manifest is advanced per-_save_layer call, so a mid-window crash leaves manifest=K with no next_inputs.pt for layer K, breaking resume permanently.
(major, UNRESOLVED) _coerce_layerwise_input calls .model_dump() on a LayerwiseConfig instance, which expands defaults and breaks _gptq_qdq_default's reliance on model_fields_set.
(minor) Cross-version save_every / save_quantizers_only mismatch on resume — still no manifest-side guard.
(minor) __all__ declaration in config.py — still missing.
(minor) Symmetric DeprecationWarning for flat layerwise_checkpoint_dir migration — still silent.
(minor) resolve_checkpoint_dir flat+nested redundancy — still no guard.

Resolved/no-longer-flagging:

Changelog entry is present and the "TODO before marking ready" note appears to have been dropped from the PR body.

Design-review pass: the PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the design protocol's "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable; correctness, not design, is the issue here.

Additional comments (outside the PR diff):

modelopt/torch/quantization/utils/layerwise_calib.py:522 — > Bot comment.

Critical (carried over, still unresolved): _save_layer calls _write_manifest(checkpoint_dir, idx, num_layers) at the end of every individual call, but during a save_every > 1 window save the non-boundary layers in the window are written without a next_inputs.pt. If the process is killed (or torch.save raises) between the first _save_layer and the boundary _save_layer within a window, the manifest advances to a layer that has no next_inputs.pt. On resume, setup_resume then raises FileNotFoundError("Cannot resume: next_inputs.pt missing for layer N") and the checkpoint is permanently broken — the PR description's claim that "interrupting mid-window re-calibrates that window on resume" is not what the code does.

Fix: drop the _write_manifest call from _save_layer and call it once at the end of _CheckpointState.save() (with last_completed_layer = layer_idx) so the manifest only advances after the boundary next_inputs.pt is on disk. A regression test that monkeypatches _write_manifest to raise mid-window would lock this in.

codecov · 2026-05-29T23:22:26Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 76.92%. Comparing base (54ce4e0) to head (ab19be0).
⚠️ Report is 2 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1571      +/-   ##
==========================================
+ Coverage   76.27%   76.92%   +0.65%     
==========================================
  Files         489      489              
  Lines       54415    54475      +60     
==========================================
+ Hits        41504    41904     +400     
+ Misses      12911    12571     -340

Flag	Coverage Δ
examples	`42.79% <42.52%> (+0.78%)`	⬆️
gpu	`58.45% <96.55%> (-1.42%)`	⬇️
regression	`14.90% <35.63%> (-0.22%)`	⬇️
unit	`54.05% <98.85%> (+0.04%)`	⬆️

Fridah-nv · 2026-05-30T00:11:50Z

/claude review

cjluo-nv

Re-review: all previously-flagged critical issues are resolved with regression tests.

save_every > 1 _SkipLayer corruption (critical, was unresolved) — Fixed. _CheckpointState.save() now reads from layers[layer_idx] (the just-calibrated layer) and is called every iteration before the next-layer capture forward swaps prior layers to _SkipLayer. Per-layer files (weights/quantizer state/output_meta) are written eagerly; only next_inputs.pt + manifest are deferred to window boundaries.
Mid-window crash advancing manifest (critical, was unresolved) — Fixed. _save_layer_files no longer writes the manifest; _CheckpointState.save() writes it once at boundary/final, so a torch.save crash mid-window leaves the manifest pointing at the previous boundary. Covered by test_layerwise_save_every_mid_window_crash_recovers_at_prev_boundary (monkeypatches torch.save to crash on layer 2 mid-window, asserts manifest stays at layer 1).
_coerce_layerwise_input .model_dump() expanding defaults (critical/major, was unresolved) — Fixed. Now uses model_dump(exclude_unset=True) so LayerwiseConfig instance inputs preserve model_fields_set, and GPTQ's _gptq_qdq_default correctly injects True for fields the user didn't explicitly set. Test parametrization extended to cover LayerwiseConfig(enable=True) + LayerwiseConfig(enable=True, get_qdq_activations_from_prev_layer=False).
Save_every>1 resume parity test (was missing) — Added. test_layerwise_checkpoint_resume_matches_one_shot_amax parametrized over (save_every=2, save_quantizers_only=False) and (save_every=1, save_quantizers_only=True) rewinds the manifest and asserts amax equivalence with a one-shot baseline — i.e. it would fail under the previous corruption.
Cross-version save_every/save_quantizers_only mismatch (minor) — Addressed. Manifest now persists both keys; from_folder raises on any mismatch. Covered by test_layerwise_checkpoint_mismatch_save_every_raises.
Symmetric DeprecationWarning for flat layerwise_checkpoint_dir (minor) — Addressed; warning emitted in _migrate_layerwise_checkpoint_dir.
resolve_checkpoint_dir flat+nested redundancy (minor) — Addressed; resolver writes back to both shapes when both present, keeping the dict internally consistent.
__all__ declaration (minor) — Author marked out of scope; acceptable.
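
The `model_dump()` versus `model_dump(exclude_unset=True)` difference noted in the list above can be seen in a few lines of Pydantic v2. `LayerwiseSketch` is a reduced, illustrative model, not the PR's `LayerwiseConfig`.

```python
from pydantic import BaseModel


class LayerwiseSketch(BaseModel):
    enable: bool = False
    get_qdq_activations_from_prev_layer: bool = False


user_cfg = LayerwiseSketch(enable=True)

# A plain dump expands defaults, so re-validation sees every field as user-set.
expanded = LayerwiseSketch.model_validate(user_cfg.model_dump())

# exclude_unset keeps only what the user passed, so the explicit-field set survives.
preserved = LayerwiseSketch.model_validate(user_cfg.model_dump(exclude_unset=True))

print(sorted(expanded.model_fields_set))
print(sorted(preserved.model_fields_set))
```

Only the second round trip leaves `get_qdq_activations_from_prev_layer` out of `model_fields_set`, which is what a per-algorithm default keyed on that set needs.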

Design-review pass: this PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable.

Complex PR: spans 8 directories (≥ 5). Looping in a human for approval.

claude

Claude review passed — no blocking issues found. LGTM

cjluo-nv

Re-review: all four previously-flagged critical issues are resolved with meaningful regression tests.

💬 save_every > 1 _SkipLayer corruption (critical, was unresolved) — Fixed. _CheckpointState.save() now reads from layers[layer_idx] (the just-calibrated layer) before the next iteration's capture forward swaps prior layers to _SkipLayer. Per-layer files (weights/quantizer state/output_meta) are written eagerly; only next_inputs.pt + manifest are deferred to window boundaries. Covered by the parametrized test_layerwise_checkpoint_resume_matches_one_shot_amax(save_every=2) parity test against a one-shot baseline.
💬 Mid-window crash advancing manifest (critical, was unresolved) — Fixed. _save_layer_files no longer writes the manifest; _CheckpointState.save() writes it once at boundary/final, so a torch.save crash mid-window leaves the manifest pointing at the previous boundary. Covered by test_layerwise_save_every_mid_window_crash_recovers_at_prev_boundary (monkeypatches torch.save to crash on layer 2 mid-window, asserts manifest stays at layer 1).
💬 _coerce_layerwise_input .model_dump() expanding defaults (critical, was unresolved) — Fixed. Now uses model_dump(exclude_unset=True) so LayerwiseConfig instance inputs preserve model_fields_set, and GPTQ's _gptq_qdq_default correctly injects True for fields the user didn't explicitly set. Test parametrization extended to cover LayerwiseConfig(enable=True) and LayerwiseConfig(enable=True, get_qdq_activations_from_prev_layer=False).
💬 qdq_from_prev=False weight-mutation leak (was raised by realAsma) — Fixed. Next-layer input capture moved to before calib_func when qdq_from_prev=False, so weight-mutating algorithms (GPTQ/AWQ/SmoothQuant folds) can't leak into downstream captures. Covered by test_layerwise_no_qdq_captures_inputs_before_calib_func_mutates_weights (destructive vs identity calib_func produce identical captured inputs).
💬 Cross-version save_every / save_quantizers_only mismatch (minor) — Addressed. Manifest now persists both keys; from_folder raises on any mismatch. Covered by test_layerwise_checkpoint_mismatch_save_every_raises.
💬 Symmetric DeprecationWarning for flat layerwise_checkpoint_dir (minor) — Addressed; warning emitted in _migrate_layerwise_checkpoint_dir.
💬 resolve_checkpoint_dir flat+nested redundancy (minor) — Addressed; resolver writes the resolved path back to both shapes when both are present so the migration validator sees consistent values.
💬 __all__ declaration in config.py (minor) — Author marked out of scope; acceptable.
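
The eager-files / deferred-manifest split from the first two items can be sketched as follows. This is an illustration under stated assumptions, not the PR's `_CheckpointState.save()`: `save_layer`, `resume_point`, the `layer_NNNN/state.json` layout, and the temp-file-plus-`os.replace` swap are hypothetical choices made for the example.

```python
import json
import os
import tempfile


def save_layer(ckpt_dir, idx, num_layers, save_every, payload):
    """Write layer *idx* eagerly; advance the manifest only at a boundary."""
    layer_dir = os.path.join(ckpt_dir, f"layer_{idx:04d}")
    os.makedirs(layer_dir, exist_ok=True)
    with open(os.path.join(layer_dir, "state.json"), "w") as f:
        json.dump(payload, f)

    # The manifest moves only at a window boundary or on the final layer,
    # so a crash mid-window leaves it pointing at the previous boundary.
    at_boundary = (idx + 1) % save_every == 0 or idx == num_layers - 1
    if at_boundary:
        tmp = os.path.join(ckpt_dir, "manifest.json.tmp")
        with open(tmp, "w") as f:
            json.dump({"last_completed_layer": idx}, f)
        # Assumption of this sketch: swap the manifest in with one rename.
        os.replace(tmp, os.path.join(ckpt_dir, "manifest.json"))


def resume_point(ckpt_dir):
    """Index of the first layer that still needs calibration."""
    path = os.path.join(ckpt_dir, "manifest.json")
    if not os.path.exists(path):
        return 0
    with open(path) as f:
        return json.load(f)["last_completed_layer"] + 1


with tempfile.TemporaryDirectory() as demo_dir:
    for layer in range(3):  # layers 0..2 of 4, save_every=2
        save_layer(demo_dir, layer, 4, 2, {"amax": float(layer)})
    # Layer 2 is on disk, but the manifest still names layer 1.
    demo_resume = resume_point(demo_dir)
```

With `save_every=2`, stopping after layer 2 resumes from layer 2, which mirrors the "manifest stays at layer 1" assertion described in the crash-recovery test.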

Design-review pass: this PR refactors an existing in-repo subsystem (layerwise calibration + checkpoint format) rather than introducing a new one, so the "is there an existing tool that already does this?" question doesn't apply — the existing tool is what's being refactored. The nested-config + whitelist + per-algorithm default approach is reasonable.

Looping in a human for final sign-off because (a) complexity gate fired (16 files, +853/-135, 8 directories), and (b) author's reported end-to-end verification (Qwen3-8B, 4 PTQ runs, 905 tensors bit-identical across legacy-vs-nested and nested-vs-baseline pairs) is a strong correctness signal but lives outside CI — a human reviewer with the runtime context should confirm the bit-equivalence claim is sufficient.

realAsma · 2026-06-01T21:47:24Z

@Fridah-nv can you take a look at #1592?

I have an improvement for layerwise (avoid write-back with FSDP2 / CPU / disk offload unless needed).

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelopt/torch/quantization/utils/layerwise_calib.py (1)

630-645: ⚠️ Potential issue | 🔴 Critical

Fix insecure resume deserialization in layerwise calibration (torch.load weights_only=False)

modelopt/torch/quantization/utils/layerwise_calib.py loads output_meta.pt, next_inputs.pt, quantizer_state.pt, and weights.pt via torch.load(..., weights_only=False) in _CheckpointState.setup_resume() and _CheckpointState.full_restore(). The inline “internally generated” justification isn’t a valid trust boundary because checkpoint_dir is caller-configurable and resume may read pre-seeded/tampered files, making this an unsafe pickle-backed code-execution path.

Change resume loading to avoid weights_only=False where feasible (e.g., use weights_only=True for weights.pt / quantizer_state.pt if compatible), and for output_meta.pt / next_inputs.pt switch to a pickle-free serialization or add an explicit provenance/trust gate tied to files written by this run.

Also remove or update the unused quantizer_buffers branch/docstring in _save_layer_files (it’s never passed from save() and full_restore() doesn’t load quantizer_buffers.pt).
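
One way to picture the "provenance/trust gate tied to files written by this run" suggested above is a per-run keyed digest that is checked before anything is deserialized. The sketch below is an assumption-laden illustration, not code from this PR: `seal_files`, `verify_files`, and the `macs.json` sidecar are hypothetical, and how the key is kept across a resume is left open.

```python
import hashlib
import hmac
import json
import os
import secrets
import tempfile


def _file_mac(path: str, key: bytes) -> str:
    """HMAC-SHA256 of a file's bytes, hex-encoded."""
    mac = hmac.new(key, digestmod=hashlib.sha256)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            mac.update(chunk)
    return mac.hexdigest()


def seal_files(layer_dir: str, names: list, key: bytes) -> None:
    """Record a MAC for each file written by this run."""
    macs = {n: _file_mac(os.path.join(layer_dir, n), key) for n in names}
    with open(os.path.join(layer_dir, "macs.json"), "w") as f:
        json.dump(macs, f)


def verify_files(layer_dir: str, key: bytes) -> list:
    """Return verified file paths; raise before anything is unpickled."""
    with open(os.path.join(layer_dir, "macs.json")) as f:
        macs = json.load(f)
    verified = []
    for name, expected in macs.items():
        path = os.path.join(layer_dir, name)
        if not hmac.compare_digest(_file_mac(path, key), expected):
            raise RuntimeError(f"refusing to load tampered file: {path}")
        verified.append(path)
    return verified


with tempfile.TemporaryDirectory() as demo_dir:
    run_key = secrets.token_bytes(32)  # hypothetical per-run secret
    with open(os.path.join(demo_dir, "weights.pt"), "wb") as f:
        f.write(b"placeholder bytes standing in for a checkpoint")
    seal_files(demo_dir, ["weights.pt"], run_key)
    demo_verified = verify_files(demo_dir, run_key)
```

A caller would hand only the returned paths to `torch.load`, and would still prefer `weights_only=True` wherever the payload allows it; the gate narrows who can plant a file, it does not make pickle safe.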

🧹 Nitpick comments (1)

modelopt/torch/quantization/utils/layerwise_calib.py (1)

501-524: ⚡ Quick win

Drop the dead quantizer_buffers checkpoint path.

save() now always writes weights.pt, while full_restore() only reads weights.pt, so keeping quantizer_buffers in _save_layer_files() and in its docstring advertises a restore mode this file no longer implements. Removing the unused parameter/branch would keep the checkpoint format single-sourced and avoid future callers writing unreadable checkpoints.

♻️ Minimal cleanup

 def _save_layer_files(
     checkpoint_dir: str,
     idx: int,
-    weights: dict | None,
+    weights: dict,
     qstate: dict,
-    quantizer_buffers: dict | None,
     output_meta: tuple,
 ) -> None:
     """Write the per-layer files for layer *idx*.
 
-    Exactly one of ``weights`` (full layer state_dict) or ``quantizer_buffers``
-    (just the TensorQuantizer state_dict slice) is written; ``full_restore``
-    falls back to whichever is present. ``next_inputs.pt`` and ``manifest.json``
-    are deferred to window boundaries in :meth:`_CheckpointState.save`.
+    ``weights.pt``, ``quantizer_state.pt``, and ``output_meta.pt`` are written
+    every call. ``next_inputs.pt`` and ``manifest.json`` are deferred to window
+    boundaries in :meth:`_CheckpointState.save`.
     """
     d = _layer_dir(checkpoint_dir, idx)
     if os.path.isdir(d):
         shutil.rmtree(d)
     os.makedirs(d)
-    if weights is not None:
-        torch.save(weights, os.path.join(d, "weights.pt"))
-    elif quantizer_buffers is not None:
-        torch.save(quantizer_buffers, os.path.join(d, "quantizer_buffers.pt"))
+    torch.save(weights, os.path.join(d, "weights.pt"))
     torch.save(qstate, os.path.join(d, "quantizer_state.pt"))
     torch.save(output_meta, os.path.join(d, "output_meta.pt"))

As per coding guidelines, "Remove dead code including unused imports, unreachable branches, and obsolete helpers."

Also applies to: 713-719

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/utils/layerwise_calib.py` around lines 501 - 524,
The _save_layer_files function still accepts and documents the obsolete
quantizer_buffers path even though save()/full_restore() only use weights.pt;
remove the dead parameter and branch: delete the quantizer_buffers parameter
from _save_layer_files signature and all callers, remove the elif branch that
writes "quantizer_buffers.pt", and update the docstring to state only weights.pt
and quantizer_state.pt are written; also remove any duplicated/unused
quantizer_buffers handling in the later block around the other occurrence (lines
~713-719) so the checkpoint format remains single-sourced.

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between af1bb4f and 51bc967.

📒 Files selected for processing (7)

CHANGELOG.rst
modelopt/torch/quantization/config.py
modelopt/torch/quantization/mode.py
modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/utils/layerwise_calib.py
tests/unit/torch/quantization/test_config_validation.py
tests/unit/torch/quantization/test_layerwise_calibrate.py

💤 Files with no reviewable changes (3)

modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/mode.py
tests/unit/torch/quantization/test_config_validation.py

✅ Files skipped from review due to trivial changes (1)

CHANGELOG.rst

cjluo-nv

Bot review — DM the bot to share feedback.

Re-review after extensive iteration. The four previously-flagged criticals all appear addressed with regression tests:

💬 save_every > 1 _SkipLayer corruption — Fixed. _CheckpointState.save() now reads from layers[layer_idx] per-call, before the next iteration's capture forward swaps prior layers to _SkipLayer. Per-layer files written eagerly; only next_inputs.pt + manifest deferred to boundaries.
💬 Mid-window crash advancing manifest — Fixed. _save_layer_files no longer writes the manifest; save() writes it once at boundary/final. Covered by test_layerwise_save_every_mid_window_crash_recovers_at_prev_boundary (monkeypatches torch.save to crash on layer 2 mid-window, asserts manifest stays at layer 1).
💬 _coerce_layerwise_input .model_dump() expanding defaults — Fixed. Now uses model_dump(exclude_unset=True). Tests cover LayerwiseConfig(enable=True) preserving model_fields_set so GPTQ's _gptq_qdq_default injects correctly.
💬 realAsma's qdq_from_prev=False weight-mutation leak — Fixed. Next-layer input capture moved to before calib_func when qdq_from_prev=False. Covered by test_layerwise_no_qdq_captures_inputs_before_calib_func_mutates_weights (destructive vs identity calib_func produce identical captured inputs).
💬 Cross-version save_every mismatch, symmetric DeprecationWarning for flat key, resolve_checkpoint_dir redundancy guard — all addressed.
__all__ declaration in config.py — author marked out of scope; acceptable.
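
The `exclude_unset` fix is easier to see with plain dictionaries. The sketch below only mimics the behavior described in the third item: `DEFAULTS`, `dump_all`, `dump_exclude_unset`, and `apply_gptq_default` are hypothetical stand-ins, not the PR's `_coerce_layerwise_input` or `_gptq_qdq_default`.

```python
DEFAULTS = {
    "enable": False,
    "get_qdq_activations_from_prev_layer": False,
}
QDQ_KEY = "get_qdq_activations_from_prev_layer"


def dump_all(user_set: dict) -> dict:
    """Like dumping a config with every default expanded."""
    return {**DEFAULTS, **user_set}


def dump_exclude_unset(user_set: dict) -> dict:
    """Like dumping only the fields the user actually passed."""
    return dict(user_set)


def apply_gptq_default(dumped: dict) -> dict:
    """Inject the GPTQ default only when the key was never set."""
    cfg = {**DEFAULTS, **dumped}
    if QDQ_KEY not in dumped:
        cfg[QDQ_KEY] = True
    return cfg


user = {"enable": True}  # the user never mentioned the QDQ flag

# Expanding defaults makes the flag look explicitly set to False,
# so the per-algorithm default is silently skipped.
masked = apply_gptq_default(dump_all(user))

# Dumping only what was set keeps "unset" distinguishable from "False".
injected = apply_gptq_default(dump_exclude_unset(user))

# An explicit user value still wins over the algorithm default.
explicit = apply_gptq_default(dump_exclude_unset({"enable": True, QDQ_KEY: False}))
```

Here `masked` keeps the flag at `False` while `injected` gets `True`, which is the difference the extended test parametrization is described as covering.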

Looping in a human for final sign-off because:

Complexity gate fired (16 files, +746/-130, 8 directories).
The author's strongest correctness signal — bit-identical 905-tensor exports across (legacy-flat, nested) and (layerwise-MSE, baseline-MSE) Qwen3-8B runs — lives outside CI; a human reviewer with the runtime context should confirm that's sufficient before merge.
Prior review iteration discovered 3 critical bugs that the unit tests didn't catch on the first pass; given the new opt-in surface (save_every, qdq_from_prev), one more set of human eyes is warranted.

Minor non-blocking observation: _save_layer_files still carries a dead quantizer_buffers parameter / branch (the save_quantizers_only feature was split out to PR #1640 per the PR body, but the helper signature kept the slot). Worth a follow-up cleanup but not a merge blocker.

realAsma · 2026-06-05T23:05:51Z

+        for batch in calib_data:
+            m(batch)
+
+    mtq.quantize(model_seq, _int8_layerwise_config({"method": "max"}), forward_loop=fwd)


Is this correct?
Why are we passing a layerwise config here?

🤖 Bot comment.

This is correct, but the helper name is confusing. _int8_layerwise_config just starts from the shipped INT8 config so the test uses the same real quant_cfg, then replaces cfg["algorithm"] with the dict passed in.

In this first call we pass only {"method": "max"}, so layerwise is not enabled and it exercises the normal sequential max-calibration baseline. The second call adds layerwise.enable=True plus get_qdq_activations_from_prev_layer=False, so the test compares that layerwise path against the sequential baseline under the same quantizer config.

We should rename the helper or update its docstring to make that clearer.
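
A minimal sketch of what such a helper does, using the `_int8_cfg_with_algorithm` name from the later rename commit. `BASE_INT8_CFG` is a hypothetical stand-in for the shipped INT8 config, whose real contents are not reproduced here.

```python
import copy

# Hypothetical stand-in for a shipped INT8 config; the real one lives in
# the library and has more entries.
BASE_INT8_CFG = {
    "quant_cfg": {"*weight_quantizer": {"num_bits": 8, "axis": 0}},
    "algorithm": "max",
}


def _int8_cfg_with_algorithm(algorithm: dict) -> dict:
    """Return an INT8 config whose algorithm block is replaced by *algorithm*."""
    cfg = copy.deepcopy(BASE_INT8_CFG)  # never mutate the shared base config
    cfg["algorithm"] = algorithm
    return cfg


# Plain algorithm block: no layerwise key, so this is the sequential baseline.
baseline_cfg = _int8_cfg_with_algorithm({"method": "max"})

# Same quantizer config, with layerwise calibration switched on explicitly.
layerwise_cfg = _int8_cfg_with_algorithm(
    {
        "method": "max",
        "layerwise": {
            "enable": True,
            "get_qdq_activations_from_prev_layer": False,
        },
    }
)
```

Both configs share the same `quant_cfg`, so any difference between the two runs can only come from the algorithm block.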

The _int8_layerwise_config name is misleading; it's just "an INT8 config with this algorithm block." Updating the name and description.

realAsma · 2026-06-05T23:06:29Z

+        )
+
+
+def test_layerwise_no_qdq_matches_sequential_amax(monkeypatch):


@Fridah-nv can you double check this test?

realAsma

LGTM! @sugunav14 could you take a look?

please address https://github.com/NVIDIA/Model-Optimizer/pull/1571/changes#r3365862663

…layer Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

The skip-layer-weight-checkpoint optimization (save_quantizers_only) is moved out of this foundation PR; it lands complete in the stacked PR #1592 (as calib_mutates_weights). _CheckpointState now always saves the full layer weights blob. save_every and the rest of the nested LayerwiseConfig are kept.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Follow-ups on dropping save_quantizers_only:
- Remove the now-dead quantizer_buffers checkpoint path in _save_layer_files (save() always writes weights.pt; full_restore() only reads it).
- Rename _int8_layerwise_config -> _int8_cfg_with_algorithm and clarify its docstring: it only overrides the algorithm block; layerwise runs only when the algorithm sets layerwise.enable (a plain {method: ...} is sequential).
- Keep both next-layer captures and ckpt.save under persistent_materialization so the capture forward reuses the resident weights instead of re-materializing per batch on offloaded/sharded models.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

coderabbitai

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)

modelopt/torch/quantization/utils/layerwise_calib.py (1)
625-673: ⚠️ Potential issue | 🔴 Critical

Do not treat resume checkpoints as implicitly trusted before torch.load(weights_only=False)
_CheckpointState.from_folder() resumes based on detect_resume_point(), and _read_manifest() only checks that checkpoint_dir/manifest.json is valid JSON (no signing/integrity/ownership validation). If checkpoint_dir is pre-populated, setup_resume()/full_restore() call torch.load(..., weights_only=False) on output_meta.pt, next_inputs.pt, quantizer_state.pt, and weights.pt, making checkpoint_dir an RCE boundary; the current inline “internally generated” justification is not sufficient for this trust model.
Switch to safe loading (weights_only=True where possible) and/or require explicit trust opt-in and integrity validation for resume directories (e.g., only resume from directories created by this run or verify an authenticity marker/signature).
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@modelopt/torch/quantization/utils/layerwise_calib.py` around lines 625 - 673,
The resume flow treats arbitrary checkpoint_dir content as trusted and uses
unsafe torch.load(..., weights_only=False); update the resume code paths
(_CheckpointState.from_folder()/setup_resume() and full_restore()) to (1) use
torch.load(..., weights_only=True) for files that contain tensors only
(output_meta.pt, next_inputs.pt, quantizer_state.pt, weights.pt) so
deserialization of arbitrary Python objects is avoided, and (2) add an explicit
opt-in trust check before allowing weights_only=False or any full
deserialization (e.g., a boolean flag on the CheckpointState or a
signature/marker verified by _read_manifest()) and refuse to resume unless the
directory was created by this process or the marker is valid; change the
torch.load calls in setup_resume and full_restore to weights_only=True where
compatible and gate any remaining weights_only=False loads behind the new
explicit trust check and integrity verification.
Source: Coding guidelines

ℹ️ Review info

📥 Commits

Reviewing files that changed from the base of the PR and between 51bc967 and ac9930d.

📒 Files selected for processing (3)

modelopt/torch/quantization/model_calib.py
modelopt/torch/quantization/utils/layerwise_calib.py
tests/unit/torch/quantization/test_layerwise_calibrate.py

🚧 Files skipped from review as they are similar to previous changes (2)

modelopt/torch/quantization/model_calib.py
tests/unit/torch/quantization/test_layerwise_calibrate.py

save_every); this PR adds the two optimizations on top: calib_mutates_weights (skip weight checkpoint + writeback for amax-only algorithms) and meta-device skip-layer placeholders. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com> Signed-off-by: Fridah-nv <201670829+Fridah-nv@users.noreply.github.com>

Fridah-nv requested review from a team as code owners May 29, 2026 23:02

Fridah-nv requested a review from cjluo-nv May 29, 2026 23:02

Fridah-nv requested review from realAsma and sugunav14 May 29, 2026 23:03

cjluo-nv reviewed May 29, 2026

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py

Comment thread modelopt/torch/quantization/config.py

coderabbitai Bot reviewed May 29, 2026

Comment thread modelopt/torch/quantization/config.py

Comment thread modelopt/torch/quantization/config.py

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated

claude Bot reviewed May 29, 2026

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py

claude Bot reviewed May 29, 2026

Comment thread modelopt/torch/quantization/config.py

claude Bot reviewed May 29, 2026

Comment thread examples/llm_ptq/example_utils.py

cjluo-nv reviewed May 29, 2026

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py Outdated

Comment thread modelopt/torch/quantization/config.py Outdated

coderabbitai Bot approved these changes May 30, 2026

cjluo-nv reviewed May 30, 2026

claude Bot approved these changes May 30, 2026

realAsma reviewed Jun 1, 2026

Comment thread modelopt/torch/quantization/model_calib.py Outdated

cjluo-nv reviewed Jun 1, 2026

realAsma mentioned this pull request Jun 1, 2026

Refine layerwise non-mutating calibration #1592

Closed

coderabbitai Bot reviewed Jun 5, 2026

realAsma reviewed Jun 5, 2026

Comment thread modelopt/torch/quantization/model_calib.py

cjluo-nv reviewed Jun 5, 2026

Comment thread modelopt/torch/quantization/utils/layerwise_calib.py

realAsma reviewed Jun 5, 2026

realAsma approved these changes Jun 5, 2026

Fridah-nv and others added 8 commits June 5, 2026 23:31

Nest layerwise calibration config; add get_qdq_activations_from_prev_…

e4e1256

…layer Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Migrate PTQ recipes to the nested layerwise config form

922ec0c

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Add save_every + save_quantizers_only checkpoint knobs

3d12fbe

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

update changelog

e446627

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

apply reviewer feedbacks

7f77545

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Capture next-layer inputs before calib_func under qdq_from_prev=False

4851cc9

Signed-off-by: Frida Hou <201670829+Fridah-nv@users.noreply.github.com>

Fridah-nv force-pushed the fridah/layerwise-config branch from ac9930d to ab19be0 Compare June 5, 2026 23:36

coderabbitai Bot reviewed Jun 5, 2026

github-actions Bot commented May 29, 2026 • edited

Built to branch gh-pages at 2026-06-05 23:40 UTC. Preview will be ready when the GitHub Pages deployment is complete.
