Add Alpamayo-1 example#1594
Conversation
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Enterprise Run ID: 📒 Files selected for processing (3)
✅ Files skipped from review due to trivial changes (2)
🚧 Files skipped from review as they are similar to previous changes (1)
📝 WalkthroughWalkthroughIntroduces a new Alpamayo-R1 quantization example with a CLI script, calibration pipeline, and both PTQ (FP8/NVFP4) and AutoQuantize modes. Includes a teacher-forced flow-matching forward pass for calibration, parquet clip loading, and optional real-quantization compression before HF checkpoint export. ChangesAlpamayo Quantization Example
Sequence DiagramsequenceDiagram
participant User as User/CLI
participant Model as AlpamayoR1 Model
participant Processor as Qwen3-VL Processor
participant CalibLoader as Calibration Loader
participant Quantizer as PTQ/AutoQuantize Engine
participant Compressor as mtq.compress
participant Checkpoint as HF Checkpoint
User->>Model: Load model
User->>Processor: Initialize with pixel bounds
alt PTQ Path (FP8/NVFP4)
User->>CalibLoader: Iterate parquet clips
CalibLoader->>Processor: Tokenize image+trajectory prompt
CalibLoader->>Model: Run VLM rollout (FP16 autocast)
Quantizer->>Model: Apply quantization config (disable vision tower, conditionally disable Linear layers)
Quantizer->>Model: Run mtq.quantize with calibration forward
else AutoQuantize Path
User->>CalibLoader: Create re-iterable loader over clips
CalibLoader->>Processor: Tokenize clips
CalibLoader->>Model: Compute teacher-forced flow loss (BF16 autocast)
Quantizer->>Model: Run mtq.auto_quantize (MSE search under bits budget)
end
alt Real-Quant Mode
User->>Compressor: Run mtq.compress (pack weights)
end
User->>Checkpoint: Save model/processor/config via save_pretrained
User->>Checkpoint: Write hf_quant_config.json
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Suggested reviewers
🚥 Pre-merge checks | ✅ 5 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 3
🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)
655-681: ⚡ Quick winGuard checkpoint writes to rank 0 for distributed-safe execution.
Current save/write operations are unconditional; in multi-rank runs, each rank may write the same files concurrently.
🛡️ Proposed fix
+ is_rank0 = ( + not torch.distributed.is_available() + or not torch.distributed.is_initialized() + or torch.distributed.get_rank() == 0 + ) @@ - os.makedirs(args.output_dir, exist_ok=True) - print(f"Saving quantized checkpoint to {args.output_dir!r} ...") + if is_rank0: + os.makedirs(args.output_dir, exist_ok=True) + print(f"Saving quantized checkpoint to {args.output_dir!r} ...") @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) @@ - with torch.inference_mode(): - model.save_pretrained(args.output_dir) - - processor.save_pretrained(args.output_dir) - model.config.save_pretrained(args.output_dir) - - quant_cfg = get_quant_config(model) - with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: - json.dump(quant_cfg, f) + if is_rank0: + with torch.inference_mode(): + model.save_pretrained(args.output_dir) + processor.save_pretrained(args.output_dir) + model.config.save_pretrained(args.output_dir) + quant_cfg = get_quant_config(model) + with open(os.path.join(args.output_dir, "hf_quant_config.json"), "w") as f: + json.dump(quant_cfg, f) + + if torch.distributed.is_available() and torch.distributed.is_initialized(): + torch.distributed.barrier()As per coding guidelines "Develop with distributed processing in mind ... guarding shared side effects such as file writes ... against race conditions between ranks."
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 655 - 681, The saves under both the real_quant and else branches (mtq.compress + model.save_pretrained, processor.save_pretrained, model.config.save_pretrained, and writing hf_quant_config.json from get_quant_config) must be guarded so only the main process/rank 0 performs file writes in distributed runs; detect distributed mode via torch.distributed.is_initialized() and torch.distributed.get_rank() (or the project’s existing is_main_process helper) and wrap all save_pretrained calls and the json file write in a conditional that only runs on rank 0 to avoid concurrent writes to args.output_dir.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 346-355: The function read_clip_ids_from_parquet currently indexes
cols_lower["key"] which can KeyError; update it to search cols_lower for common
column names (e.g., "key", "clip_id", "clipid", "id", "filename") and use the
first match if present, otherwise fall back to using df.index.astype(str). After
obtaining the list, convert to strings and deduplicate while preserving first
occurrence order (use a seen set/ordered iteration); reference
read_clip_ids_from_parquet and the cols_lower variable to locate the change.
- Around line 27-55: Add an explicit __all__ list at module top to define the
public API; create a single-line assignment __all__ = [...] after the imports
that enumerates the module's public helpers and re-exports (e.g.,
"load_physical_aiavdataset", "AlpamayoR1", "to_special_token",
"create_forward_loop", "get_dataset_dataloader", "export_hf_checkpoint",
"get_quant_config", "mtq", "patch_pretrained_methods", "logger" — adjust to the
actual names this module exposes) and keep it updated when symbols change so
star-import behavior is explicit.
- Around line 500-512: The prints in the hot loop (inside forward_step's
tensor-stat print and loss_func) call .item() on tensors every iteration,
causing unwanted CUDA synchronizations; wrap these debug prints so they only run
when args.debug (and optionally only on the main rank) is true, and compute
.item() values only inside that conditional; update the debug logging in
forward_step (the v_pred/v_target print) and in loss_func (loss print) to be
gated by args.debug and a rank check (e.g., only on rank 0 or when not
distributed) so normal runs avoid per-batch .item() calls and GPU syncs.
---
Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Around line 655-681: The saves under both the real_quant and else branches
(mtq.compress + model.save_pretrained, processor.save_pretrained,
model.config.save_pretrained, and writing hf_quant_config.json from
get_quant_config) must be guarded so only the main process/rank 0 performs file
writes in distributed runs; detect distributed mode via
torch.distributed.is_initialized() and torch.distributed.get_rank() (or the
project’s existing is_main_process helper) and wrap all save_pretrained calls
and the json file write in a conditional that only runs on rank 0 to avoid
concurrent writes to args.output_dir.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 81978cc3-c66c-4b24-8be8-fe303abf7d0d
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (2)
examples/alpamayo/README.mdexamples/alpamayo/quantize.py
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1594 +/- ##
===========================================
+ Coverage 62.63% 77.78% +15.15%
===========================================
Files 481 482 +1
Lines 52893 52961 +68
===========================================
+ Hits 33127 41196 +8069
+ Misses 19766 11765 -8001
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
New ~700-line example. A few real concerns:
-
Likely bug — backslash-escaped special tokens. In
create_message, the f-strings use<\|traj_history_start\|>,<\|traj_history\|>,<\|traj_history_end\|>,<\|cot_start\|>. In Python\|is not a recognized escape sequence, so the literal string contains a backslash before each pipe. Qwen-family special tokens are<|...|>without backslashes — passing the backslash form toapply_chat_templatewill not match the tokenizer's added special tokens and these will tokenize as ordinary text, silently corrupting calibration. Looks like a copy-paste from a markdown-escaped source. (Bonus: these will start emittingSyntaxWarning: invalid escape sequenceon newer Python.) -
Reaches into private API instead of using the public one.
enable_huggingface_checkpointing_patch()re-implementsmto.enable_huggingface_checkpointing()by importing the private_LIBRARY_CLASSES_FOR_PATCHING,_PATCHED_CLASSES, andpatch_pretrained_methodsfrommodelopt.torch.opt.plugins.huggingface, just to skip the_from_configpatch. If the only goal is to skip_from_config, please make that an option on the public API (or document why this divergence is needed) instead of duplicating the loop with private symbols. Also, the file does this at import time as a top-level side effect, which is surprising. -
Monkey-patching
AlpamayoR1.teacher_forced_flow_loss_forwardat module import time is acknowledged in a docstring but it's a strong coupling: the body is a verbatim copy of an internal training-fork method, and any drift innvlabs/alpamayo(e.g.action_in_projsignature,expert_non_causal_attention,action_spaceAPI) will silently produce wrong calibration. At minimum: assert that the publicAlpamayoR1exposes the methods/attrs this body relies on, and pin a known-good Alpamayo SHA in the README setup section instead ofgit cloneofmain. -
read_clip_ids_from_parquetdoesn't match its docstring. The docstring says it "tries common column names; falls back to index if needed", but the implementation hard-codescols_lower["key"]and willKeyErroron any parquet without a literalkeycolumn. Either implement the documented fallback or simplify the docstring. -
Binary parquet checked into the repo. A
.parquetcalibration set is being added toexamples/alpamayo/. The repo doesn't appear to use git-LFS (.gitattributesis missing). Even if this file is small, distributing calibration data through the example tree is unusual for this repo — most examples download from HF on demand. Please confirm the file size is small and that there are no licensing/redistribution restrictions on the AV calibration clips list, or fetch it at runtime instead. -
No tests / no CHANGELOG entry. The PR template's test and changelog checkboxes are still unfilled. New examples in this repo (
vlm_ptq, etc.) typically don't get unit tests, but at least a CHANGELOG entry under "New Features" is expected. -
Minor:
args.parquetdefault is the bare filename, joined toscript_dir; if the user passes an absolute path it works (pathlib handles that), but a plain relative path in the user's CWD will be silently resolved against the script's directory, which is surprising. Consider documenting this or usingPath(args.parquet)as-is when absolute/exists, else fall back toscript_dir / args.parquet.
6cc978e to
0a9c5ca
Compare
|
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
🧹 Nitpick comments (1)
examples/alpamayo/quantize.py (1)
257-257: 💤 Low valueConsider moving the patch call into
main().Executing
patch_teacher_forced_flow_loss_forward()at module import is a side effect that runs even if users only import a helper function. Moving this call intomain()keeps imports side-effect-free and aligns with the principle applied to the HF checkpointing setup.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` at line 257, The top-level call to patch_teacher_forced_flow_loss_forward() causes a side effect at import; remove that call from module scope and invoke patch_teacher_forced_flow_loss_forward() from inside main() (e.g., at the start of the main() function before any training or model code runs) so imports remain side-effect-free; reference the existing patch_teacher_forced_flow_loss_forward function and ensure the patch is applied once (idempotently) before any code that relies on the patched behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@modelopt/torch/opt/plugins/huggingface.py`:
- Around line 171-176: Validate the skip_methods input at the function boundary:
compute the set of valid method names by iterating _LIBRARY_CLASSES_FOR_PATCHING
and collecting the first element of each entry in the patch_methods lists, then
if any name in skip_methods is not in that valid set raise an error (e.g.
ValueError) instead of silently ignoring it; keep the existing logic that
converts skip_methods to a set and then continue to call
patch_pretrained_methods(cls, [m for m in patch_methods if m[0] not in
skip_methods]) for each class, while referencing skip_methods,
_LIBRARY_CLASSES_FOR_PATCHING, patch_pretrained_methods, and _PATCHED_CLASSES to
locate and update the code.
---
Nitpick comments:
In `@examples/alpamayo/quantize.py`:
- Line 257: The top-level call to patch_teacher_forced_flow_loss_forward()
causes a side effect at import; remove that call from module scope and invoke
patch_teacher_forced_flow_loss_forward() from inside main() (e.g., at the start
of the main() function before any training or model code runs) so imports remain
side-effect-free; reference the existing patch_teacher_forced_flow_loss_forward
function and ensure the patch is applied once (idempotently) before any code
that relies on the patched behavior.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: eecdd970-10bf-41e4-a1ca-8f334dd1dacd
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (4)
CHANGELOG.rstexamples/alpamayo/README.mdexamples/alpamayo/quantize.pymodelopt/torch/opt/plugins/huggingface.py
✅ Files skipped from review due to trivial changes (2)
- examples/alpamayo/README.md
- CHANGELOG.rst
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Several critical points from the previous round are addressed (backslash special tokens fixed, private-API duplication replaced with a public skip_methods arg, README now pins an Alpamayo SHA, CHANGELOG entry added, parquet docstring rewritten to match the code), but a few correctness/UX items raised earlier are still open in this commit:
- 💬
forward_step/loss_funcdebugprint(...item()...)calls (raised by CodeRabbit and not addressed) — they still run unconditionally on every AutoQuantize iteration and force a per-batch CPU↔GPU sync.args.debugis even hardcoded toTrueat L573, so there's no way to turn them off without editing the script. - 💬
auto_quantize_modeldisabled_layers(raised by cjluo-nv) still appends bare module names (_name) whilequantize_modelusesf"{_name}.*".auto_quantize'sdisabled_layersare fnmatch-matched against quantizer names likevlm.foo.bar.weight_quantizer, so the bare-name entries here will not match — the not-multiple-of-16 Linears that were the motivation for this exclusion are still being included in the AutoQuantize search. This is a real bug, not a style nit. - 💬 Module-level
patch_teacher_forced_flow_loss_forward()at import (cjluo-nv): ahasattr-on-the-class guard was added, but the request was forhasattrchecks on the attributes the patch body depends on (action_in_proj,expert_non_causal_attention,fuse_traj_tokens,action_space.traj_to_action,vlm.model.rope_deltas, …) so upstream drift fails loudly instead of silently producing wrong calibration. Still not added, and the call is still a top-level side effect. - New (PR-internal) issue:
enable_huggingface_checkpointing(skip_methods=...)silently ignores unknown method names — a typo in{"_from_config"}would turn this escape hatch into a no-op without warning. Worth either validating against the registered method names or at least warning. - No tests / no
example_tests.ymlentry for the new example, and a binary.parquetcalibration set is still committed in-tree (no.gitattributes/ LFS in this repo). The other VLM examples fetch calibration data on demand; please confirm size + redistribution rights, or fetch at runtime.
| Alpamayo using ModelOpt. Quantization calibration runs on a small dataset of 16 | ||
| AV clips (`0417_16rows_train_set_for_calibration_25.10.parquet`). |
There was a problem hiding this comment.
Is this a public dataset? I would prefer not to keep this in our repo.
There was a problem hiding this comment.
The parquet file contains clip IDs (which can be publicly shared), the actual driving clips are on HF in the publicly available data nvidia/PhysicalAI-Autonomous-Vehicles
| MSE between the action expert's predicted velocity field `v_pred` and the | ||
| target `v_target = x_1 - x_0` from a teacher-forced forward pass on the | ||
| calibration clips. Layers the loss is sensitive to keep more bits (FP8); the | ||
| rest go to NVFP4. |
There was a problem hiding this comment.
Is there any inference runtime available for the quantized checkpoint?
There was a problem hiding this comment.
Not that I'm aware of, there are just some PyTorch eval scripts lying around in different repos. I have validated accuracy of the FP8, NVFP4, and AutoQuant (6.5 bpe) checkpoints.
0a9c5ca to
5a83ff3
Compare
There was a problem hiding this comment.
Warning
CodeRabbit couldn't request changes on this pull request because it doesn't have sufficient GitHub permissions.
Please grant CodeRabbit Pull requests: Read and write permission and re-run the review.
Actionable comments posted: 1
♻️ Duplicate comments (1)
examples/alpamayo/quantize.py (1)
457-472:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDrop the per-batch tensor stats from the AutoQuantize hot path.
These
print(...item()...)calls run on every search step. On CUDA they force repeated CPU-GPU syncs and add a lot of stdout overhead. Either remove them from the steady-state loop or gate them behind an explicit debug + rank-0 path.⚡ Proposed fix
def forward_step(runtime_model, data): with torch.autocast("cuda", dtype=torch.bfloat16): - out = runtime_model.teacher_forced_flow_loss_forward(data=data) - v_pred, v_target = out["v_pred"], out["v_target"] - print( - f"[autoquant-fwd] v_pred: finite={torch.isfinite(v_pred).all().item()} " - f"min={v_pred.min().item():.4g} max={v_pred.max().item():.4g} " - f"abs_mean={v_pred.abs().mean().item():.4g} | " - f"v_target: finite={torch.isfinite(v_target).all().item()} " - f"min={v_target.min().item():.4g} max={v_target.max().item():.4g}" - ) - return out + return runtime_model.teacher_forced_flow_loss_forward(data=data) def loss_func(output, batch): - loss = torch.nn.functional.mse_loss(output["v_pred"], output["v_target"]) - print(f"[autoquant-loss] loss={loss.item():.6g} finite={torch.isfinite(loss).item()}") - return loss + return torch.nn.functional.mse_loss(output["v_pred"], output["v_target"])As per coding guidelines "Keep tensor work on the GPU and avoid unnecessary CPU-GPU syncs; use PyTorch tensor ops by default and only extract Python scalars when the CPU needs the value" and "Develop with distributed processing in mind: use
print_rank_0orwarn_rank_0when possible and guard shared side effects against race conditions between ranks".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@examples/alpamayo/quantize.py` around lines 457 - 472, The per-batch print statements in forward_step and loss_func (inside forward_step and loss_func) force CPU-GPU syncs and should be removed from the steady-state hot path; either delete these prints or wrap them behind an explicit debug flag and a rank-0 guard (e.g., if debug and is_rank_0()) so they run only when requested, and avoid calling .item() on GPU tensors inside the loop (use tensor ops or move a small sampled/logging batch to CPU outside the hot path). Ensure any retained logging uses a rank-0-safe helper (print_rank_0/warn_rank_0) and runs rarely (not every step).
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@examples/alpamayo/quantize.py`:
- Around line 355-363: The fallback calibration path hardcodes "cuda:0" when
calling get_dataset_dataloader which causes device-mismatch if the model lives
elsewhere; change it to derive the device from the model (or use the same device
parameter passed into quantize_model) instead of the literal string. Locate the
branch that checks calibration_forward_loop is None and replace the device
argument for get_dataset_dataloader with the model's device (e.g. infer via
model.device or next(model.parameters()).device) or a provided device variable
so calib_dataloader ends up on the same device as the model used by
quantize_model.
---
Duplicate comments:
In `@examples/alpamayo/quantize.py`:
- Around line 457-472: The per-batch print statements in forward_step and
loss_func (inside forward_step and loss_func) force CPU-GPU syncs and should be
removed from the steady-state hot path; either delete these prints or wrap them
behind an explicit debug flag and a rank-0 guard (e.g., if debug and
is_rank_0()) so they run only when requested, and avoid calling .item() on GPU
tensors inside the loop (use tensor ops or move a small sampled/logging batch to
CPU outside the hot path). Ensure any retained logging uses a rank-0-safe helper
(print_rank_0/warn_rank_0) and runs rarely (not every step).
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 6d5962f0-996f-4a23-9264-bc78c251cdb8
⛔ Files ignored due to path filters (1)
examples/alpamayo/0417_16rows_train_set_for_calibration_25.10.parquetis excluded by!**/*.parquet
📒 Files selected for processing (4)
CHANGELOG.rstexamples/alpamayo/README.mdexamples/alpamayo/quantize.pymodelopt/torch/opt/plugins/huggingface.py
✅ Files skipped from review due to trivial changes (2)
- examples/alpamayo/README.md
- CHANGELOG.rst
🚧 Files skipped from review as they are similar to previous changes (1)
- modelopt/torch/opt/plugins/huggingface.py
| if calibration_forward_loop is None: | ||
| if tokenizer is None: | ||
| raise ValueError("tokenizer must be provided when calibration_forward_loop is None") | ||
| calib_dataloader = get_dataset_dataloader( | ||
| tokenizer=tokenizer, | ||
| batch_size=32, | ||
| num_samples=512, | ||
| device="cuda:0", | ||
| ) |
There was a problem hiding this comment.
Don't hardcode cuda:0 in the fallback calibration path.
This helper accepts an arbitrary model, but this branch always builds calibration batches on cuda:0. Reusing quantize_model() with a model on another device will hit a device mismatch before calibration starts.
🐛 Proposed fix
if calibration_forward_loop is None:
if tokenizer is None:
raise ValueError("tokenizer must be provided when calibration_forward_loop is None")
+ calib_device = str(next(model.parameters()).device)
calib_dataloader = get_dataset_dataloader(
tokenizer=tokenizer,
batch_size=32,
num_samples=512,
- device="cuda:0",
+ device=calib_device,
)
calibrate_loop = create_forward_loop(dataloader=calib_dataloader)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@examples/alpamayo/quantize.py` around lines 355 - 363, The fallback
calibration path hardcodes "cuda:0" when calling get_dataset_dataloader which
causes device-mismatch if the model lives elsewhere; change it to derive the
device from the model (or use the same device parameter passed into
quantize_model) instead of the literal string. Locate the branch that checks
calibration_forward_loop is None and replace the device argument for
get_dataset_dataloader with the model's device (e.g. infer via model.device or
next(model.parameters()).device) or a provided device variable so
calib_dataloader ends up on the same device as the model used by quantize_model.
cjluo-nv
left a comment
There was a problem hiding this comment.
Bot review — DM the bot to share feedback.
Most critical issues from earlier rounds are now addressed: the backslash-escape bug in special tokens is fixed, the private-API duplication has been replaced by a real public skip_methods parameter on enable_huggingface_checkpointing, the README pins an Alpamayo SHA (4cda35d), a CHANGELOG entry was added, and the read_clip_ids_from_parquet docstring was rewritten to match the code. A few items still warrant a human reviewer's sign-off:
- 💬 Author marked the per-batch
.item()debug prints inforward_step/loss_func(lines ~462–475) as "Invalid" without rationale. These still run unconditionally and force CPU↔GPU syncs every AutoQuantize step. Impact is bounded (16 clips × num_score_steps), but the rebuttal isn't substantiated — worth a quick human glance to confirm this is intended. - 💬 Author marked the
auto_quantize_modeldisabled_layersbare-name comment as "Invalid". I checked_AutoQuantizeBaseSearcher.insert_hparams_after_merge_rules—disabled_layersare fnmatched againstnamefrommodel.named_modules(), i.e. module names likevlm.foo.bar, not quantizer names. So the bare names here are actually correct and the author's "Invalid" is right; the prior reviewer's note was mistaken. Just flagging so it doesn't get re-litigated. - 💬 Module-level monkey-patch of
AlpamayoR1.teacher_forced_flow_loss_forward(L257): author argues the pinned SHA in the README is sufficient defense against drift. The body still reaches intoaction_in_proj,expert_non_causal_attention,fuse_traj_tokens,action_space.traj_to_action,vlm.model.rope_deltas, etc. with nohasattrguards. Reasonable position given the SHA pin, but worth a human sanity check. - 💬 meenchen asked whether the in-tree
0417_16rows_train_set_for_calibration_25.10.parquetis a public dataset and expressed a preference not to keep it in the repo. Author responded that the clip IDs are publicly shareable (clips themselves are on HF). meenchen hasn't explicitly accepted that resolution; this is the most important remaining sign-off item — the file is binary, no.gitattributes/LFS in the repo, and other VLM examples fetch calibration data on demand rather than committing it. enable_huggingface_checkpointing(skip_methods=...)silently ignores unknown method names (a typo turns the escape hatch into a no-op). Author replied "only used for Alpamayo, won't be a no-op in practice", but this is now a public API surface and a one-lineValueError(or warning) on unknown names would be cheap insurance. Non-blocking.- No tests / no
example_tests.ymlentry. Consistent with how other VLM examples are structured in this repo, so probably fine.
Licensing: new Python files carry the standard NVIDIA Apache-2.0 header; no third-party code copied; no LICENSE/NOTICE changes. The parquet question above is a redistribution / repo-policy concern rather than a license concern.
On the
|
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
5a83ff3 to
7886c59
Compare
Signed-off-by: Rohan Joshi <rohjoshi@nvidia.com>
7886c59 to
bfcb034
Compare
What does this PR do?
Type of change: ? New example
Adds example for Alpamayo-1 quantization with ModelOpt (FP8, NVFP4, AutoQuant)
Usage
Testing
Before your PR is "Ready for review"
Make sure you read and follow Contributor guidelines and your commits are signed (
git commit -s -S).Make sure you read and follow the Security Best Practices (e.g. avoiding hardcoded
trust_remote_code=True,torch.load(..., weights_only=False),pickle, etc.).CONTRIBUTING.md: ✅ / ❌ / N/AAdditional Information
Summary by CodeRabbit
New Features
Documentation