Skip to content

feat(tools/mcp): add dry_run flag to submit_job #1206

feat(tools/mcp): add dry_run flag to submit_job

feat(tools/mcp): add dry_run flag to submit_job #1206

Workflow file for this run

name: Claude Code Review
on:
issue_comment:
types: [created]
jobs:
# ──────────────────────────────────────────────────────────────────
# Deep ModelOpt-focused review covering numerical correctness,
# mode/state composition, export compatibility, and backward
# compatibility for saved checkpoints / recipes.
#
# CodeRabbit (.coderabbit.yaml) auto-reviews every PR for routine
# bugs, typos, style, and security anti-patterns — this Claude job
# is on-demand and complements that with deeper architectural
# analysis. Trigger: /claude review
# ──────────────────────────────────────────────────────────────────
review:
name: Claude Review
if: |
github.event_name == 'issue_comment' &&
github.event.issue.pull_request &&
contains(github.event.comment.body, '/claude review') &&
contains(fromJson('["OWNER", "MEMBER", "COLLABORATOR"]'), github.event.comment.author_association)
runs-on: ubuntu-latest
timeout-minutes: 30
permissions:
contents: read
pull-requests: write
issues: write
id-token: write
env:
GH_TOKEN: ${{ github.token }}
REPO: ${{ github.repository }}
PR_NUMBER: ${{ github.event.issue.number }}
steps:
- name: Get PR info
id: pr-info
run: |
PR_DATA=$(gh pr view $PR_NUMBER --repo $REPO --json headRefOid,baseRefName)
echo "sha=$(echo $PR_DATA | jq -r .headRefOid)" >> $GITHUB_OUTPUT
echo "base_ref=$(echo $PR_DATA | jq -r .baseRefName)" >> $GITHUB_OUTPUT
- name: Checkout repository
uses: actions/checkout@v6
with:
fetch-depth: 1
ref: ${{ steps.pr-info.outputs.sha }}
- name: Fetch base branch for diff analysis
run: git fetch origin ${{ steps.pr-info.outputs.base_ref }}
- name: React to trigger comment
run: |
gh api repos/$REPO/issues/comments/${{ github.event.comment.id }}/reactions \
--method POST \
-f content='eyes'
- name: Run Claude Review
uses: anthropics/claude-code-action@v1
env:
ANTHROPIC_BASE_URL: ${{ secrets.ANTHROPIC_BASE_URL }}
# NVIDIA inference proxy (LiteLLM-based) rejects two fields
# the Claude Code SDK sends by default. Set per NVIDIA/OSMO's
# workflow which has hit and solved both issues:
# - `context_management` → disable experimental betas
# - `cache_control.ephemeral.scope` → disable prompt caching
CLAUDE_CODE_DISABLE_EXPERIMENTAL_BETAS: "1"
DISABLE_PROMPT_CACHING: "1"
with:
anthropic_api_key: ${{ secrets.ANTHROPIC_API_KEY }}
trigger_phrase: "/claude review"
show_full_output: true
claude_args: |
--allowedTools "mcp__github_inline_comment__create_inline_comment,Bash(gh pr comment:*),Bash(gh pr diff:*),Bash(gh pr view:*),Bash(gh pr review:*),Bash(git diff:*),Bash(git show:*),Bash(git log:*),Read,Grep,Glob"
--model "${{ vars.CLAUDE_MODEL }}"
prompt: |
REPO: ${{ env.REPO }}
PR NUMBER: ${{ env.PR_NUMBER }}
BASE REF: origin/${{ steps.pr-info.outputs.base_ref }}
Mandatory workflow — never skip or reorder:
1. Read prior Claude activity on the PR so you don't duplicate already-raised
comments and can track which prior issues are now resolved:
`gh pr view $PR_NUMBER --repo $REPO --json comments,reviews`
Treat prior findings as context, not a ceiling — if you spot a genuinely new
issue this round, flag it.
2. Read the PR diff (gh pr diff).
3. Read AGENTS.md and CONTRIBUTING.md (including the Coding standards section)
for project conventions, coding principles, and architecture.
4. For changed files under `modelopt/torch/<sub-package>/`, read the sub-package's
`__init__.py` plus any `mode.py` / `config.py` to understand mode registration
and config schema.
5. Only then perform the review using that context.
You are performing a deep code review on a **NVIDIA Model Optimizer (ModelOpt)** PR.
ModelOpt is NVIDIA's open-source library for model optimization (quantization, pruning,
distillation, sparsity, speculative decoding, NAS, PEFT) targeting PyTorch / ONNX /
HuggingFace / Megatron, with deployment into TRT-LLM / vLLM / SGLang.
Apply general correctness reasoning to whichever ModelOpt sub-package the diff touches —
you already know the algorithms (quantization formulas, distillation losses, N:M sparsity
mask selection, speculative draft-token alignment, etc.). The prompt below covers the
**ModelOpt-specific structural concerns** that you can't infer from the diff alone.
## Division of labor with CodeRabbit
CodeRabbit (`.coderabbit.yaml`) already auto-reviews every PR with the `chill` profile
and runs a hard pre-merge gate on security anti-patterns. **Do NOT duplicate its work.**
Specifically, do NOT comment on:
- Style, formatting, or naming nits (handled by ruff + CodeRabbit)
- Simple typos in code/comments/strings (CodeRabbit catches these)
- The security anti-patterns enumerated in `.coderabbit.yaml`:
`torch.load(weights_only=False)` without justification, `numpy.load(allow_pickle=True)`
without justification, hardcoded `trust_remote_code=True`, `eval`/`exec` on external
input, `# nosec` bypasses, non-permissive new PIP dependencies — these are already
gated. Skip them entirely.
- Generic "consider adding a test" suggestions for trivial changes.
Your value is in things CodeRabbit's pattern-matching cannot do well: tracing dataflow
across multiple files, reasoning about mode/state composition, judging export
compatibility, and catching algorithm-level correctness bugs.
## Review Procedure
**Aim for one pass.** Surface meaningful issues in this review so the author gets
one consolidated set of fixes.
**Cover each changed file across categories.** For each non-trivial changed file,
consider the categories below (Algorithm Correctness, Mode/State, Export, Backward
Compatibility, Performance) before moving on.
**Trace public symbols across files.** For new or modified public symbols
(functions, arguments, config fields, exported names), grep call sites in
`modelopt/`, `tests/`, and `examples/` before commenting. Many bugs here only
surface where the symbol meets its caller — mode registration, export paths,
restore logic.
1. Get PR metadata: `gh pr view $PR_NUMBER --repo $REPO --json title,body,baseRefName,headRefName,files,additions,deletions,changedFiles,author`
2. Get the full diff: `gh pr diff $PR_NUMBER --repo $REPO`
- For large PRs (>50 files), prioritize source code over config/lock/auto-generated files.
3. For each significant changed file, read the full file for surrounding context.
4. Trace the algorithm end-to-end through the diff. Verify the math/logic matches the
intended technique (whatever sub-package it belongs to).
5. For each newly introduced variable/argument/field, verify it has a meaningful runtime
use path — not just declaration/docstring or discard assignment (`_ = new_arg`).
Use Grep to search for usage beyond declaration sites.
6. Post findings as inline comments with severity and category tags.
## Critical Issues (Must Fix)
### Algorithm Correctness
- Verify the implementation matches the intended technique. Apply your knowledge of the
relevant algorithm family (quantization scales/rounding/saturation, distillation loss
composition, sparsity mask selection, pruning importance scoring, speculative draft
acceptance, NAS supernet weight sharing, etc.).
- Watch for silent numerical bugs: missing fp32 upcast in reductions, wrong reduction
dimension, division-by-zero guards, casts that wrap instead of saturate, gradient
flow through stop_gradient boundaries.
- Watch for state corruption across calibration / search / training passes — leftover
statistics from a previous run are a common foot-gun.
### Mode & State Composability (ModelOpt-specific)
- **Mode registration**: New modes must register correctly with `apply_mode()` /
`restore()`, declare their dependencies, and produce a `modelopt_state` entry that
round-trips through save/restore.
- **State dict schema**: Modified `modelopt_state` schema must include a migration path
or version bump — silently changing keys breaks restore for existing checkpoints.
- **Restore fidelity**: After `restore(model, state)`, the model must be functionally
identical to the saved one. Verify module replacements, hooks, and parameters are
re-applied.
- **Plugin laziness**: Optional integrations (HF, Megatron, TRT-LLM, ONNX) must not
hard-import at module load — gate behind `import_plugin()` so users without those
extras don't break.
### Export Compatibility (ModelOpt-specific)
- HF export (`unified_export_hf.py`) must produce a checkpoint that loads cleanly in
transformers and matches the on-device dtype.
- TRT-LLM export (`model_config_export.py`) must emit a valid `config.json` with
correct `quant_algo`, `kv_cache_quant_algo`, scale tensor names, and weight layout.
- ONNX export must use opsets and operator versions supported by the target consumer
(TRT, ORT).
## Important Issues (Should Fix)
### Backward Compatibility
- Renamed or removed arguments / config fields without deprecation path — breaks
existing user scripts.
- `modelopt_recipes/*.yaml` schema changes without a version bump — old recipes
silently misparse.
- Changed defaults silently alter behavior for users relying on them.
- Changed function signatures, return types, or side effects in
`modelopt/torch/*/__init__.py` (public API) without a backward-compat shim.
- Modified `modelopt_state` keys/structure without migration — makes existing
optimized checkpoints unloadable.
### Performance
- Unnecessary CPU-GPU synchronization in hot paths: `.item()`, `.cpu()`,
`torch.cuda.synchronize()`, Python-side tensor value checks.
- Memory regressions: double-allocating weights, holding tensors past their lifetime.
## Suggestions (Nice to Have)
SUGGESTIONs document non-blocking improvements and never block approval (see
Completion below). Raise them when genuinely useful; skip nits that aren't.
- Stale, imprecise, or misleading comments/docstrings — a wrong docstring is worse
than none.
- Missing shape/dtype assertions at module/parallelism boundaries where they would
catch real bugs.
- Functions mixing many unrelated responsibilities that would benefit from splitting.
## Comment Format
Prefix each comment with severity and category tag:
- `**[CRITICAL Algorithm]**`, `**[CRITICAL ModeState]**`, `**[CRITICAL Export]**`
- `**[IMPORTANT Compatibility]**`, `**[IMPORTANT Performance]**`
- `**[SUGGESTION]**`
For each finding, explain: (1) what the issue is, (2) why it matters (impact/risk), (3) specific suggestion for fix.
Only use inline ```suggestion blocks for simple, self-contained line replacements (typos,
renames, single-line fixes). For structural changes that add, remove, or reorganize blocks
of code, use a top-level PR comment with a code block showing the proposed change instead.
## Completion
After posting all inline comments, post a summary PR comment:
- List total findings by severity (CRITICAL: N, IMPORTANT: N, SUGGESTION: N)
- Highlight the most impactful findings
- Overall assessment of the PR's risk level
**Approval decision — use exact counts from your findings (no other thresholds):**
- `0 CRITICAL AND 0 IMPORTANT` → **approve**, regardless of SUGGESTION count.
SUGGESTIONs never block approval.
`gh pr review $PR_NUMBER --repo $REPO --approve --body "Claude review passed — no blocking issues found. LGTM"`
- `≥1 CRITICAL OR ≥1 IMPORTANT` → **post a comment review** summarizing the
issues found.
`gh pr review $PR_NUMBER --repo $REPO --comment --body "<summary of issues found>"`