[OMNIML-4668] hidden_state_dump_support#1478
Conversation
No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration: Path: .coderabbit.yaml; Review profile: CHILL; Plan: Enterprise
📒 Files selected for processing (1)
🚧 Files skipped from review as they are similar to previous changes (1)

📝 Walkthrough
This PR adds a new YAML pipeline configuration for Qwen3-8B EAGLE3 that defines a hidden-state dump job.
Changes: Qwen3-8B EAGLE3 Hidden Dump Configuration
Estimated review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
🚥 Pre-merge checks: ✅ 6 passed
Comment
Actionable comments posted: 2
🧹 Nitpick comments (1)
tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml (1)
24-25: 💤 Low value. Consider clarifying the task numbering.
The comment says "Step 2" while the task is named "task_0". Since this is a stage file that contains only the hidden state dump step, the `task_0` naming is correct within this file's scope. However, consider updating the comment to avoid confusion:
-# Step 2: Dump hidden states from target model
+# Dump hidden states from target model
 task_0:
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml` around lines 24 - 25: The top comment "Step 2" conflicts with the local task name task_0 and can confuse readers; update the comment to clearly reflect that this file only contains the hidden-state dump. Fix: edit the header comment (the string "Step 2") to a neutral/descriptive label such as "Dump hidden states (task_0)" or "Step: Dump hidden states" so it matches the actual task name task_0 and removes the apparent numbering mismatch.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 63e37675-7a66-452f-895e-e6429fa09cbe
📒 Files selected for processing (1)
tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml
# EAGLE3 offline speculative decoding pipeline for Qwen3-8B.
#
# 4-step pipeline:
# task_0: Data synthesis — query TRT-LLM server to generate prompt samples
# task_1: Dump hidden states — run target model to capture hidden states
# task_2: Offline training — train the EAGLE3 draft head
# task_3: Benchmark — evaluate speculative decoding speedup via VLLM
#
# All tasks share /scratchspace to pass artifacts between steps.
#
# Usage:
# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
# uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
Fix documentation inconsistencies in the comment header.
The comment header has several issues:
- Lines 3-7: Describe a full 4-step pipeline with tasks 0-3, but this file only defines `task_0` (the hidden state dump step). This is misleading since the file is a stage artifact containing a single step, not the complete pipeline.
- Lines 12-13: The usage examples reference `hf_offline_eagle3.yaml`, but the actual filename is `step2_hidden.yaml`.
- Line 13: Contains an unusual path prefix `modules/Model-Optimizer/` that doesn't match the expected path structure.
Update the comment header to accurately reflect that this is a standalone stage file for step 2 (hidden state dump), and correct the filename references.
📝 Proposed fix for documentation
-# EAGLE3 offline speculative decoding pipeline for Qwen3-8B.
+# EAGLE3 hidden state dump (Step 2) for Qwen3-8B.
 #
-# 4-step pipeline:
-# task_0: Data synthesis — query TRT-LLM server to generate prompt samples
-# task_1: Dump hidden states — run target model to capture hidden states
-# task_2: Offline training — train the EAGLE3 draft head
-# task_3: Benchmark — evaluate speculative decoding speedup via VLLM
-#
-# All tasks share /scratchspace to pass artifacts between steps.
+# This stage runs the target model to capture hidden states for EAGLE3 training.
+# Expects input data in /scratchspace/data and outputs to /scratchspace/offline_hidden_states.
 #
 # Usage:
-# uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
-# uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
+# uv run launch.py --yaml examples/Qwen/Qwen3-8B/step2_hidden.yaml --yes
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
# EAGLE3 hidden state dump (Step 2) for Qwen3-8B.
#
# This stage runs the target model to capture hidden states for EAGLE3 training.
# Expects input data in /scratchspace/data and outputs to /scratchspace/offline_hidden_states.
#
# Usage:
# uv run launch.py --yaml examples/Qwen/Qwen3-8B/step2_hidden.yaml --yes
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml` around lines 1 - 14,
Update the comment header to reflect that this file is a standalone stage for
"step 2: hidden state dump" (not the full 4-step pipeline): remove or trim the
pipeline task list to describe only the hidden-state dump stage (task_0), change
both usage examples to reference step2_hidden.yaml instead of
hf_offline_eagle3.yaml, and remove the incorrect path prefix
"modules/Model-Optimizer/" so the slurm example mirrors the correct repository
path structure; keep the description concise and clearly state that this file
contains only the hidden-state dump step.
environment:
- HF_MODEL_CKPT: <<global_vars.hf_model>>
Add required MLM_MODEL_CFG environment variable.
According to the coding guidelines, when adding a new model config, the MLM_MODEL_CFG environment variable must be set to the HuggingFace repo ID. This is currently missing from the environment configuration.
As per coding guidelines: "Set MLM_MODEL_CFG environment variable to the HuggingFace repo ID when adding a new model config"
🔧 Proposed fix to add MLM_MODEL_CFG
 environment:
 - HF_MODEL_CKPT: <<global_vars.hf_model>>
+ - MLM_MODEL_CFG: Qwen/Qwen3-8B
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
environment:
- HF_MODEL_CKPT: <<global_vars.hf_model>>
- MLM_MODEL_CFG: Qwen/Qwen3-8B
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml` around lines 33 -
34, Add the missing MLM_MODEL_CFG environment variable in the same environment
block as HF_MODEL_CKPT: set MLM_MODEL_CFG to the HuggingFace repo ID (use the
same placeholder <<global_vars.hf_model>> used by HF_MODEL_CKPT) so the
environment contains both HF_MODEL_CKPT and MLM_MODEL_CFG entries.
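The substitution logic that resolves `<<global_vars.*>>` markers lives in the launcher and is not shown in this PR. As a purely illustrative sketch (the function name and behavior are assumptions, not the actual `launch.py` implementation), the placeholder convention used by these environment entries could be resolved like this:

```python
import re


def resolve_placeholders(text: str, global_vars: dict) -> str:
    """Replace <<global_vars.NAME>> markers with values from global_vars.

    Hypothetical helper for illustration only; the real launcher's
    substitution code is not part of this PR.
    """
    def substitute(match: re.Match) -> str:
        key = match.group(1)
        if key not in global_vars:
            raise KeyError(f"undefined global variable: {key}")
        return str(global_vars[key])

    return re.sub(r"<<global_vars\.(\w+)>>", substitute, text)


# Resolving the HF_MODEL_CKPT entry from the suggested environment block:
env_line = "HF_MODEL_CKPT: <<global_vars.hf_model>>"
resolved = resolve_placeholders(env_line, {"hf_model": "Qwen/Qwen3-8B"})
```

With this reading, setting `MLM_MODEL_CFG: <<global_vars.hf_model>>` (as the agent prompt suggests) and `MLM_MODEL_CFG: Qwen/Qwen3-8B` (as the committable suggestion hardcodes) would resolve to the same value.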
Agent-authored via pensieve-intern's hidden_state_dump_support stage on Epic OMNIML-4666. Faithful extraction of task_1 (dump_offline_data.sh — hidden-state capture from target model) from hf_offline_eagle3.yaml's monolithic pipeline, renamed task_0 for the standalone step convention.

Signed-off-by: Chenhan D. Yu <chenhany@nvidia.com>

[ci-retrigger] previous run had a cancelled matrix variant (runner pre-emption, not a real test failure)
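The full extracted stage file is not reproduced on this page. Combining the quoted fragments with the review suggestions, the standalone step might look roughly like the sketch below; the key nesting, the presence of a `global_vars` block, and everything beyond the quoted fragments are assumptions:

```yaml
# EAGLE3 hidden state dump (Step 2) for Qwen3-8B.
#
# This stage runs the target model to capture hidden states for EAGLE3 training.
#
# Usage:
#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/step2_hidden.yaml --yes

global_vars:
  hf_model: Qwen/Qwen3-8B    # HuggingFace repo ID referenced by the placeholders below

task_0:
  # Dump hidden states from target model
  environment:
    - HF_MODEL_CKPT: <<global_vars.hf_model>>
    - MLM_MODEL_CFG: Qwen/Qwen3-8B    # required per coding guidelines
```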
cb89372 to
01d2518
Compare
Codecov Report
✅ All modified and coverable lines are covered by tests.
Additional details and impacted files:
@@ Coverage Diff @@
## main #1478 +/- ##
==========================================
- Coverage 76.78% 76.78% -0.01%
==========================================
Files 473 473
Lines 51413 51413
==========================================
- Hits 39476 39475 -1
- Misses 11937 11938 +1
Flags with carried forward coverage won't be shown.
☔ View full report in Codecov by Sentry.
01d2518 to
505f20e
Compare
Draft PR opened by pensieve-intern for OMNIML-4668.
Stage `hidden_state_dump_support` of Epic OMNIML-4666. The agent ran from the SPEC on the ticket description; review every change before marking ready. Always-draft is enforced — the bot never auto-merges.