40 changes: 40 additions & 0 deletions tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml
```yaml
# EAGLE3 offline speculative decoding pipeline for Qwen3-8B.
#
# 4-step pipeline:
#   task_0: Data synthesis — query TRT-LLM server to generate prompt samples
#   task_1: Dump hidden states — run target model to capture hidden states
#   task_2: Offline training — train the EAGLE3 draft head
#   task_3: Benchmark — evaluate speculative decoding speedup via VLLM
#
# All tasks share /scratchspace to pass artifacts between steps.
#
# Usage:
#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
#   uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
```

Comment on lines +1 to +14
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix documentation inconsistencies in the comment header.

The comment header has several issues:

1. Lines 3-7 describe the full 4-step pipeline (tasks 0-3), but this file only defines task_0 (the hidden-state dump step). This is misleading, since the file is a stage artifact containing a single step, not the complete pipeline.
2. Lines 12-13: the usage examples reference hf_offline_eagle3.yaml, but the actual filename is step2_hidden.yaml.
3. Line 13 contains an unusual path prefix, modules/Model-Optimizer/, that doesn't match the expected path structure.

Update the comment header to accurately reflect that this is a standalone stage file for step 2 (the hidden-state dump), and correct the filename references.

📝 Proposed fix for documentation:

```diff
-# EAGLE3 offline speculative decoding pipeline for Qwen3-8B.
+# EAGLE3 hidden state dump (Step 2) for Qwen3-8B.
 #
-# 4-step pipeline:
-#   task_0: Data synthesis — query TRT-LLM server to generate prompt samples
-#   task_1: Dump hidden states — run target model to capture hidden states
-#   task_2: Offline training — train the EAGLE3 draft head
-#   task_3: Benchmark — evaluate speculative decoding speedup via VLLM
-#
-# All tasks share /scratchspace to pass artifacts between steps.
+# This stage runs the target model to capture hidden states for EAGLE3 training.
+# Expects input data in /scratchspace/data and outputs to /scratchspace/offline_hidden_states.
 #
 # Usage:
-#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
-#   uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
+#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/step2_hidden.yaml --yes
```
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml` around lines 1-14, update the comment header to reflect that this file is a standalone stage for "step 2: hidden state dump" (not the full 4-step pipeline): remove or trim the pipeline task list to describe only the hidden-state dump stage (task_0), change both usage examples to reference step2_hidden.yaml instead of hf_offline_eagle3.yaml, and remove the incorrect path prefix "modules/Model-Optimizer/" so the slurm example mirrors the correct repository path structure. Keep the description concise and clearly state that this file contains only the hidden-state dump step.

```yaml
job_name: Qwen3-8B_EAGLE3_hidden_dump
pipeline:
  allow_to_fail: false
  skip: false
  note:

global_vars:
  hf_model: /hf-local/Qwen/Qwen3-8B

# Step 2: Dump hidden states from target model
task_0:
  script: common/eagle3/dump_offline_data.sh
  args:
    - --input-data /scratchspace/data
    - --output-dir /scratchspace/offline_hidden_states
    - --max-seq-len 8192
    - --tp 8
    - --moe-ep 8
  environment:
    - HF_MODEL_CKPT: <<global_vars.hf_model>>
```
Comment on lines +33 to +34
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add required MLM_MODEL_CFG environment variable.

According to the coding guidelines, when adding a new model config, the MLM_MODEL_CFG environment variable must be set to the HuggingFace repo ID. This is currently missing from the environment configuration.

As per coding guidelines: "Set MLM_MODEL_CFG environment variable to the HuggingFace repo ID when adding a new model config"

🔧 Proposed fix to add MLM_MODEL_CFG:

```diff
     environment:
       - HF_MODEL_CKPT: <<global_vars.hf_model>>
+      - MLM_MODEL_CFG: Qwen/Qwen3-8B
```
🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate.

In `@tools/launcher/examples/Qwen/Qwen3-8B/step2_hidden.yaml` around lines 33-34, add the missing MLM_MODEL_CFG environment variable in the same environment block as HF_MODEL_CKPT. Set MLM_MODEL_CFG to the HuggingFace repo ID (Qwen/Qwen3-8B); note that reusing the <<global_vars.hf_model>> placeholder would yield the local checkpoint path /hf-local/Qwen/Qwen3-8B rather than the repo ID. The resulting environment block should contain both HF_MODEL_CKPT and MLM_MODEL_CFG entries.
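Note that the `environment` block in these launcher YAMLs is a list of single-key mappings rather than a plain mapping, so a launcher consuming it would typically flatten that list into one environment dict before spawning the task. A minimal sketch of that flattening (the function name and override behavior are assumptions for illustration, not the launcher's actual API):

```python
def flatten_env(entries):
    """Merge a YAML-style list of single-key mappings into one dict.

    Later entries override earlier ones with the same key, so adding
    MLM_MODEL_CFG alongside HF_MODEL_CKPT simply appends a new entry.
    """
    env = {}
    for entry in entries:
        env.update(entry)
    return env


# The task_0 environment block after the proposed fix, as parsed YAML:
entries = [
    {"HF_MODEL_CKPT": "/hf-local/Qwen/Qwen3-8B"},
    {"MLM_MODEL_CFG": "Qwen/Qwen3-8B"},
]
print(flatten_env(entries))
# → {'HF_MODEL_CKPT': '/hf-local/Qwen/Qwen3-8B', 'MLM_MODEL_CFG': 'Qwen/Qwen3-8B'}
```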

```yaml
slurm_config:
  _factory_: "slurm_factory"
  nodes: 1
  ntasks_per_node: 8
  gpus_per_node: 8
  container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
```
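The config relies on `<<global_vars.hf_model>>` being substituted by the launcher before the task runs. A minimal sketch of such a substitution, assuming a simple `<<global_vars.KEY>>` syntax (the regex and function name are illustrative, not the launcher's actual implementation):

```python
import re

# Illustrative placeholder pattern: <<global_vars.some_key>>
_PLACEHOLDER = re.compile(r"<<global_vars\.([A-Za-z0-9_]+)>>")


def expand_global_vars(text, global_vars):
    """Replace <<global_vars.KEY>> placeholders with values from global_vars."""
    return _PLACEHOLDER.sub(lambda m: str(global_vars[m.group(1)]), text)


global_vars = {"hf_model": "/hf-local/Qwen/Qwen3-8B"}
line = "HF_MODEL_CKPT: <<global_vars.hf_model>>"
print(expand_global_vars(line, global_vars))
# → HF_MODEL_CKPT: /hf-local/Qwen/Qwen3-8B
```

This also shows why reusing the placeholder for MLM_MODEL_CFG would be wrong under the coding guideline: it expands to the local checkpoint path, not the `Qwen/Qwen3-8B` repo ID.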