[OMNIML-4668] hidden_state_dump_support #1478
New file (`@@ -0,0 +1,40 @@`):

```yaml
# EAGLE3 offline speculative decoding pipeline for Qwen3-8B.
#
# 4-step pipeline:
#   task_0: Data synthesis — query TRT-LLM server to generate prompt samples
#   task_1: Dump hidden states — run target model to capture hidden states
#   task_2: Offline training — train the EAGLE3 draft head
#   task_3: Benchmark — evaluate speculative decoding speedup via VLLM
#
# All tasks share /scratchspace to pass artifacts between steps.
#
# Usage:
#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes
#   uv run slurm.py --yaml modules/Model-Optimizer/tools/launcher/examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes

job_name: Qwen3-8B_EAGLE3_hidden_dump
pipeline:
  allow_to_fail: false
  skip: false
  note:

global_vars:
  hf_model: /hf-local/Qwen/Qwen3-8B

# Step 2: Dump hidden states from target model
task_0:
  script: common/eagle3/dump_offline_data.sh
  args:
    - --input-data /scratchspace/data
    - --output-dir /scratchspace/offline_hidden_states
    - --max-seq-len 8192
    - --tp 8
    - --moe-ep 8
  environment:
    - HF_MODEL_CKPT: <<global_vars.hf_model>>
```
**Comment on lines +33 to +34** (Contributor)

**Add required `MLM_MODEL_CFG`.** Per the coding guidelines, a new model config should also set the `MLM_MODEL_CFG` environment variable. Proposed fix:

```diff
   environment:
     - HF_MODEL_CKPT: <<global_vars.hf_model>>
+    - MLM_MODEL_CFG: Qwen/Qwen3-8B
```
```yaml
  slurm_config:
    _factory_: "slurm_factory"
    nodes: 1
    ntasks_per_node: 8
    gpus_per_node: 8
    container: nvcr.io/nvidia/tensorrt-llm/release:1.2.0
```
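The `environment` entry above references `<<global_vars.hf_model>>`, a placeholder the launcher presumably resolves against the `global_vars` section before dispatching the task. The actual resolution logic in `launch.py` is not shown in this diff; a minimal sketch of that kind of interpolation, assuming a simple string-substitution scheme, could look like:

```python
import re

# Matches <<global_vars.KEY>> placeholders (hypothetical syntax, inferred
# from the YAML above; the real launcher may resolve these differently).
PLACEHOLDER = re.compile(r"<<global_vars\.(\w+)>>")

def resolve(value: str, global_vars: dict) -> str:
    """Substitute every <<global_vars.KEY>> occurrence in a string."""
    return PLACEHOLDER.sub(lambda m: str(global_vars[m.group(1)]), value)

global_vars = {"hf_model": "/hf-local/Qwen/Qwen3-8B"}
env = {"HF_MODEL_CKPT": "<<global_vars.hf_model>>"}
resolved = {k: resolve(v, global_vars) for k, v in env.items()}
# resolved["HF_MODEL_CKPT"] == "/hf-local/Qwen/Qwen3-8B"
```

A missing key would raise `KeyError` inside the substitution callback, which is arguably the right failure mode for a typo'd placeholder.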
**Fix documentation inconsistencies in the comment header.**

The comment header has several issues:

- Lines 3-7 describe a full 4-step pipeline with tasks 0-3, but this file only defines `task_0` (the hidden state dump step). This is misleading since the file is a stage artifact containing a single step, not the complete pipeline.
- Lines 12-13: the usage examples reference `hf_offline_eagle3.yaml`, but the actual filename is `step2_hidden.yaml`.
- Line 13 contains an unusual path prefix `modules/Model-Optimizer/` that doesn't match the expected path structure.

Update the comment header to accurately reflect that this is a standalone stage file for step 2 (hidden state dump), and correct the filename references.
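The filename mismatch flagged above is the kind of drift a tiny lint check could catch in CI. The sketch below is purely illustrative (the repo's actual tooling and enforcement are unknown): it scans a YAML file's leading comment header for `*.yaml` references and checks that the file's own name is among them.

```python
import re

def header_yaml_refs(yaml_text: str) -> set[str]:
    """Collect bare *.yaml filenames mentioned in the leading comment header."""
    refs: set[str] = set()
    for line in yaml_text.splitlines():
        if not line.startswith("#"):
            break  # the header ends at the first non-comment line
        for m in re.finditer(r"[\w./-]+\.yaml", line):
            refs.add(m.group(0).rsplit("/", 1)[-1])  # keep only the basename
    return refs

def header_matches_filename(yaml_text: str, filename: str) -> bool:
    """True if the header mentions no YAML files, or mentions this one."""
    refs = header_yaml_refs(yaml_text)
    return not refs or filename in refs

# Reproduces the review finding: the header cites hf_offline_eagle3.yaml
# while the file itself is named step2_hidden.yaml.
sample = (
    "# Usage:\n"
    "#   uv run launch.py --yaml examples/Qwen/Qwen3-8B/hf_offline_eagle3.yaml --yes\n"
    "job_name: Qwen3-8B_EAGLE3_hidden_dump\n"
)
mismatch = not header_matches_filename(sample, "step2_hidden.yaml")
```

Run against the diff in this PR, such a check would fail exactly as the reviewer describes, since the header's usage lines never mention `step2_hidden.yaml`.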