
Commit 0cfd3f5

yeyu-nvidia and claude committed
Address review on 1429: skill frontmatter, TP-formula consistency, docs
- eagle3-new-model: add user_invocable:true (match sibling eagle3 skills); fix internally-inconsistent GPU-sizing table (tp is fixed at 4, full-node sharding; gpus_to_fit only sizes node count) and use consistent example rows; update vLLM backend note to native extractor (no speculators).
- eagle3-review-logs: correct "in parallel" wording (the loop is sequential).

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Ye Yu <yeyu@nvidia.com>
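The corrected wording reflects that the skill's loop visits each log in turn inside one shell invocation. The skill's own loop body is truncated in this diff, so what follows is only a hypothetical sketch of that pattern. The `sbatch_*.out` naming and the 200-line tail come from the skill; the function name `tail_task_logs` and the `=====` header line are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: tail every SLURM task log sequentially in one shell call.
# Assumption: logs are named sbatch_*.out somewhere under the experiment dir.
set -euo pipefail

tail_task_logs() {
  local exp_dir=$1
  local f
  # find | sort gives a stable order; the loop then handles one file at a time,
  # so output from different logs never interleaves (sequential, not parallel).
  while IFS= read -r f; do
    printf '===== %s =====\n' "$f"
    tail -n 200 -- "$f"   # errors usually appear at the end of the log
  done < <(find "$exp_dir" -name 'sbatch_*.out' | sort)
}
```

Usage would look like `tail_task_logs experiments/<exp_id>/`. Reading through `while read` instead of `for f in $(...)` also keeps paths containing spaces intact.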
1 parent 7ccae94 commit 0cfd3f5

2 files changed

Lines changed: 13 additions & 9 deletions


.claude/skills/eagle3-new-model/SKILL.md

Lines changed: 12 additions & 8 deletions
````diff
@@ -6,6 +6,7 @@ description: >
 backend (TRT-LLM / HF / vLLM) and GPU configuration.
 Use when user wants to run EAGLE3 on a model that does not yet have a YAML in
 tools/launcher/examples/ or asks how to configure the pipeline for a new checkpoint.
+user_invocable: true
 ---
 
 # EAGLE3 New Model Configuration
@@ -31,24 +32,27 @@ Determine these values from the HuggingFace model card, `config.json`, and vLLM
 OCI-HSG nodes: **4 GPUs × 192 GB HBM3e = 768 GB per node**
 
 ```text
-BF16 weight size = total_params × 2 bytes
-GPUs needed = ceil(weight_size_GB / 192)
-nodes = ceil(gpus_needed / 4)
-tp = min(gpus_needed, 4)
+weight_size_GB = total_params × 2 bytes # BF16
+gpus_to_fit = ceil(weight_size_GB / 192) # min GPUs to hold weights (192 GB each)
+nodes = ceil(gpus_to_fit / 4) # 4 GPUs per node
+tp = 4 # shard across all 4 GPUs on each node
 ```
 
-| Model | Weights (BF16) | GPUs | nodes | tp |
+`tp` is fixed at 4: jobs allocate whole nodes and shard across the node's 4 GPUs, with
+data parallelism across nodes (SLURM array tasks). `gpus_to_fit` is only used to size `nodes`.
+
+| Model | Weights (BF16) | GPUs to fit | nodes | tp |
 |---|---|---|---|---|
 | 8B dense | ~16 GB | 1 | 1 | 4 |
 | 70B dense | ~140 GB | 1 | 1 | 4 |
-| 685B MoE | ~340 GB | 2 | 1 | 4 |
-| 1T MoE | ~595 GB | 4 | 1 | 4 |
+| 405B dense | ~810 GB | 5 | 2 | 4 |
+| 671B MoE | ~1.3 TB | 7 | 2 | 4 |
 
 ## Step 3 — Choose the hidden state dump backend
 
 | Backend | Script | When to use |
 |---------|--------|-------------|
-| vLLM | `common/eagle3/dump_offline_data_vllm.sh` | Default; broad coverage via vLLM + speculators |
+| vLLM | `common/eagle3/dump_offline_data_vllm.sh` | Default; broad coverage via vLLM's native hidden-state extractor |
 | HF | `common/eagle3/dump_offline_data_hf.sh` | VLMs, custom-code models, SWA attention |
 | TRT-LLM | `common/eagle3/dump_offline_data.sh` | Pure-text models with TRT-LLM support (needs `--tp`/`--moe-ep`) |
 
````
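The updated sizing rule reduces to two integer ceiling divisions plus a constant `tp`. The following is a minimal Bash sketch of that rule. It assumes weight sizes are passed as whole decimal GB, which works because BF16 uses 2 bytes per parameter, so billions of parameters × 2 gives GB. It also assumes the 192 GB per GPU and 4 GPUs per node figures stated in the skill. The helper name `size_eagle3_job` is hypothetical and not part of the repo.

```bash
#!/usr/bin/env bash
# Sketch of the GPU-sizing rule from eagle3-new-model/SKILL.md.
# Assumptions: weights given as integer GB; size_eagle3_job is a hypothetical name.
set -euo pipefail

readonly GPU_MEM_GB=192   # HBM3e per GPU on OCI-HSG nodes
readonly GPUS_PER_NODE=4  # GPUs per node
readonly TP=4             # fixed: shard across all GPUs of each node

# Print "<weight_gb> <gpus_to_fit> <nodes> <tp>" for a BF16 weight size in GB.
size_eagle3_job() {
  local weight_gb=$1
  # ceil(a / b) == (a + b - 1) / b for positive integers
  local gpus_to_fit=$(( (weight_gb + GPU_MEM_GB - 1) / GPU_MEM_GB ))
  local nodes=$(( (gpus_to_fit + GPUS_PER_NODE - 1) / GPUS_PER_NODE ))
  echo "${weight_gb} ${gpus_to_fit} ${nodes} ${TP}"
}

# One row per example model in the table: 8B, 70B, 405B, 671B parameters.
for params_b in 8 70 405 671; do
  size_eagle3_job $(( params_b * 2 ))   # BF16: 2 bytes per parameter
done
```

Running it prints `16 1 1 4`, `140 1 1 4`, `810 5 2 4` and `1342 7 2 4`, which match the new table rows. A check like this can confirm that any future example rows stay consistent with the formula.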
.claude/skills/eagle3-review-logs/SKILL.md

Lines changed: 1 addition & 1 deletion
````diff
@@ -31,7 +31,7 @@ Do this in a single Bash call. If no experiments exist, ask the user for the dir
 
 ## Step 1 — Read all task logs
 
-Read the last 200 lines of each log in parallel. Errors appear at the end:
+Read the last 200 lines of each log in a single Bash call. Errors appear at the end:
 
 ```bash
 for f in $(find experiments/<exp_id>/ -name "sbatch_*.out" | sort); do
````
