CHANGELOG.rst: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Changelog
**New Features**
- Add the ``day0-release`` agent skill (``.agents/skills/day0-release/``), a deterministic end-to-end driver that chains the PTQ → evaluation → comparison skills (the evaluation stage deploys the checkpoint itself) with an enforced gate after each stage and returns a publish decision (ACCEPT / REGRESSION / ANOMALOUS / INFEASIBLE). Ships three GPU-free, unit-tested gate scripts (``gate_ptq.py``, ``gate_run.py``, ``gate_compare.py``) that validate checkpoint coverage, evaluation-run completeness, and baseline-vs-candidate accuracy threshold. v1 reports and stops on regression; the recipe-search loop is deferred.
+- Add **streaming** speculative-decoding training (EAGLE3 / DFlash): the draft trains on base-model hidden states produced on the fly by a co-located ``vllm serve`` (no disk dump), moved trainer-side over NIXL RDMA, scaling to multi-node (dedicated serve replicas + DDP trainers). New launcher examples for NVFP4 Kimi-K2.5 / K2.6 on GB200/aarch64 under ``tools/launcher/examples/moonshotai/``.
examples/speculative_decoding/README.md: 6 additions & 0 deletions
@@ -18,6 +18,7 @@ This example focuses on training with Hugging Face. To train with Megatron‑LM,
| Simplified Workflow | Train, evaluate, and export EAGLE model with one-line command |\[[Link](#getting-started-simplified-workflow)\]|
| Online Training | Train draft model alongside base model in GPU memory |\[[Link](#training-draft-model-with-online-base-model)\]|
| Offline Training | Train draft model using pre-computed hidden states |\[[Link](#training-draft-model-with-offline-base-model)\]|
+| Streaming Training | Train draft on hidden states streamed from a live vLLM serve (no disk dump) |\[[Link](#training-draft-model-with-streaming-base-model)\]|
| After Training | Evaluation, export and deployment |\[[Link](#model-validation)\]|
| Advanced Usage | Data synthesis, vocab compression, and configuration |\[[Link](#advanced-usage)\]|
| Support Matrix | Supported models for speculative decoding training |\[[Link](#support-matrix)\]|
@@ -127,6 +128,10 @@ Once we finish dumping hidden states, launch offline training pointing to the hi
training.output_dir=ckpts/llama-3.2-1b-offline
```
+## Training Draft Model with Streaming Base Model
+
+For large base models, you can stream hidden states from a live `vllm serve` instead of dumping them to disk: a co-located server produces the base-model hidden states on the fly and sends them to the trainer over NIXL RDMA, scaling to multiple nodes (dedicated serve replicas + DDP trainers). See the launcher examples, e.g. [Kimi-K2.5 streaming EAGLE3](../../tools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3_multi_node.yaml) and [streaming DFlash](../../tools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash_multi_node.yaml).
+
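Reviewer note: the producer/consumer shape described in the streaming section above can be pictured with a toy sketch. This is only an illustration under stated assumptions: it uses an in-process bounded queue in place of the NIXL RDMA transport, fabricated numbers in place of real base-model hidden states, and hypothetical function names (`serve_hidden_states`, `train_draft`, `run_streaming`) that do not come from this repository, vLLM, or NIXL.

```python
import queue
import threading


def serve_hidden_states(prompts, out_q, hidden_dim=4):
    """Hypothetical stand-in for the serving side: emit one item per prompt."""
    for step, prompt in enumerate(prompts):
        # Fake "hidden states": deterministic floats derived from the prompt length.
        hidden = [float(len(prompt) + i) for i in range(hidden_dim)]
        # put() blocks when the queue is full, so the producer cannot run
        # arbitrarily far ahead of the trainer (simple back-pressure).
        out_q.put((step, hidden))
    out_q.put(None)  # sentinel: no more data


def train_draft(in_q):
    """Hypothetical stand-in for the trainer: consume items as they arrive."""
    seen = []
    while True:
        item = in_q.get()
        if item is None:
            break
        step, hidden = item
        # A real trainer would run a draft-model step here; we just record a sum.
        seen.append((step, sum(hidden)))
    return seen


def run_streaming(prompts, max_in_flight=2):
    """Run producer and consumer concurrently; nothing is written to disk."""
    q = queue.Queue(maxsize=max_in_flight)
    producer = threading.Thread(target=serve_hidden_states, args=(prompts, q))
    producer.start()
    result = train_draft(q)
    producer.join()
    return result
```

The point of the sketch is the contrast with the offline path: hidden states exist only in flight between producer and consumer, and the bounded queue caps how much is buffered at once. How the actual launcher examples handle buffering and flow control is not stated in this change, so treat `max_in_flight` as an assumption of the sketch.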
## Model Validation
For online training checkpoints, we can run in-framework evaluation on MT-bench:
@@ -334,6 +339,7 @@ See `main.py` for the full example including tokenizer setup, dataset loading, a