Commit 24abee0

changelog+readme
Signed-off-by: h-guo18 <67671475+h-guo18@users.noreply.github.com>
1 parent d46c300 commit 24abee0

2 files changed

Lines changed: 7 additions & 0 deletions

CHANGELOG.rst

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ Changelog
 **New Features**

 - Add the ``day0-release`` agent skill (``.agents/skills/day0-release/``), a deterministic end-to-end driver that chains the PTQ → evaluation → comparison skills (the evaluation stage deploys the checkpoint itself) with an enforced gate after each stage and returns a publish decision (ACCEPT / REGRESSION / ANOMALOUS / INFEASIBLE). Ships three GPU-free, unit-tested gate scripts (``gate_ptq.py``, ``gate_run.py``, ``gate_compare.py``) that validate checkpoint coverage, evaluation-run completeness, and baseline-vs-candidate accuracy threshold. v1 reports and stops on regression; the recipe-search loop is deferred.
+- Add **streaming** speculative-decoding training (EAGLE3 / DFlash): the draft trains on base-model hidden states produced on the fly by a co-located ``vllm serve`` (no disk dump), moved trainer-side over NIXL RDMA, scaling to multi-node (dedicated serve replicas + DDP trainers). New launcher examples for NVFP4 Kimi-K2.5 / K2.6 on GB200/aarch64 under ``tools/launcher/examples/moonshotai/``.

 0.45 (2026-06-xx)
 ^^^^^^^^^^^^^^^^^
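
Reviewer note: the context line above describes a baseline-vs-candidate accuracy threshold gate. The following is a minimal sketch of what such a check might look like, not the contents of ``gate_compare.py``. The function name `compare_accuracy`, the `max_drop` parameter, and its default are hypothetical, and only the ACCEPT and REGRESSION outcomes are modeled because the diff does not define the conditions for ANOMALOUS or INFEASIBLE.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GateResult:
    decision: str  # "ACCEPT" or "REGRESSION" (the other outcomes are not modeled here)
    drop: float    # baseline accuracy minus candidate accuracy


def compare_accuracy(baseline: float, candidate: float, max_drop: float = 0.01) -> GateResult:
    """Hypothetical gate: accept when the candidate loses at most `max_drop` accuracy."""
    drop = baseline - candidate
    decision = "ACCEPT" if drop <= max_drop else "REGRESSION"
    return GateResult(decision=decision, drop=drop)


if __name__ == "__main__":
    # Assumed example scores; real values would come from the evaluation stage.
    print(compare_accuracy(baseline=0.80, candidate=0.795).decision)
    print(compare_accuracy(baseline=0.80, candidate=0.70).decision)
```

Keeping the check a pure function of two scores is what would make a gate like this testable without a GPU, which matches the "GPU-free, unit-tested" wording in the changelog entry.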

examples/speculative_decoding/README.md

Lines changed: 6 additions & 0 deletions
@@ -18,6 +18,7 @@ This example focuses on training with Hugging Face. To train with Megatron‑LM,
 | Simplified Workflow | Train, evaluate, and export EAGLE model with one-line command | \[[Link](#getting-started-simplified-workflow)\] |
 | Online Training | Train draft model alongside base model in GPU memory | \[[Link](#training-draft-model-with-online-base-model)\] |
 | Offline Training | Train draft model using pre-computed hidden states | \[[Link](#training-draft-model-with-offline-base-model)\] |
+| Streaming Training | Train draft on hidden states streamed from a live vLLM serve (no disk dump) | \[[Link](#training-draft-model-with-streaming-base-model)\] |
 | After Training | Evaluation, export and deployment | \[[Link](#model-validation)\] |
 | Advanced Usage | Data synthesis, vocab compression, and configuration | \[[Link](#advanced-usage)\] |
 | Support Matrix | Supported models for speculative decoding training | \[[Link](#support-matrix)\] |
@@ -127,6 +128,10 @@ Once we finish dumping hidden states, launch offline training pointing to the hi
 training.output_dir=ckpts/llama-3.2-1b-offline
 ```

+## Training Draft Model with Streaming Base Model
+
+For large base models, you can stream hidden states from a live `vllm serve` instead of dumping them to disk: a co-located server produces the base-model hidden states on the fly and sends them to the trainer over NIXL RDMA, scaling to multiple nodes (dedicated serve replicas + DDP trainers). See the launcher examples, e.g. [Kimi-K2.5 streaming EAGLE3](../../tools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_eagle3_multi_node.yaml) and [streaming DFlash](../../tools/launcher/examples/moonshotai/Kimi-K2.5/hf_streaming_dflash_multi_node.yaml).
+
 ## Model Validation

 For online training checkpoints, we can run in-framework evaluation on MT-bench:
@@ -334,6 +339,7 @@ See `main.py` for the full example including tokenizer setup, dataset loading, a
 | Mistral ||||
 | Phi 3 ||||
 | QWen 1.5,2,2.5,3 ||||
+| Kimi-K2.5, K2.6 | | ||

 ## Speculation Module Checkpoints
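
Reviewer note: the streaming section added to the README describes a server that produces hidden states on the fly while a trainer consumes them, with no disk dump in between. The sketch below illustrates only that producer/consumer shape using a bounded in-process queue. It does not use vLLM or NIXL, and every name in it (`serve_hidden_states`, `train_draft`, `run_streaming`, `max_in_flight`) is hypothetical.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def serve_hidden_states(prompts, out_q):
    """Stand-in for the serving side: emit one fake hidden-state vector per prompt."""
    for prompt in prompts:
        hidden = [float(ord(ch)) for ch in prompt]  # placeholder for real base-model activations
        out_q.put((prompt, hidden))  # blocks when the trainer falls behind
    out_q.put(_DONE)


def train_draft(in_q):
    """Stand-in for the trainer: consume samples as they arrive and count the steps taken."""
    steps = 0
    while True:
        item = in_q.get()
        if item is _DONE:
            return steps
        steps += 1  # a real trainer would run a draft-model update here


def run_streaming(prompts, max_in_flight=2):
    """Wire producer and consumer together; nothing is written to disk."""
    q = queue.Queue(maxsize=max_in_flight)
    producer = threading.Thread(target=serve_hidden_states, args=(prompts, q))
    producer.start()
    steps = train_draft(q)
    producer.join()
    return steps


if __name__ == "__main__":
    print(run_streaming(["hello", "streaming", "draft"]))
```

The bounded queue is the design point worth noting: it gives back-pressure, so the producer cannot run arbitrarily far ahead of the trainer and memory use stays capped at `max_in_flight` samples.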
