feat(spec): SpecInput data contract + host/device boundary annotations #1064
Merged
pengchengneo merged 1 commit on May 14, 2026
d27b25a to 7a1e0ee
11 tasks
7a1e0ee to 69113de
Part of sgl-project#1053 (P1-5a). Defines the spec-decode data contract that sgl-project#1053 P1-2 (BaseSpecWorker/BaseDraftWorker class refactor) will build on, so the abstraction layer has a fixed interface to target.

- spec_info.SpecInput: runtime_checkable Protocol exposing the three token counts the RFC requires to be separated (logical / allocated / verify) plus is_draft_input/is_verify_input/get_spec_adjust_token_coefficient/filter_batch/merge_batch. Docstring fixes the host/device boundary and DP-padded layout (Route 1) semantics.
- EagleDraftInput: implement SpecInput; per-field docstrings annotate device vs host, shape, JIT-cache-key participation, and the allocate_lens vs accept_length vs new_seq_lens distinction.
- EagleVerifyInput: implement SpecInput; per-field docstrings annotate device vs host and static-metadata fields; filter/merge raise (verify input is single-round).
- test_spec_info.py: Protocol conformance + three-token-count distinction.

No behavior change to the existing dp=1 path (pure additions: docstrings, Protocol, new methods). Type annotations on device fields changed np.ndarray → jax.Array | None to reflect actual runtime types after sgl-project#1063.

Depends on sgl-project#1063 (P1-0); the diff is on top of that branch.
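The runtime_checkable Protocol described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: only the method names come from the description. The signatures, the reduced method set (get_spec_adjust_token_coefficient, filter_batch and merge_batch are omitted), the ToyDraftInput class, and the way it computes each count are all assumptions made for the example.

```python
from typing import Protocol, runtime_checkable


@runtime_checkable
class SpecInput(Protocol):
    """Reduced sketch of the contract; method names from the PR, signatures assumed."""

    def is_draft_input(self) -> bool: ...
    def is_verify_input(self) -> bool: ...
    def get_logical_token_num(self) -> int: ...
    def get_allocated_token_num(self) -> int: ...
    def get_verify_token_num(self) -> int: ...


class ToyDraftInput:
    """Hypothetical stand-in for a draft input; it does NOT inherit from SpecInput."""

    def __init__(self, accept_lens, allocate_lens, draft_token_num):
        self.accept_lens = accept_lens          # tokens accepted per request (assumed meaning)
        self.allocate_lens = allocate_lens      # KV slots reserved per request (assumed meaning)
        self.draft_token_num = draft_token_num  # tokens verified per request (assumed meaning)

    def is_draft_input(self) -> bool:
        return True

    def is_verify_input(self) -> bool:
        return False

    def get_logical_token_num(self) -> int:
        # Assumed: the count the scheduler advances by.
        return sum(self.accept_lens)

    def get_allocated_token_num(self) -> int:
        # Assumed: the count used for KV trimming.
        return sum(self.allocate_lens)

    def get_verify_token_num(self) -> int:
        # Assumed: the count used for verify metadata and DP accounting.
        return len(self.accept_lens) * self.draft_token_num


toy = ToyDraftInput(accept_lens=[2, 1, 3], allocate_lens=[8, 8, 8], draft_token_num=4)
print(isinstance(toy, SpecInput))       # structural check, no inheritance needed
print(isinstance(object(), SpecInput))  # a plain object lacks the methods
print(toy.get_logical_token_num(), toy.get_allocated_token_num(), toy.get_verify_token_num())
```

Note that isinstance against a runtime_checkable Protocol only checks that the methods exist, not their signatures or return types, which is part of the Protocol vs ABC trade-off.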
69113de to ba650a3
pengchengneo (Collaborator) approved these changes on May 14, 2026
LGTM
Part of #1053 (P1-5a). Draft for early interface review: P1-2 (BaseSpecWorker/BaseDraftWorker class refactor) builds on this contract, so opening early per discussion.

Note
Depends on #1063 (P1-0). This branch is stacked on it; the actual P1-5a diff is the last commit only (4 files, +201/−26). Will rebase onto main once #1063 merges.

What
Defines the spec-decode data contract from the #1053 RFC tables so the P1-2 abstraction layer has a fixed interface to target:
- spec_info.SpecInput: @runtime_checkable Protocol exposing:
  - is_draft_input() / is_verify_input()
  - get_logical_token_num() / get_allocated_token_num() / get_verify_token_num(): the three token counts the RFC requires to be separated (scheduler advance / KV trimming / verify-metadata + DP-accounting, respectively)
  - get_spec_adjust_token_coefficient() / filter_batch() / merge_batch()
- EagleDraftInput: implements SpecInput; per-field docstrings annotate device vs host, shape, JIT-cache-key participation, and the allocate_lens ≠ accept_length ≠ new_seq_lens distinction. Multi-layer MTP cross-round hidden_states semantics noted.
- EagleVerifyInput: implements SpecInput; per-field docstrings annotate device vs host and static-metadata fields; filter_batch/merge_batch raise (verify input is single-round).
- test_spec_info.py: Protocol conformance + three-token-count distinction.

Non-changes
- GenerationBatchResult already has all RFC-required fields except optional new_seq_lens (derivable from seq_lens + accept_lens); not adding it here.
- Type annotations on device fields changed np.ndarray → jax.Array | None to reflect actual runtime types after the reshard changes in #1063 (fix(eagle): restore dp=1 dense EAGLE3 path on main + add E2E to CI): annotations only, no code path change.

Open questions for P1-2 author
- Should SpecInput be a Protocol (current, duck-typed) or an ABC base class? Protocol avoids inheritance coupling, but ABC lets BaseSpecWorker type-hint more strictly.
- get_logical_token_num currently returns a scalar int (sum across batch). Should it return a per-request np.ndarray instead for DP per-rank accounting?
- EagleDraftInput.merge_batch currently uses np.concatenate (host-side). The RFC says these are device arrays; should the merge happen on device (jnp.concatenate)?

Test plan
- test_spec_info.py: 3/3 passed (CPU)
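
On the open question about get_logical_token_num returning a scalar versus a per-request array, the difference for DP accounting can be illustrated with a small host-side sketch. Everything here is hypothetical: the helper name per_rank_token_counts, the assumption that requests are laid out contiguously by rank with an equal number of slots per rank, and the assumption that padded slots carry zero tokens are not taken from the PR or the RFC.

```python
import numpy as np


def per_rank_token_counts(per_request_tokens: np.ndarray, dp_size: int) -> np.ndarray:
    """Sum per-request token counts within each DP rank.

    Assumes (hypothetically) a contiguous-by-rank layout with the same number
    of request slots on every rank; padded slots are expected to hold 0.
    """
    if per_request_tokens.size % dp_size != 0:
        raise ValueError("batch size must be divisible by dp_size in this sketch")
    # Row r holds the requests assigned to rank r.
    return per_request_tokens.reshape(dp_size, -1).sum(axis=1)


# Four request slots across two ranks; the last slot is padding.
per_request = np.array([2, 1, 4, 0], dtype=np.int32)

per_rank = per_rank_token_counts(per_request, dp_size=2)
total = int(per_request.sum())  # what a scalar-returning method would report

print(per_rank.tolist(), total)
```

The scalar can always be recovered from the per-request array by summing, but the per-rank split cannot be recovered from the scalar alone.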