refactor(spec): split EAGLEWorker into BaseSpecWorker/BaseDraftWorker… by pengchengneo · Pull Request #1080 · sgl-project/sglang-jax

pengchengneo · 2026-05-14T07:14:43Z

… + EagleDraftWorker (P1-2)

Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker (P1-4) can reuse the same verify/data-contract path.

gemini-code-assist · 2026-05-14T07:14:54Z

Summary of Changes

This pull request refactors the speculative decoding architecture by introducing abstract base classes for draft and speculative workers. This modular design enables better code reuse, specifically facilitating the integration of future MultiLayerEAGLE and MultiLayerDraft workers by standardizing the verification and data-contract paths.

Highlights

Architecture Refactoring: Introduced BaseDraftWorker and BaseSpecWorker abstract classes to standardize the speculative decoding interface.
Component Extraction: Implemented EagleDraftWorker to encapsulate EAGLE-specific draft model logic, such as multi-step decoding and tree building.
Orchestration Update: Refactored EAGLEWorker to inherit from BaseSpecWorker, delegating draft-specific tasks to the new EagleDraftWorker.
Code Cleanup: Removed redundant helper functions and logic from eagle_worker.py, improving maintainability and reducing code duplication.

… + EagleDraftWorker (P1-2)

Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker (P1-4) can reuse the same verify/data-contract path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

… inheritance

Change EagleDraftWorker from EagleDraftWorker(ModelWorker, BaseDraftWorker) to EagleDraftWorker(BaseDraftWorker) with an internal self._worker ModelWorker instance, aligning with upstream sglang V2 architecture pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
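The switch from inheritance to composition described in this commit can be pictured with a minimal sketch. The classes below are toy stand-ins, not the sglang-jax implementation: the constructor arguments, the string-valued `model_runner`, and the body of `draft()` are assumptions made only so the example runs.

```python
from abc import ABC, abstractmethod


class ModelWorker:
    """Toy stand-in for the real ModelWorker (assumption)."""

    def __init__(self, name: str):
        self.model_runner = f"runner-for-{name}"


class BaseDraftWorker(ABC):
    @abstractmethod
    def draft(self, batch):
        """Produce draft tokens for a batch."""


class EagleDraftWorker(BaseDraftWorker):
    """Owns a ModelWorker instead of inheriting from it."""

    def __init__(self, name: str):
        # Composition: the ModelWorker is an internal detail.
        self._worker = ModelWorker(name)

    @property
    def draft_model_runner(self):
        # Delegate to the wrapped worker rather than relying on the MRO.
        return self._worker.model_runner

    def draft(self, batch):
        # Placeholder drafting logic (assumption).
        return [f"draft({tok})" for tok in batch]


worker = EagleDraftWorker("eagle")
```

With this shape, `EagleDraftWorker` is no longer a `ModelWorker`, so callers can only reach what the draft worker chooses to expose.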

…lang V2

Strip EAGLE-specific config out of the abstract base classes:

- BaseDraftWorker: only declares draft() (drop draft_extend_for_prefill/_for_decode; those stay as concrete EagleDraftWorker methods called directly by EAGLEWorker)
- BaseSpecWorker: target_worker/draft_worker as abstract properties, clear_cache_pool() abstract, on_verify_complete_cpu() concrete hook; no __init__ storing topk/speculative_num_steps/etc.
- EAGLEWorker now stores its own EAGLE config and implements the property contract

This matches the thin ABC pattern used by upstream sglang's V2 architecture (base_spec_worker.py), so future non-EAGLE spec workers (NGRAM, DFlash) won't inherit EAGLE-specific assumptions.

Note: we deliberately omit upstream's draft_extend() pass-stub since it's never called anywhere in the codebase; prefill/decode extend remain as concrete EagleDraftWorker methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Upstream uses this hook to feed adaptive speculative decoding controllers without forcing a GPU->CPU sync. sglang-jax has no adaptive spec decode feature planned in sgl-project#1053 phases 1-3, so the hook is dead weight for now. Add it back when (and if) we adopt adaptive spec decoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Upstream's EAGLE clear_cache_pool is also a no-op pass: the KV pool is shared with target_worker and cleared in the scheduler. sglang-jax's flush_cache doesn't currently dispatch to draft_worker either, so the abstract method serves no purpose. Reintroduce only when the scheduler genuinely needs to hook into draft worker cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
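After the commits above, the base class is left with little more than the two abstract properties. A rough sketch of that thin contract follows, assuming simplified constructors; `NgramWorker` is a hypothetical non-EAGLE worker invented here to show that nothing EAGLE-specific leaks through the base class.

```python
from abc import ABC, abstractmethod


class BaseSpecWorker(ABC):
    """Thin contract: no __init__, no EAGLE-specific fields."""

    @property
    @abstractmethod
    def target_worker(self): ...

    @property
    @abstractmethod
    def draft_worker(self): ...


class EAGLEWorker(BaseSpecWorker):
    def __init__(self, target, draft, topk: int, speculative_num_steps: int):
        self._target = target
        self._draft = draft
        # EAGLE config lives on the concrete class, not on the ABC.
        self.topk = topk
        self.speculative_num_steps = speculative_num_steps

    @property
    def target_worker(self):
        return self._target

    @property
    def draft_worker(self):
        return self._draft


class NgramWorker(BaseSpecWorker):  # hypothetical, for illustration only
    def __init__(self, target):
        self._target = target

    @property
    def target_worker(self):
        return self._target

    @property
    def draft_worker(self):
        return None  # this toy worker has no draft model


eagle = EAGLEWorker("target", "draft", topk=4, speculative_num_steps=3)
ngram = NgramWorker("target")
```

Because the ABC stores no state, `ngram` carries no `topk` or `speculative_num_steps` attribute, while `eagle` keeps both on itself.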

zorrofox · 2026-05-14T09:23:30Z

Reviewed against the RFC and our in-progress P1-4 (MultiLayerDraftWorker/MultiLayerEAGLEWorker) which is stacked on this. P1-0 fixes all survived the split (_replicate ×4, safe_index, topk_probs_from_logits reshard, device_get verified_id) — nice.

A few things from the P1-4 consumer side:

1. verify() placement (eagle_worker.py:128)
RFC puts verify() on BaseSpecWorker; here it's on EAGLEWorker. The body only touches BaseSpecWorker state (target_worker, mesh, speculative_num_*, draft_worker.draft_model_runner.rngs) — nothing EAGLE-specific. If verify() + _replicate() + forward_target_extend() move up, forward_batch_speculative_generation can become a concrete default on BaseSpecWorker, and MultiLayerEAGLEWorker can subclass BaseSpecWorker directly. Right now P1-4 has to inherit EAGLEWorker and call BaseSpecWorker.__init__ directly to skip EagleDraftWorker instantiation, which is fragile.
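One way to picture the suggestion in (1) is a template-method layout, where the base class owns verify() and the generation loop, and subclasses supply only the two workers. This is a toy sketch under stated assumptions: the `accepts` method, the prefix-acceptance rule, and the toy target/draft classes are invented for illustration and do not reflect the real verify() body.

```python
from abc import ABC, abstractmethod


class BaseSpecWorker(ABC):
    @property
    @abstractmethod
    def target_worker(self): ...

    @property
    @abstractmethod
    def draft_worker(self): ...

    def verify(self, draft_tokens):
        # Toy acceptance rule (assumption): keep the longest prefix of
        # draft tokens that the target accepts.
        accepted = []
        for tok in draft_tokens:
            if not self.target_worker.accepts(tok):
                break
            accepted.append(tok)
        return accepted

    def forward_batch_speculative_generation(self, batch):
        # Concrete default: draft, then verify with the shared logic.
        return self.verify(self.draft_worker.draft(batch))


class ToyTarget:
    def accepts(self, tok):
        return tok % 2 == 0


class ToyDraft:
    def draft(self, batch):
        return list(batch)


class MultiLayerEAGLEWorker(BaseSpecWorker):
    """Subclasses the base directly; no EAGLEWorker in the MRO."""

    def __init__(self, target, draft):
        self._target, self._draft = target, draft

    @property
    def target_worker(self):
        return self._target

    @property
    def draft_worker(self):
        return self._draft


worker = MultiLayerEAGLEWorker(ToyTarget(), ToyDraft())
result = worker.forward_batch_speculative_generation([2, 4, 5, 6])
```

Here `result` holds the accepted prefix `[2, 4]`, and the subclass never has to call another class's `__init__` to skip unwanted setup.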

2. P1-1 assertion message regressed (eagle_draft_worker.py:73-77)
The original #1066 message named the root cause ("Hybrid target without the post-set_num_token_hybrid draft_runner_cache_size overwrite"). The new text says "Check --mem-fraction-static or --kv-cache-dtype" — those flags don't fix the hybrid slot-range mismatch this assert guards. Suggest restoring the original message.

3. BaseDraftWorker ABC missing precompile attrs (base_worker.py:14-58 vs eagle_worker.py:275,286,312,316,333)
run_spec_decode_precompile reaches into draft_worker.{compilation_manager, model_config, max_req_len, get_max_padded_size} — none declared on the ABC. Works because EagleDraftWorker happens to expose them via _worker. P1-4 sets self._worker = _workers[0] so it resolves, but precompile then only warms layer 0 (layers 1..N-1 JIT on first real decode). Could either declare these on BaseDraftWorker, or push dummy-batch construction down to a draft_worker.precompile_one_shape(...) so multi-layer can override it.
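The second option in (3), pushing per-shape warm-up down into the draft worker, might look roughly like the sketch below. The `warm` method, the tuple-valued shapes, and the simplified constructors are assumptions; only the names run_spec_decode_precompile and precompile_one_shape come from the comment.

```python
class ToyModelWorker:
    """Toy stand-in that records which shapes were warmed (assumption)."""

    def __init__(self):
        self.warmed = []

    def warm(self, shape):
        self.warmed.append(shape)


class EagleDraftWorker:
    def __init__(self, worker):
        self._worker = worker

    def precompile_one_shape(self, shape):
        # Single-runner default: warm the one wrapped worker.
        self._worker.warm(shape)


class MultiLayerDraftWorker(EagleDraftWorker):
    def __init__(self, workers):
        super().__init__(workers[0])
        self._workers = workers

    def precompile_one_shape(self, shape):
        # Override so every layer is warmed, not just layer 0.
        for w in self._workers:
            w.warm(shape)


def run_spec_decode_precompile(draft_worker, shapes):
    # The orchestrator no longer reaches into draft-worker internals.
    for shape in shapes:
        draft_worker.precompile_one_shape(shape)


layers = [ToyModelWorker() for _ in range(3)]
run_spec_decode_precompile(MultiLayerDraftWorker(layers), [(1, 8), (4, 8)])
```

With the hook in place, the orchestrator only depends on one method, and a multi-layer worker decides for itself how many runners to warm.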

4. EagleDraftWorker.__init__ not subclass-friendly (eagle_draft_worker.py:42-88)
Unconditionally creates self._worker = ModelWorker(...) mid-init. P1-4's MultiLayerDraftWorker.__init__ ends up copy-pasting ~20 lines of bookkeeping (43-54, 65-67, 81-88). Extracting an _init_common() that does everything except the ModelWorker creation + _share_embed_head + initialize_jit would let subclasses reuse it.
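The _init_common() extraction proposed in (4) could be sketched as follows. The config keys and the `layer` argument are illustrative assumptions; the real bookkeeping, _share_embed_head, and initialize_jit are not modeled here.

```python
class ModelWorker:
    """Toy stand-in for the real ModelWorker (assumption)."""

    def __init__(self, layer: int):
        self.layer = layer


class EagleDraftWorker:
    def __init__(self, config: dict):
        self._init_common(config)
        # Worker creation stays outside _init_common so subclasses
        # can replace it.
        self._worker = ModelWorker(layer=0)

    def _init_common(self, config: dict):
        # Shared bookkeeping only; the fields are illustrative.
        self.topk = config["topk"]
        self.speculative_num_steps = config["speculative_num_steps"]


class MultiLayerDraftWorker(EagleDraftWorker):
    def __init__(self, config: dict, num_layers: int):
        # Reuse the bookkeeping without running the parent's
        # single-worker construction.
        self._init_common(config)
        self._workers = [ModelWorker(layer=i) for i in range(num_layers)]
        self._worker = self._workers[0]


cfg = {"topk": 4, "speculative_num_steps": 3}
multi = MultiLayerDraftWorker(cfg, num_layers=3)
```

The subclass then shares the parent's field setup without copy-pasting it and without instantiating a throwaway single worker.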

Minor:

draft_model_runner (singular) on the ABC is a slight lie for multi-runner. Only external consumer is verify() for .rngs; could narrow to a sampling_rngs property, or have verify() use target_worker.model_runner.rngs instead.
_replicate is duplicated in both files; goes away if (1) lands.

Happy to fold (1)/(3)/(4) into P1-4 if you'd rather keep this PR's diff small — let me know which.

pengchengneo · 2026-05-14T09:39:57Z

I think it would be better to fold (1)/(3)/(4) into P1-4, because they will affect your code's architecture. I will fix (2) and the minor problems.
@zorrofox

…edupe _replicate)

- Restore sgl-project#1066's original assert message naming the hybrid target / draft_runner_cache_size root cause; the previous "Check --mem-fraction-static" text pointed at flags that don't fix this slot-range mismatch.
- Add abstract `sampling_rngs` property on BaseDraftWorker so verify() doesn't reach into the singular `draft_model_runner.rngs` (multi-runner workers can override it to designate which runner provides RNGs).
- Extract `_replicate` to module-level `replicate_to_mesh(mesh, *arrs)` in base_worker.py; removes the duplicate definition across EAGLEWorker and EagleDraftWorker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The sampling_rngs property was a one-line wrapper around draft_model_runner.rngs without solving any real problem:

- BaseDraftWorker doesn't expose draft_model_runner on the ABC, so the "singular runner is a lie for multi-runner" concern never arises at the abstract level
- Multi-runner workers can override draft_model_runner directly to return whichever runner they pick; verify() keeps working unchanged
- The property doesn't address the deeper semantic question of whether verify (target model) should use draft RNGs at all

Revert to the direct draft_model_runner.rngs access; revisit when there's an actual divergence in RNG strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

zorrofox · 2026-05-14T10:02:05Z

Sounds good — will fold (1)/(3)/(4) into P1-4. We'll rebase once #1080 lands with your (2) fix.

zorrofox

Reviewed 77ea2a6f — assert message restored, sampling_rngs removal in 9aed7249 makes sense (draft_model_runner override covers the multi-runner case). (1)/(3)/(4) tracked on our side for P1-4. LGTM from the consumer side.

JamesBrianD · 2026-05-14T11:08:39Z

Naming nit: draft_model_runner → draft_runner. sglang's EagleDraftWorker exposes self.draft_runner = self.draft_worker.model_runner as the short alias. Matching that name would shorten call chains here and keep cross-repo greps consistent.

pengchengneo · 2026-05-14T11:58:32Z

> Naming nit: draft_model_runner → draft_runner. sglang's EagleDraftWorker exposes self.draft_runner = self.draft_worker.model_runner as the short alias. Matching that name would shorten call chains here and keep cross-repo greps consistent.

OK, this will be changed in the next PR; this PR only changes the abstraction relationships. cc @zorrofox

pengchengneo marked this pull request as draft May 14, 2026 07:14

pengchengneo and others added 2 commits May 14, 2026 15:38

pengchengneo force-pushed the feat/p1-2-base-spec-draft-worker branch from 3dac47a to 5b70b8a Compare May 14, 2026 07:39

pengchengneo and others added 3 commits May 14, 2026 16:10

pengchengneo marked this pull request as ready for review May 14, 2026 08:18

ci: trigger re-run after PR moved out of draft

10ab07a

pengchengneo mentioned this pull request May 14, 2026

[Feature] Refactor Eagle/MTP RFC #1053

Open

11 tasks

pengchengneo and others added 2 commits May 14, 2026 17:40

zorrofox approved these changes May 14, 2026

jimoosciuc approved these changes May 14, 2026

pengchengneo merged commit 17db854 into sgl-project:main May 14, 2026
18 checks passed

zorrofox mentioned this pull request May 14, 2026

feat(spec): MultiLayerEAGLEWorker/MultiLayerDraftWorker (#1053 P1-4) #1089

Open

7 tasks

pengchengneo merged 8 commits into sgl-project:main from primatrix:feat/p1-2-base-spec-draft-worker

4 participants