refactor(spec): split EAGLEWorker into BaseSpecWorker/BaseDraftWorker…#1080

Merged
pengchengneo merged 8 commits into sgl-project:main from primatrix:feat/p1-2-base-spec-draft-worker
May 14, 2026

Conversation

@pengchengneo
Collaborator

… + EagleDraftWorker (P1-2)

Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker (P1-4) can reuse the same verify/data-contract path.


@pengchengneo pengchengneo marked this pull request as draft May 14, 2026 07:14
@gemini-code-assist

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request refactors the speculative decoding architecture by introducing abstract base classes for draft and speculative workers. This modular design enables better code reuse, specifically facilitating the integration of future MultiLayerEAGLE and MultiLayerDraft workers by standardizing the verification and data-contract paths.

Highlights

  • Architecture Refactoring: Introduced BaseDraftWorker and BaseSpecWorker abstract classes to standardize the speculative decoding interface.
  • Component Extraction: Implemented EagleDraftWorker to encapsulate EAGLE-specific draft model logic, such as multi-step decoding and tree building.
  • Orchestration Update: Refactored EAGLEWorker to inherit from BaseSpecWorker, delegating draft-specific tasks to the new EagleDraftWorker.
  • Code Cleanup: Removed redundant helper functions and logic from eagle_worker.py, improving maintainability and reducing code duplication.

pengchengneo and others added 2 commits May 14, 2026 15:38
… + EagleDraftWorker (P1-2)

Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move
draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker
(P1-4) can reuse the same verify/data-contract path.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… inheritance

Change EagleDraftWorker from EagleDraftWorker(ModelWorker, BaseDraftWorker)
to EagleDraftWorker(BaseDraftWorker) with an internal self._worker ModelWorker
instance, aligning with upstream sglang V2 architecture pattern.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
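
The switch from inheritance to composition described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ModelWorker` is a stand-in stub, and the constructor argument, the `model_runner` field, and the body of `draft()` are assumptions made for the example.

```python
from abc import ABC, abstractmethod


class ModelWorker:
    """Stand-in stub for the real ModelWorker (hypothetical fields)."""

    def __init__(self, name: str):
        self.model_runner = f"runner-for-{name}"


class BaseDraftWorker(ABC):
    @abstractmethod
    def draft(self, batch):
        """Produce draft tokens for a batch."""


class EagleDraftWorker(BaseDraftWorker):
    """Holds a ModelWorker instead of inheriting from it."""

    def __init__(self, name: str):
        # Composition: the ModelWorker is an internal collaborator.
        self._worker = ModelWorker(name)

    @property
    def draft_model_runner(self):
        # Delegate to the wrapped worker rather than relying on inheritance.
        return self._worker.model_runner

    def draft(self, batch):
        # Placeholder draft step: pair each item with the runner that handled it.
        return [(item, self.draft_model_runner) for item in batch]


worker = EagleDraftWorker("eagle")
result = worker.draft(["tok0", "tok1"])
is_model_worker = isinstance(worker, ModelWorker)
```

With this layout the draft worker is no longer a `ModelWorker` itself, so the abstract base does not have to share a method resolution order with it.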
@pengchengneo pengchengneo force-pushed the feat/p1-2-base-spec-draft-worker branch from 3dac47a to 5b70b8a Compare May 14, 2026 07:39
pengchengneo and others added 3 commits May 14, 2026 16:10
…lang V2

Strip EAGLE-specific config out of the abstract base classes:
- BaseDraftWorker: only declares draft() (drop draft_extend_for_prefill/_for_decode;
  those stay as concrete EagleDraftWorker methods called directly by EAGLEWorker)
- BaseSpecWorker: target_worker/draft_worker as abstract properties,
  clear_cache_pool() abstract, on_verify_complete_cpu() concrete hook;
  no __init__ storing topk/speculative_num_steps/etc.
- EAGLEWorker now stores its own EAGLE config and implements the property contract

This matches the thin ABC pattern used by upstream sglang's V2 architecture
(base_spec_worker.py), so future non-EAGLE spec workers (NGRAM, DFlash) won't
inherit EAGLE-specific assumptions.

Note: we deliberately omit upstream's draft_extend() pass-stub since it's never
called anywhere in the codebase; prefill/decode extend remain as concrete
EagleDraftWorker methods.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
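
As a rough sketch of the thin ABC shape this commit describes, the property contract might look like the following. It shows only `draft()` and the two abstract properties and leaves out the `clear_cache_pool()` and `on_verify_complete_cpu()` members that the commit also lists. `EchoDraftWorker`, the string target stub, and the constructor signature of `EAGLEWorker` are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class BaseDraftWorker(ABC):
    @abstractmethod
    def draft(self, batch):
        """Return draft tokens for the batch."""


class BaseSpecWorker(ABC):
    # No __init__ here: the base stores no topk / speculative_num_steps.
    @property
    @abstractmethod
    def target_worker(self):
        """Worker that runs the target model."""

    @property
    @abstractmethod
    def draft_worker(self) -> BaseDraftWorker:
        """Worker that runs the draft model."""


class EchoDraftWorker(BaseDraftWorker):
    """Hypothetical draft worker that returns its input unchanged."""

    def draft(self, batch):
        return list(batch)


class EAGLEWorker(BaseSpecWorker):
    """Concrete worker that owns its EAGLE-specific configuration."""

    def __init__(self, target_worker, draft_worker, topk, speculative_num_steps):
        self._target_worker = target_worker
        self._draft_worker = draft_worker
        self.topk = topk
        self.speculative_num_steps = speculative_num_steps

    @property
    def target_worker(self):
        return self._target_worker

    @property
    def draft_worker(self):
        return self._draft_worker


spec = EAGLEWorker("target-stub", EchoDraftWorker(), topk=4, speculative_num_steps=3)
```

Because the base classes carry no configuration, a non-EAGLE worker would only need to satisfy the same two properties.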
Upstream uses this hook to feed adaptive speculative decoding controllers
without forcing a GPU->CPU sync. sglang-jax has no adaptive spec decode
feature planned in sgl-project#1053 phases 1-3, so the hook is dead weight for now.
Add it back when (and if) we adopt adaptive spec decoding.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream's EAGLE clear_cache_pool is also a no-op pass — KV pool is shared
with target_worker and cleared in the scheduler. sglang-jax's flush_cache
doesn't currently dispatch to draft_worker either, so the abstract method
serves no purpose. Reintroduce only when scheduler genuinely needs to hook
into draft worker cleanup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@pengchengneo pengchengneo marked this pull request as ready for review May 14, 2026 08:18
@zorrofox
Contributor

Reviewed against the RFC and our in-progress P1-4 (MultiLayerDraftWorker/MultiLayerEAGLEWorker) which is stacked on this. P1-0 fixes all survived the split (_replicate ×4, safe_index, topk_probs_from_logits reshard, device_get verified_id) — nice.

A few things from the P1-4 consumer side:

1. verify() placement (eagle_worker.py:128)
RFC puts verify() on BaseSpecWorker; here it's on EAGLEWorker. The body only touches BaseSpecWorker state (target_worker, mesh, speculative_num_*, draft_worker.draft_model_runner.rngs) — nothing EAGLE-specific. If verify() + _replicate() + forward_target_extend() move up, forward_batch_speculative_generation can become a concrete default on BaseSpecWorker, and MultiLayerEAGLEWorker can subclass BaseSpecWorker directly. Right now P1-4 has to inherit EAGLEWorker and call BaseSpecWorker.__init__ directly to skip EagleDraftWorker instantiation, which is fragile.

2. P1-1 assertion message regressed (eagle_draft_worker.py:73-77)
The original #1066 message named the root cause ("Hybrid target without the post-set_num_token_hybrid draft_runner_cache_size overwrite"). The new text says "Check --mem-fraction-static or --kv-cache-dtype" — those flags don't fix the hybrid slot-range mismatch this assert guards. Suggest restoring the original message.

3. BaseDraftWorker ABC missing precompile attrs (base_worker.py:14-58 vs eagle_worker.py:275,286,312,316,333)
run_spec_decode_precompile reaches into draft_worker.{compilation_manager, model_config, max_req_len, get_max_padded_size} — none declared on the ABC. Works because EagleDraftWorker happens to expose them via _worker. P1-4 sets self._worker = _workers[0] so it resolves, but precompile then only warms layer 0 (layers 1..N-1 JIT on first real decode). Could either declare these on BaseDraftWorker, or push dummy-batch construction down to a draft_worker.precompile_one_shape(...) so multi-layer can override it.

4. EagleDraftWorker.__init__ not subclass-friendly (eagle_draft_worker.py:42-88)
Unconditionally creates self._worker = ModelWorker(...) mid-init. P1-4's MultiLayerDraftWorker.__init__ ends up copy-pasting ~20 lines of bookkeeping (43-54, 65-67, 81-88). Extracting an _init_common() that does everything except the ModelWorker creation + _share_embed_head + initialize_jit would let subclasses reuse it.
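
The `_init_common()` idea from point 4 could look something like the sketch below. It is only an illustration under assumptions: `make_model_worker`, the `CREATED` log, and the specific bookkeeping fields are hypothetical, and the real `__init__` also runs `_share_embed_head` and `initialize_jit`, which are left out here.

```python
CREATED = []  # records which layer each stub worker was built for


def make_model_worker(layer: int) -> dict:
    """Hypothetical stand-in for constructing a ModelWorker."""
    CREATED.append(layer)
    return {"layer": layer}


class EagleDraftWorker:
    def __init__(self, topk: int, num_steps: int):
        self._init_common(topk, num_steps)
        # Only this part is specific to the single-worker case.
        self._worker = make_model_worker(layer=0)

    def _init_common(self, topk: int, num_steps: int) -> None:
        # Shared bookkeeping that subclasses can reuse without copy-pasting.
        self.topk = topk
        self.speculative_num_steps = num_steps


class MultiLayerDraftWorker(EagleDraftWorker):
    def __init__(self, topk: int, num_steps: int, num_layers: int):
        # Deliberately skip EagleDraftWorker.__init__ so no extra worker is built.
        self._init_common(topk, num_steps)
        self._workers = [make_model_worker(layer=i) for i in range(num_layers)]
        self._worker = self._workers[0]


single = EagleDraftWorker(topk=4, num_steps=3)
multi = MultiLayerDraftWorker(topk=4, num_steps=3, num_layers=3)
```

The subclass reuses the bookkeeping while controlling how many workers it creates.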

Minor:

  • draft_model_runner (singular) on the ABC is a slight lie for multi-runner. Only external consumer is verify() for .rngs; could narrow to a sampling_rngs property, or have verify() use target_worker.model_runner.rngs instead.
  • _replicate is duplicated in both files; goes away if (1) lands.

Happy to fold (1)/(3)/(4) into P1-4 if you'd rather keep this PR's diff small — let me know which.
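
For point (1), a base class with a concrete default for `forward_batch_speculative_generation` might be shaped like this sketch. The `verify()` body is a toy prefix-match stand-in, not the real verification logic (which involves sampling and RNG state), and `ToyDraft`, `toy_target`, and the constructor of `MultiLayerEAGLEWorker` are assumptions for illustration.

```python
from abc import ABC, abstractmethod


class BaseSpecWorker(ABC):
    @property
    @abstractmethod
    def target_worker(self): ...

    @property
    @abstractmethod
    def draft_worker(self): ...

    def verify(self, batch, draft_tokens):
        # Toy acceptance rule: keep the longest prefix the target agrees with.
        target_tokens = self.target_worker(batch)
        accepted = []
        for drafted, expected in zip(draft_tokens, target_tokens):
            if drafted != expected:
                break
            accepted.append(drafted)
        return accepted

    def forward_batch_speculative_generation(self, batch):
        # Concrete default: draft first, then verify against the target.
        draft_tokens = self.draft_worker.draft(batch)
        return self.verify(batch, draft_tokens)


class ToyDraft:
    def draft(self, batch):
        # Agrees with the toy target everywhere except the last position.
        out = [x + 1 for x in batch]
        out[-1] = -1
        return out


def toy_target(batch):
    return [x + 1 for x in batch]


class MultiLayerEAGLEWorker(BaseSpecWorker):
    """Subclasses the base directly and supplies only the two properties."""

    def __init__(self, target, draft):
        self._target = target
        self._draft = draft

    @property
    def target_worker(self):
        return self._target

    @property
    def draft_worker(self):
        return self._draft


accepted = MultiLayerEAGLEWorker(toy_target, ToyDraft()).forward_batch_speculative_generation([10, 20, 30])
```

In this shape a multi-layer worker would not need to inherit from `EAGLEWorker` or bypass its constructor.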

@pengchengneo
Collaborator Author

I think it would be better to fold (1)/(3)/(4) into P1-4, since they will affect your code's architecture. I will fix (2) and the minor problems.
@zorrofox

pengchengneo and others added 2 commits May 14, 2026 17:40
…edupe _replicate)

- Restore sgl-project#1066's original assert message naming the hybrid target /
  draft_runner_cache_size root cause; the previous "Check --mem-fraction-static"
  text pointed at flags that don't fix this slot-range mismatch.
- Add abstract `sampling_rngs` property on BaseDraftWorker so verify()
  doesn't reach into the singular `draft_model_runner.rngs` (multi-runner
  workers can override it to designate which runner provides RNGs).
- Extract `_replicate` to module-level `replicate_to_mesh(mesh, *arrs)` in
  base_worker.py; removes the duplicate definition across EAGLEWorker and
  EagleDraftWorker.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The sampling_rngs property was a one-line wrapper around
draft_model_runner.rngs without solving any real problem:
- BaseDraftWorker doesn't expose draft_model_runner on the ABC, so the
  "singular runner is a lie for multi-runner" concern never arises at
  the abstract level
- Multi-runner workers can override draft_model_runner directly to return
  whichever runner they pick; verify() keeps working unchanged
- The property doesn't address the deeper semantic question of whether
  verify (target model) should use draft RNGs at all

Revert to the direct draft_model_runner.rngs access; revisit when there's
an actual divergence in RNG strategy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@zorrofox
Contributor

Sounds good — will fold (1)/(3)/(4) into P1-4. We'll rebase once #1080 lands with your (2) fix.

Contributor

@zorrofox zorrofox left a comment

Reviewed 77ea2a6f — assert message restored, sampling_rngs removal in 9aed7249 makes sense (draft_model_runner override covers the multi-runner case). (1)/(3)/(4) tracked on our side for P1-4. LGTM from the consumer side.

@JamesBrianD
Collaborator

Naming nit: draft_model_runner → draft_runner. sglang's EagleDraftWorker exposes self.draft_runner = self.draft_worker.model_runner as the short alias. Matching that name would shorten call chains here and keep cross-repo greps consistent.

@pengchengneo
Collaborator Author

> Naming nit: draft_model_runner → draft_runner. sglang's EagleDraftWorker exposes self.draft_runner = self.draft_worker.model_runner as the short alias. Matching that name would shorten call chains here and keep cross-repo greps consistent.

OK, this will be changed in the next PR; in this PR we change the abstraction relationships. cc @zorrofox

@pengchengneo pengchengneo merged commit 17db854 into sgl-project:main May 14, 2026
18 checks passed
4 participants