refactor(spec): split EAGLEWorker into BaseSpecWorker/BaseDraftWorker…#1080
Conversation
Summary of ChangesHello, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request refactors the speculative decoding architecture by introducing abstract base classes for draft and speculative workers. This modular design enables better code reuse, specifically facilitating the integration of future MultiLayerEAGLE and MultiLayerDraft workers by standardizing the verification and data-contract paths. Highlights
New Features🧠 You can now enable Memory (public preview) to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. Footnotes
|
… + EagleDraftWorker (P1-2) Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker (P1-4) can reuse the same verify/data-contract path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
… inheritance Change EagleDraftWorker from EagleDraftWorker(ModelWorker, BaseDraftWorker) to EagleDraftWorker(BaseDraftWorker) with an internal self._worker ModelWorker instance, aligning with upstream sglang V2 architecture pattern. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
3dac47a to
5b70b8a
Compare
…lang V2 Strip EAGLE-specific config out of the abstract base classes: - BaseDraftWorker: only declares draft() (drop draft_extend_for_prefill/_for_decode; those stay as concrete EagleDraftWorker methods called directly by EAGLEWorker) - BaseSpecWorker: target_worker/draft_worker as abstract properties, clear_cache_pool() abstract, on_verify_complete_cpu() concrete hook; no __init__ storing topk/speculative_num_steps/etc. - EAGLEWorker now stores its own EAGLE config and implements the property contract This matches the thin ABC pattern used by upstream sglang's V2 architecture (base_spec_worker.py), so future non-EAGLE spec workers (NGRAM, DFlash) won't inherit EAGLE-specific assumptions. Note: we deliberately omit upstream's draft_extend() pass-stub since it's never called anywhere in the codebase; prefill/decode extend remain as concrete EagleDraftWorker methods. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream uses this hook to feed adaptive speculative decoding controllers without forcing a GPU->CPU sync. sglang-jax has no adaptive spec decode feature planned in sgl-project#1053 phases 1-3, so the hook is dead weight for now. Add it back when (and if) we adopt adaptive spec decoding. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Upstream's EAGLE clear_cache_pool is also a no-op pass — KV pool is shared with target_worker and cleared in the scheduler. sglang-jax's flush_cache doesn't currently dispatch to draft_worker either, so the abstract method serves no purpose. Reintroduce only when scheduler genuinely needs to hook into draft worker cleanup. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Reviewed against the RFC and our in-progress P1-4 ( A few things from the P1-4 consumer side: 1. 2. P1-1 assertion message regressed ( 3. 4. Minor:
Happy to fold (1)/(3)/(4) into P1-4 if you'd rather keep this PR's diff small — let me know which. |
|
I think it would be better to fold (1)/(3)/(4) into P1-4, cause this will affect your code' arch, and I will fix (2) and minor problems |
…edupe _replicate) - Restore sgl-project#1066's original assert message naming the hybrid target / draft_runner_cache_size root cause; the previous "Check --mem-fraction-static" text pointed at flags that don't fix this slot-range mismatch. - Add abstract `sampling_rngs` property on BaseDraftWorker so verify() doesn't reach into the singular `draft_model_runner.rngs` (multi-runner workers can override it to designate which runner provides RNGs). - Extract `_replicate` to module-level `replicate_to_mesh(mesh, *arrs)` in base_worker.py; removes the duplicate definition across EAGLEWorker and EagleDraftWorker. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The sampling_rngs property was a one-line wrapper around draft_model_runner.rngs without solving any real problem: - BaseDraftWorker doesn't expose draft_model_runner on the ABC, so the "singular runner is a lie for multi-runner" concern never arises at the abstract level - Multi-runner workers can override draft_model_runner directly to return whichever runner they pick; verify() keeps working unchanged - The property doesn't address the deeper semantic question of whether verify (target model) should use draft RNGs at all Revert to the direct draft_model_runner.rngs access; revisit when there's an actual divergence in RNG strategy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
Sounds good — will fold (1)/(3)/(4) into P1-4. We'll rebase once #1080 lands with your (2) fix. |
zorrofox
left a comment
There was a problem hiding this comment.
Reviewed 77ea2a6f — assert message restored, sampling_rngs removal in 9aed7249 makes sense (draft_model_runner override covers the multi-runner case). (1)/(3)/(4) tracked on our side for P1-4. LGTM from the consumer side.
|
Naming nit: draft_model_runner → draft_runner. sglang's EagleDraftWorker exposes self.draft_runner = self.draft_worker.model_runner as the short alias. Matching that name would shorten call chains here and keep cross-repo greps consistent. |
OK, it will be modify in next PR's, we will change abstract relations this PR, cc @zorrofox |
… + EagleDraftWorker (P1-2)
Extract abstract base classes (BaseSpecWorker, BaseDraftWorker) and move draft logic into EagleDraftWorker so MultiLayerEAGLEWorker/MultiLayerDraftWorker (P1-4) can reuse the same verify/data-contract path.
Motivation
Modifications
Accuracy Tests
Benchmarking and Profiling
Checklist