[RFC] refactor: RESTful and sgl/smg compliant API by alphabetc1 · Pull Request #17 · zhaochenyang20/sglang-diffusion-routing

alphabetc1 · 2026-02-22T18:44:42Z

gemini-code-assist · 2026-02-22T18:44:53Z

Summary of Changes

Hello @alphabetc1, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refactors the router's API to follow RESTful principles and comply with SGLang/SMG conventions. The changes introduce a more organized and standardized set of endpoints for interacting with the diffusion workers, covering inference, worker management, and health monitoring. The accompanying documentation has been thoroughly updated to reflect these new API specifications, making the router easier to understand and integrate.

Highlights

- API Refactoring: The router's API has been refactored to be more RESTful and compliant with SGLang/SMG standards, introducing structured endpoints for inference, video result queries, model discovery, health checks, and worker management.
- Documentation Enhancement: The README.md file has been significantly updated to include a comprehensive Table of Contents and detailed documentation for the new API endpoints, improving clarity and usability.
- Router Features: Existing router features such as least-request routing, background health checks with quarantine, and update_weights_from_disk broadcasting remain core functionalities, now better documented within the new API structure.
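
To make the "least-request routing" and "quarantine" terms above concrete, here is a minimal sketch of how such a selection could work. The function name `pick_worker` and the data shapes (a dict of active request counts and a set of quarantined workers) are assumptions for illustration and are not taken from the router's source.

```python
def pick_worker(active_requests, dead=frozenset()):
    """Return the non-quarantined worker with the fewest in-flight requests.

    active_requests: dict mapping worker URL -> current in-flight request count.
    dead: set of worker URLs currently quarantined by the health checker.
    Both shapes are hypothetical; the real router may track state differently.
    """
    # Drop quarantined workers before comparing load.
    candidates = {w: n for w, n in active_requests.items() if w not in dead}
    if not candidates:
        raise RuntimeError("no healthy workers available")
    # min() over the keys, ordered by their request count.
    return min(candidates, key=candidates.get)


load = {"http://w1": 3, "http://w2": 1, "http://w3": 0}
chosen = pick_worker(load, dead={"http://w3"})
```

In this sketch a quarantined worker is skipped even when it has the lowest load, which is the behavior the summary's wording suggests.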

Changelog

README.md
- Added a comprehensive Table of Contents for improved navigation.
- Introduced detailed sections for Router API, categorizing endpoints into Inference, Video Result Query, Model Discovery and Health Checks, Worker Management, and Optional APIs.
- Replaced the previous bulleted list of API endpoints with structured tables, providing method, path, and description for each endpoint.
- Removed the 'Project Layout' section.

Activity

No human activity has been recorded on this pull request yet.

¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, and Terms of Service, and learn how to configure Gemini Code Assist in GitHub. Gemini can make mistakes, so double-check its output and use code with caution.

gemini-code-assist

Code Review

This pull request proposes a refactoring of the API to be more RESTful, as documented in the README.md. While the new API design is a good step forward, this PR only contains documentation changes and lacks the corresponding implementation. This has led to a few issues:

- The README.md is now out of sync with the actual code, documenting an API that doesn't exist.
- The README.md is internally inconsistent, with the Quick Start section using the old API endpoints while the Router API section describes the new ones.
- The helpful Project Layout section has been removed.

Given this is an RFC, my feedback is that the API design is good, but it should be implemented. The documentation should not be merged in this state without the code, as it's misleading. I've added specific comments on the README.md.

gemini-code-assist · 2026-02-22T18:46:12Z

 ## Router API

- `POST /add_worker`: add worker via query (`?url=`) or JSON body.
- `GET /list_workers`: list registered workers.
- `GET /health`: aggregated router health.
- `GET /health_workers`: per-worker health and active request counts.
- `POST /generate`: forwards to worker `/v1/images/generations`.
- `POST /generate_video`: forwards to worker `/v1/videos`.
- `POST /update_weights_from_disk`: broadcast to healthy workers.
- `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding.
+### Inference Endpoints
+
+| Method | Path | Description |
+|---|---|---|
+| `POST` | `/v1/images/generations` | Entrypoint for text-to-image generation |
+| `POST` | `/v1/videos` | Entrypoint for text-to-video generation |
+
+### Videos Result Query
+
+| Method | Path | Description |
+|---|---|---|
+| `GET` | `/v1/videos` | List or poll video jobs |
+| `GET` | `/v1/videos/{video_id}` | Get status/details of a single video job |
+| `GET` | `/v1/videos/{video_id}/content` | Download generated video content |
+
+### Model Discovery and Health Checks
+
+| Method | Path | Description |
+|---|---|---|
+| `GET` | `/v1/models` | OpenAI-style model discovery |
+| `GET` | `/health` | Basic health probe |
+
+### Worker Management APIs
+
+| Method | Path | Description |
+|---|---|---|
+| `POST` | `/workers` | Register a worker |
+| `GET` | `/workers` | List workers (including health/load) |
+| `GET` | `/workers/{worker_id}` | Get worker details |
+| `PUT` | `/workers/{worker_id}` | Update worker configuration |
+| `DELETE` | `/workers/{worker_id}` | Deregister a worker |
+
+### Optional (business-dependent)
+
+| Method | Path | Description |
+|---|---|---|
+| `POST` | `/update_weights_from_disk` | Reload weights from disk (ops/admin use) |
+


This new API documentation is a great improvement in terms of being RESTful and well-structured. However, the code in this repository has not been updated to implement these new endpoints. The router still uses the old endpoints (e.g., /add_worker, /generate). This makes the documentation inaccurate. The implementation should be updated to match this documentation, or this documentation should be marked as a proposal for a future version.
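
The gap between the old and new endpoint lists can be sketched as a lookup table. The mapping below is inferred only from the two lists in the diff above (for example, the old `/generate` forwarded to `/v1/images/generations`, and the new `GET /workers` is described as including health/load); it is an assumption, not something confirmed by the router code, and `translate` is a hypothetical helper.

```python
# Hypothetical legacy -> RESTful mapping, inferred from the README diff only.
LEGACY_TO_REST = {
    ("POST", "/add_worker"): ("POST", "/workers"),
    ("GET", "/list_workers"): ("GET", "/workers"),
    # Assumption: per-worker health moves into the worker listing.
    ("GET", "/health_workers"): ("GET", "/workers"),
    ("POST", "/generate"): ("POST", "/v1/images/generations"),
    ("POST", "/generate_video"): ("POST", "/v1/videos"),
}


def translate(method, path):
    """Map a legacy (method, path) pair to its proposed replacement.

    Pairs that are not in the table (such as GET /health, which appears in
    both lists) are returned unchanged, with the method upper-cased.
    """
    key = (method.upper(), path)
    return LEGACY_TO_REST.get(key, key)
```

A table like this could back either a deprecation shim or a note in the README that marks the new endpoints as a proposal, which are the two options the comment raises.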

gemini-code-assist · 2026-02-22T18:46:12Z

+- [Quick Start](#quick-start)
+  - [Start diffusion workers](#start-diffusion-workers)
+  - [Start the router](#start-the-router)
+  - [Test the router](#test-the-router)


The Quick Start section, which is linked here, still uses the old API endpoints (e.g., /add_worker, /generate). This is inconsistent with the new Router API section. The examples in Quick Start should be updated to use the new API endpoints for consistency.
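
As a rough illustration of what updated Quick Start calls might look like, the sketch below builds (but does not send) requests against the new paths. The router address, the worker address, and the JSON field names `url` and `prompt` are assumptions; the source only says the old `/add_worker` accepted a `url` via query string or JSON body.

```python
import json
from urllib.request import Request

ROUTER = "http://localhost:30080"  # hypothetical router address


def build_request(method, path, payload=None):
    """Build a JSON request for the router without sending it."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return Request(ROUTER + path, data=data, headers=headers, method=method)


# Register a worker, then submit an image generation request.
# The payload shapes are assumptions, not the documented schema.
register = build_request("POST", "/workers", {"url": "http://localhost:30000"})
generate = build_request("POST", "/v1/images/generations", {"prompt": "a cat"})
list_workers = build_request("GET", "/workers")

# To actually send one of these against a running router:
#   from urllib.request import urlopen
#   with urlopen(register) as resp:
#       print(resp.status, resp.read())
```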

gemini-code-assist · 2026-02-22T18:46:12Z

@@ -165,6 +224,8 @@
 }
 ```

+---
+
 ## Benchmark Scripts

 Benchmark scripts are available under `tests/benchmarks/diffusion_router/` and are intended for manual runs.
@@ -190,29 +251,14 @@
    --max-concurrency 4
 ```

-## Project Layout
-
-```text
-.
-├── docs/
-│   └── update_weights_from_disk.md
-├── src/sglang_diffusion_routing/
-│   ├── cli/
-│   └── router/
-├── tests/
-│   ├── benchmarks/
-│   │   └── diffusion_router/
-│   │       ├── bench_router.py
-│   │       └── bench_routing_algorithms.py
-│   └── unit/
-├── pyproject.toml
-└── README.md
-```


This Project Layout section provided a helpful overview of the repository structure. Its removal might make it harder for new contributors to navigate the codebase. Please consider restoring it or moving this information to a contributing guide.

zhaochenyang20 · 2026-02-24T05:14:57Z

+- `is_dead` (boolean): quarantine (`true`) or recover (`false`) this worker.
+- `refresh_video_support` (boolean): re-probe worker `/v1/models` capability.
+
+### Optional (business-dependent)


This is an RL-related API.
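
For the two worker fields shown in the diff above, a request body could be assembled as in the sketch below. It assumes the fields belong to the `PUT /workers/{worker_id}` body, which the hunk does not state outright; `build_worker_update` is a hypothetical helper, not part of the router.

```python
def build_worker_update(is_dead=None, refresh_video_support=None):
    """Build a worker-update body containing only the fields that were set.

    is_dead: True to quarantine the worker, False to recover it.
    refresh_video_support: True to re-probe the worker's /v1/models capability.
    """
    fields = {
        "is_dead": is_dead,
        "refresh_video_support": refresh_video_support,
    }
    body = {}
    for name, value in fields.items():
        if value is None:
            continue  # field not requested, leave it out of the body
        if not isinstance(value, bool):
            raise TypeError(f"{name} must be a boolean, got {type(value).__name__}")
        body[name] = value
    if not body:
        raise ValueError("at least one field must be set")
    return body


quarantine = build_worker_update(is_dead=True)
```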

zhaochenyang20 · 2026-02-27T00:55:05Z

I fixed the bug that raised `[sglang-d-router] error: 'FastAPI' object has no attribute 'add_event_handler'`.
I also refactored the code so that the router starts serving only after all the servers have started.
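
The "serve only after all servers are up" behavior described above can be sketched as a blocking readiness loop. This is not the code from the commit; `wait_until_all_healthy`, its parameters, and the injectable `probe` callable are assumptions chosen so the sketch runs without a real worker.

```python
import time


def wait_until_all_healthy(worker_urls, probe, timeout_s=60.0, interval_s=0.5,
                           sleep=time.sleep, clock=time.monotonic):
    """Block until probe(url) is truthy for every worker, or raise TimeoutError.

    probe is any callable that returns True when the worker at url is ready,
    for example one that issues GET /health and checks for a 200 response.
    """
    deadline = clock() + timeout_s
    pending = set(worker_urls)
    while pending:
        # Keep only the workers that are still not ready.
        pending = {url for url in pending if not probe(url)}
        if not pending:
            break
        if clock() >= deadline:
            raise TimeoutError(f"workers not healthy: {sorted(pending)}")
        sleep(interval_s)
    return True


# Usage with a fake probe that reports ready on its third call.
calls = {"n": 0}


def fake_probe(url):
    calls["n"] += 1
    return calls["n"] > 2


ready = wait_until_all_healthy(["http://w1"], fake_probe, sleep=lambda s: None)
```

A launcher would call something like this after starting the backends and before starting the router's HTTP server, so no request is routed to a worker that is still loading.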

zhaochenyang20 · 2026-02-27T01:13:26Z

Added a banner.

alphabetc1 requested a review from zhaochenyang20 February 22, 2026 18:44

gemini-code-assist Bot reviewed Feb 22, 2026

alphabetc1 force-pushed the refactor/api branch from 1a4af52 to 71e8d80 Compare February 23, 2026 18:34

zhaochenyang20 requested changes Feb 24, 2026

alphabetc1 force-pushed the refactor/api branch from d92d735 to 4bf82aa Compare February 24, 2026 16:45

refactor: RESTful and sgl/smg compliant API

91f22bd

alphabetc1 force-pushed the refactor/api branch from 4bf82aa to 91f22bd Compare February 24, 2026 16:50

fix ci

6fb500e

zhaochenyang20 approved these changes Feb 27, 2026

refactor: streamline backend launch and health check process in DiffusionRouter

7583c41

adds banner

72bb07e

change banner

9a78c3e

zhaochenyang20 merged commit 74604ca into zhaochenyang20:main Feb 27, 2026
2 checks passed

alphabetc1 deleted the refactor/api branch February 27, 2026 09:53

BBuf mentioned this pull request Apr 20, 2026

SGLang Diffusion external impact survey: kernel, feature, and platform adoption BBuf/how-to-optim-algorithm-in-cuda#14

Open

zhaochenyang20 merged 5 commits into zhaochenyang20:main from alphabetc1:refactor/api