[RFC] refactor: RESTful and sgl/smg compliant API#17
[RFC] refactor: RESTful and sgl/smg compliant API#17zhaochenyang20 merged 5 commits intozhaochenyang20:mainfrom
Conversation
Summary of ChangesHello @alphabetc1, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly refactors the router's API to align with RESTful principles and SGLang/SMG compliance. The changes introduce a more organized and standardized set of endpoints for interacting with the diffusion workers, covering inference, worker management, and health monitoring. The accompanying documentation has been thoroughly updated to reflect these new API specifications, making the router easier to understand and integrate. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request proposes a refactoring of the API to be more RESTful, as documented in the README.md. While the new API design is a good step forward, this PR only contains documentation changes and lacks the corresponding implementation. This has led to a few issues:
- The
README.mdis now out of sync with the actual code, documenting an API that doesn't exist. - The
README.mdis internally inconsistent, with theQuick Startsection using the old API endpoints while theRouter APIsection describes the new ones. - The helpful
Project Layoutsection has been removed.
Given this is an RFC, my feedback is that the API design is good, but it should be implemented. The documentation should not be merged in this state without the code, as it's misleading. I've added specific comments on the README.md.
| ## Router API | ||
|
|
||
| - `POST /add_worker`: add worker via query (`?url=`) or JSON body. | ||
| - `GET /list_workers`: list registered workers. | ||
| - `GET /health`: aggregated router health. | ||
| - `GET /health_workers`: per-worker health and active request counts. | ||
| - `POST /generate`: forwards to worker `/v1/images/generations`. | ||
| - `POST /generate_video`: forwards to worker `/v1/videos`. | ||
| - `POST /update_weights_from_disk`: broadcast to healthy workers. | ||
| - `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding. | ||
| ### Inference Endpoints | ||
|
|
||
| | Method | Path | Description | | ||
| |---|---|---| | ||
| | `POST` | `/v1/images/generations` | Entrypoint for text-to-image generation | | ||
| | `POST` | `/v1/videos` | Entrypoint for text-to-video generation | | ||
|
|
||
| ### Videos Result Query | ||
|
|
||
| | Method | Path | Description | | ||
| |---|---|---| | ||
| | `GET` | `/v1/videos` | List or poll video jobs | | ||
| | `GET` | `/v1/videos/{video_id}` | Get status/details of a single video job | | ||
| | `GET` | `/v1/videos/{video_id}/content` | Download generated video content | | ||
|
|
||
| ### Model Discovery and Health Checks | ||
|
|
||
| | Method | Path | Description | | ||
| |---|---|---| | ||
| | `GET` | `/v1/models` | OpenAI-style model discovery | | ||
| | `GET` | `/health` | Basic health probe | | ||
|
|
||
| ### Worker Management APIs | ||
|
|
||
| | Method | Path | Description | | ||
| |---|---|---| | ||
| | `POST` | `/workers` | Register a worker | | ||
| | `GET` | `/workers` | List workers (including health/load) | | ||
| | `GET` | `/workers/{worker_id}` | Get worker details | | ||
| | `PUT` | `/workers/{worker_id}` | Update worker configuration | | ||
| | `DELETE` | `/workers/{worker_id}` | Deregister a worker | | ||
|
|
||
| ### Optional (business-dependent) | ||
|
|
||
| | Method | Path | Description | | ||
| |---|---|---| | ||
| | `POST` | `/update_weights_from_disk` | Reload weights from disk (ops/admin use) | | ||
|
|
There was a problem hiding this comment.
This new API documentation is a great improvement in terms of being RESTful and well-structured. However, the code in this repository has not been updated to implement these new endpoints. The router still uses the old endpoints (e.g., /add_worker, /generate). This makes the documentation inaccurate. The implementation should be updated to match this documentation, or this documentation should be marked as a proposal for a future version.
| - [Quick Start](#quick-start) | ||
| - [Start diffusion workers](#start-diffusion-workers) | ||
| - [Start the router](#start-the-router) | ||
| - [Test the router](#test-the-router) |
| @@ -165,6 +224,8 @@ | |||
| } | |||
| ``` | |||
|
|
|||
| --- | |||
|
|
|||
| ## Benchmark Scripts | |||
|
|
|||
| Benchmark scripts are available under `tests/benchmarks/diffusion_router/` and are intended for manual runs. | |||
| @@ -190,29 +251,14 @@ | |||
| --max-concurrency 4 | |||
| ``` | |||
|
|
|||
| ## Project Layout | |||
|
|
|||
| ```text | |||
| . | |||
| ├── docs/ | |||
| │ └── update_weights_from_disk.md | |||
| ├── src/sglang_diffusion_routing/ | |||
| │ ├── cli/ | |||
| │ └── router/ | |||
| ├── tests/ | |||
| │ ├── benchmarks/ | |||
| │ │ └── diffusion_router/ | |||
| │ │ ├── bench_router.py | |||
| │ │ └── bench_routing_algorithms.py | |||
| │ └── unit/ | |||
| ├── pyproject.toml | |||
| └── README.md | |||
| ``` | |||
1a4af52 to
71e8d80
Compare
| - `is_dead` (boolean): quarantine (`true`) or recover (`false`) this worker. | ||
| - `refresh_video_support` (boolean): re-probe worker `/v1/models` capability. | ||
|
|
||
| ### Optional (business-dependent) |
d92d735 to
4bf82aa
Compare
4bf82aa to
91f22bd
Compare
|

to #13