Commit 4bf82aa

refactor: RESTful and sgl/smg compliant API
1 parent ccf36da commit 4bf82aa

4 files changed: 778 additions, 84 deletions

README.md

Lines changed: 119 additions & 19 deletions
@@ -4,16 +4,28 @@ A lightweight router for SGLang diffusion workers used in RL systems.
44

55
It provides worker registration, load balancing, health checking, weight refitting, and request proxying for diffusion generation APIs.
66

7-
## API Reference
7+
---
88

9-
- `POST /add_worker`: add worker via query (`?url=`) or JSON body.
10-
- `GET /list_workers`: list registered workers.
11-
- `GET /health`: aggregated router health.
12-
- `GET /health_workers`: per-worker health and active request counts.
13-
- `POST /generate`: forwards to worker `/v1/images/generations`.
14-
- `POST /generate_video`: forwards to worker `/v1/videos`; rejects image-only workers (`T2I`/`I2I`/`TI2I`) with `400`.
15-
- `POST /update_weights_from_disk`: broadcast to all healthy workers.
16-
- `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding.
9+
## Table of Contents
10+
11+
- [Overview](#overview)
12+
- [Installation](#installation)
13+
- [Quick Start](#quick-start)
14+
- [Start diffusion workers](#start-diffusion-workers)
15+
- [Start the router](#start-the-router)
16+
- [Router API](#router-api)
17+
- [Inference Endpoints](#inference-endpoints)
18+
- [Videos Result Query](#videos-result-query)
19+
- [Model Discovery and Health Checks](#model-discovery-and-health-checks)
20+
- [Worker Management APIs](#worker-management-apis)
21+
- [Optional (business-dependent)](#optional-business-dependent)
22+
- [Acknowledgment](#acknowledgment)
23+
24+
---
25+
26+
## Overview
27+
28+
---
1729

1830
## Installation
1931

@@ -39,6 +51,8 @@ uv pip install "sglang[diffusion]" --prerelease=allow
3951
cd ..
4052
```
4153

54+
---
55+
4256
## Quick Start
4357

4458
### Co-launch workers and router via YAML config
@@ -97,12 +111,26 @@ ROUTER = "http://localhost:30081"
97111
resp = requests.get(f"{ROUTER}/health")
98112
print(resp.json())
99113

100-
# List registered workers
101-
resp = requests.get(f"{ROUTER}/list_workers")
114+
# Register a worker
115+
resp = requests.post(f"{ROUTER}/workers", json={"url": "http://localhost:30000"})
116+
print(resp.json())
117+
118+
# List registered workers (with health/load)
119+
resp = requests.get(f"{ROUTER}/workers")
120+
print(resp.json())
121+
worker_id = resp.json()["workers"][0]["worker_id"]
122+
123+
# Get / update worker details
124+
resp = requests.get(f"{ROUTER}/workers/{worker_id}")
125+
print(resp.json())
126+
resp = requests.put(
127+
f"{ROUTER}/workers/{worker_id}",
128+
json={"is_dead": False, "refresh_video_support": True},
129+
)
102130
print(resp.json())
103131

104132
# Image generation request (returns base64-encoded image)
105-
resp = requests.post(f"{ROUTER}/generate", json={
133+
resp = requests.post(f"{ROUTER}/v1/images/generations", json={
106134
"model": "Qwen/Qwen-Image",
107135
"prompt": "a cute cat",
108136
"num_images": 1,
@@ -117,10 +145,15 @@ with open("output.png", "wb") as f:
117145
f.write(img)
118146
print("Saved to output.png")
119147

120-
121-
# Check per-worker health and load
122-
resp = requests.get(f"{ROUTER}/health_workers")
148+
# Video generation request
149+
resp = requests.post(f"{ROUTER}/v1/videos", json={
150+
"model": "Qwen/Qwen-Image",
151+
"prompt": "a flowing river",
152+
})
123153
print(resp.json())
154+
video_id = resp.json().get("video_id") or resp.json().get("id")
155+
if video_id:
156+
print(requests.get(f"{ROUTER}/v1/videos/{video_id}").json())
124157

125158
# Update weights from disk
126159
resp = requests.post(f"{ROUTER}/update_weights_from_disk", json={
@@ -135,11 +168,16 @@ print(resp.json())
135168
# Check router health
136169
curl http://localhost:30081/health
137170

138-
# List registered workers
139-
curl http://localhost:30081/list_workers
171+
# Register a worker
172+
curl -X POST http://localhost:30081/workers \
173+
-H "Content-Type: application/json" \
174+
-d '{"url": "http://localhost:30000"}'
175+
176+
# List registered workers (with health/load)
177+
curl http://localhost:30081/workers
140178

141179
# Image generation request (returns base64-encoded image)
142-
curl -X POST http://localhost:30081/generate \
180+
curl -X POST http://localhost:30081/v1/images/generations \
143181
-H "Content-Type: application/json" \
144182
-d '{
145183
"model": "Qwen/Qwen-Image",
@@ -149,7 +187,7 @@ curl -X POST http://localhost:30081/generate \
149187
}'
150188

151189
# Decode and save the image locally
152-
curl -s -X POST http://localhost:30081/generate \
190+
curl -s -X POST http://localhost:30081/v1/images/generations \
153191
-H "Content-Type: application/json" \
154192
-d '{
155193
"model": "Qwen/Qwen-Image",
@@ -165,12 +203,74 @@ with open('output.png', 'wb') as f:
165203
print('Saved to output.png')
166204
"
167205

206+
# Video generation request
207+
curl -X POST http://localhost:30081/v1/videos \
208+
-H "Content-Type: application/json" \
209+
-d '{"model": "Qwen/Qwen-Image", "prompt": "a flowing river"}'
210+
211+
# Poll a specific video job by video_id
212+
curl http://localhost:30081/v1/videos/<video_id>
213+
168214

169215
curl -X POST http://localhost:30081/update_weights_from_disk \
170216
-H "Content-Type: application/json" \
171217
-d '{"model_path": "Qwen/Qwen-Image-2512"}'
172218
```
173219

220+
---
221+
222+
## Router API
223+
224+
### Inference Endpoints
225+
226+
| Method | Path | Description |
227+
|---|---|---|
228+
| `POST` | `/v1/images/generations` | Entrypoint for text-to-image generation |
229+
| `POST` | `/v1/videos` | Entrypoint for text-to-video generation |
230+
231+
### Videos Result Query
232+
233+
| Method | Path | Description |
234+
|---|---|---|
235+
| `GET` | `/v1/videos` | List or poll video jobs |
236+
| `GET` | `/v1/videos/{video_id}` | Get status/details of a single video job |
237+
| `GET` | `/v1/videos/{video_id}/content` | Download generated video content |
238+
239+
Video query routing is sticky per `video_id`: the router caches the `video_id -> worker` mapping when a job is created (`POST /v1/videos`), then forwards detail and content queries for that job to the same worker. An unknown `video_id` returns `404`.
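
The `video_id -> worker` cache described above can be pictured with a small in-memory sketch. This is an illustration only, not the router's actual code: the class name `VideoJobRouter`, its methods, and the round-robin worker selection are hypothetical stand-ins.

```python
from typing import Dict, List, Optional


class VideoJobRouter:
    """Hypothetical sketch of sticky routing keyed by video_id."""

    def __init__(self, workers: List[str]) -> None:
        self.workers = workers
        self.video_to_worker: Dict[str, str] = {}
        self._next = 0

    def pick_worker(self) -> str:
        # Round-robin is an assumption; it stands in for the real
        # load-balancing policy, which this diff does not show.
        worker = self.workers[self._next % len(self.workers)]
        self._next += 1
        return worker

    def on_create(self, video_id: str, worker: str) -> None:
        # Record which worker accepted the POST /v1/videos request.
        self.video_to_worker[video_id] = worker

    def resolve(self, video_id: str) -> Optional[str]:
        # None tells the caller to answer 404 for an unknown video_id.
        return self.video_to_worker.get(video_id)


router = VideoJobRouter(["http://localhost:30000", "http://localhost:30001"])
worker = router.pick_worker()
router.on_create("vid-123", worker)
print(router.resolve("vid-123") == worker)  # same worker serves later queries
print(router.resolve("missing"))            # unknown id maps to no worker
```

Keeping the mapping on the router means a client never needs to know which worker holds its job, at the cost of losing the mapping if the router restarts (assuming the cache is in-memory only, which the README does not state).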
240+
241+
### Model Discovery and Health Checks
242+
243+
| Method | Path | Description |
244+
|---|---|---|
245+
| `GET` | `/v1/models` | OpenAI-style model discovery |
246+
| `GET` | `/health` | Basic health probe |
247+
248+
`GET /v1/models` aggregates model lists from healthy workers and de-duplicates by model `id`.
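
A minimal sketch of that aggregation, assuming each worker answers with an OpenAI-style body of the form `{"object": "list", "data": [{"id": ...}, ...]}`. The function name `merge_model_lists` and the first-seen-wins policy are assumptions for illustration, not taken from the router source.

```python
from typing import Any, Dict, Iterable, List


def merge_model_lists(worker_responses: Iterable[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge per-worker model lists, keeping the first entry seen for each id."""
    seen = set()
    merged: List[Dict[str, Any]] = []
    for resp in worker_responses:
        for model in resp.get("data", []):
            model_id = model.get("id")
            # Skip entries without an id and ids already collected.
            if model_id is None or model_id in seen:
                continue
            seen.add(model_id)
            merged.append(model)
    return {"object": "list", "data": merged}


responses = [
    {"object": "list", "data": [{"id": "Qwen/Qwen-Image"}]},
    {"object": "list", "data": [{"id": "Qwen/Qwen-Image"}, {"id": "other-model"}]},
]
print([m["id"] for m in merge_model_lists(responses)["data"]])
```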
249+
250+
### Worker Management APIs
251+
252+
| Method | Path | Description |
253+
|---|---|---|
254+
| `POST` | `/workers` | Register a worker |
255+
| `GET` | `/workers` | List workers (including health/load) |
256+
| `GET` | `/workers/{worker_id}` | Get worker details |
257+
| `PUT` | `/workers/{worker_id}` | Update worker configuration |
258+
| `DELETE` | `/workers/{worker_id}` | Deregister a worker |
259+
260+
`worker_id` is the URL-encoded worker URL.
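
For illustration, a `worker_id` could be derived from a worker URL with the standard library as below. Whether the router encodes every reserved character (`safe=""`) is an assumption; reading `worker_id` from the `GET /workers` response, as the Quick Start does, avoids guessing. The helper name `worker_id_for` is hypothetical.

```python
from urllib.parse import quote, unquote


def worker_id_for(worker_url: str) -> str:
    # safe="" percent-encodes ":" and "/" as well, so the result fits
    # in a single path segment of /workers/{worker_id}.
    return quote(worker_url, safe="")


worker_id = worker_id_for("http://localhost:30000")
print(worker_id)           # http%3A%2F%2Flocalhost%3A30000
print(unquote(worker_id))  # http://localhost:30000
```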
261+
262+
`PUT /workers/{worker_id}` currently supports:
263+
- `is_dead` (boolean): quarantine (`true`) or recover (`false`) this worker.
264+
- `refresh_video_support` (boolean): re-probe worker `/v1/models` capability.
265+
266+
### Optional (business-dependent)
267+
268+
| Method | Path | Description |
269+
|---|---|---|
270+
| `POST` | `/update_weights_from_disk` | Reload weights from disk (ops/admin use) |
271+
272+
---
273+
174274
## Acknowledgment
175275

176276
This project is derived from [radixark/miles#544](https://github.com/radixark/miles/pull/544). Thanks to the original authors.

src/sglang_diffusion_routing/cli/main.py

Lines changed: 1 addition & 1 deletion
@@ -40,7 +40,7 @@ async def _refresh_all_worker_video_support() -> None:
4040

4141
print(f"{log_prefix} starting router on {args.host}:{args.port}", flush=True)
4242
print(
43-
f"{log_prefix} workers: {list(router.worker_request_counts.keys()) or '(none - add via POST /add_worker)'}",
43+
f"{log_prefix} workers: {list(router.worker_request_counts.keys()) or '(none - add via POST /workers)'}",
4444
flush=True,
4545
)
4646
uvicorn.run(
