|
1 | 1 | # sglang-diffusion-routing |
2 | 2 |
|
3 | | -A demonstrative example of running SGLang Diffusion with a DP router, which supports `generation` (a lot of methods, including [SDE/CPS](https://github.com/sgl-project/sglang/pull/18806)), `update_weights_from_disk` in PR [18306](https://github.com/sgl-project/sglang/pull/18306), and `health_check`. |
| 3 | +A lightweight router for SGLang diffusion workers. |
4 | 4 |
|
5 | | -1. Copy all the codes of https://github.com/radixark/miles/pull/544 to here with sincere acknowledgment. |
6 | | -2. Write up a detailed README on how to use SGLang Diffusion Router to launch multiple instances and send requests. |
| 5 | +It provides worker registration, load balancing, health checking, and request proxying for diffusion generation APIs. |
7 | 6 |
|
8 | | -For example, given that we can make a Python binding of the sglang-d router: |
| 7 | +## Highlights |
9 | 8 |
|
10 | | -1. pip install sglang-d-router (Only for local development right now, clone the repository and run `pip install .` from the root directory. No need to make a PyPi) |
11 | | -2. pip install "sglang[diffusion]" |
12 | | -3. launching command (how to use sglang-d-router to launch n sglang diffusion servers) |
13 | | -4. Sending demonstrative requests |
| 9 | +- `least-request` routing by default, with `round-robin` and `random` also available. |
| 10 | +- Background health checks with quarantine after repeated failures. |
| 11 | +- Router APIs for worker registration, health inspection, and proxy forwarding. |
| 12 | +- `update_weights_from_disk` broadcast to all healthy workers. |
| 13 | + |
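The routing policies named above can be illustrated with a minimal sketch. This is not the router's actual implementation: the function name `pick_worker`, its parameters, and the tie-breaking rule (sorted URL order) are assumptions made for illustration only.

```python
import random
from typing import Dict, Set


def pick_worker(
    active: Dict[str, int],
    quarantined: Set[str],
    policy: str = "least-request",
    rr_index: int = 0,
) -> str:
    """Pick a worker URL among non-quarantined workers (hypothetical helper).

    `active` maps worker URL -> number of in-flight requests.
    `rr_index` is a caller-maintained counter used only by round-robin.
    """
    # Sort so that ties and round-robin order are deterministic.
    healthy = sorted(url for url in active if url not in quarantined)
    if not healthy:
        raise RuntimeError("no healthy workers available")
    if policy == "least-request":
        # Choose the worker with the fewest in-flight requests.
        return min(healthy, key=lambda url: active[url])
    if policy == "round-robin":
        return healthy[rr_index % len(healthy)]
    if policy == "random":
        return random.choice(healthy)
    raise ValueError(f"unknown policy: {policy}")
```

With two workers holding 3 and 1 in-flight requests, `least-request` selects the second. If that worker is quarantined, the first is selected instead.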
| 14 | +## Installation |
| 15 | + |
| 16 | +From the repository root: |
| 17 | + |
| 18 | +```bash |
| 19 | +python3 -m venv .venv |
| 20 | +. .venv/bin/activate |
| 21 | +pip install . |
| 22 | +``` |
| 23 | + |
| 24 | +Development install: |
| 25 | + |
| 26 | +```bash |
| 27 | +pip install -e . |
| 28 | +``` |
| 29 | + |
| 30 | +Run tests: |
| 31 | + |
| 32 | +```bash |
| 33 | +pip install pytest |
| 34 | +pytest tests/unit -v |
| 35 | +``` |
| 36 | + |
| 37 | +Workers require SGLang diffusion support: |
| 38 | + |
| 39 | +```bash |
| 40 | +pip install "sglang[diffusion]" |
| 41 | +``` |
| 42 | + |
| 43 | +## Quick Start |
| 44 | + |
| 45 | +### 1) Start diffusion workers |
| 46 | + |
| 47 | +```bash |
| 48 | +# worker 1 |
| 49 | +CUDA_VISIBLE_DEVICES=0 sglang serve \ |
| 50 | + --model-path stabilityai/stable-diffusion-3-medium-diffusers \ |
| 51 | + --num-gpus 1 \ |
| 52 | + --host 127.0.0.1 \ |
| 53 | + --port 30000 |
| 54 | + |
| 55 | +# worker 2 |
| 56 | +CUDA_VISIBLE_DEVICES=1 sglang serve \ |
| 57 | + --model-path stabilityai/stable-diffusion-3-medium-diffusers \ |
| 58 | + --num-gpus 1 \ |
| 59 | + --host 127.0.0.1 \ |
| 60 | + --port 30001 |
| 61 | +``` |
| 62 | + |
| 63 | +### 2) Start the router |
| 64 | + |
| 65 | +Script entry: |
| 66 | + |
| 67 | +```bash |
| 68 | +sglang-d-router --port 30080 \ |
| 69 | + --worker-urls http://localhost:30000 http://localhost:30001 |
| 70 | +``` |
| 71 | + |
| 72 | +Module entry: |
| 73 | + |
| 74 | +```bash |
| 75 | +python -m sglang_diffusion_routing --port 30080 \ |
| 76 | + --worker-urls http://localhost:30000 http://localhost:30001 |
| 77 | +``` |
| 78 | + |
| 79 | +Or start empty and add workers later: |
| 80 | + |
| 81 | +```bash |
| 82 | +sglang-d-router --port 30080 |
| 83 | +curl -X POST "http://localhost:30080/add_worker?url=http://localhost:30000" |
| 84 | +``` |
| 85 | + |
| 86 | +### 3) Test the router |
| 87 | + |
| 88 | +```bash |
| 89 | +# Check router health |
| 90 | +curl http://localhost:30080/health |
| 91 | + |
| 92 | +# List registered workers |
| 93 | +curl http://localhost:30080/list_workers |
| 94 | + |
| 95 | +# Image generation request (SD3) |
| 96 | +curl -X POST http://localhost:30080/generate \ |
| 97 | + -H "Content-Type: application/json" \ |
| 98 | + -d '{ |
| 99 | + "model": "stabilityai/stable-diffusion-3-medium-diffusers", |
| 100 | + "prompt": "a cute cat", |
| 101 | + "num_images": 1 |
| 102 | + }' |
| 103 | + |
| 104 | +# Video generation request |
| 105 | +curl -X POST http://localhost:30080/generate_video \ |
| 106 | + -H "Content-Type: application/json" \ |
| 107 | + -d '{ |
| 108 | + "model": "stabilityai/stable-video-diffusion", |
| 109 | + "prompt": "a flowing river" |
| 110 | + }' |
| 111 | + |
| 112 | +# Check per-worker health and load |
| 113 | +curl http://localhost:30080/health_workers |
| 114 | +``` |
| 115 | + |
| 116 | +## Router API |
| 117 | + |
| 118 | +- `POST /add_worker`: add worker via query (`?url=`) or JSON body. |
| 119 | +- `GET /list_workers`: list registered workers. |
| 120 | +- `GET /health`: aggregated router health. |
| 121 | +- `GET /health_workers`: per-worker health and active request counts. |
| 122 | +- `POST /generate`: forwards to worker `/v1/images/generations`. |
| 123 | +- `POST /generate_video`: forwards to worker `/v1/videos`. |
| 124 | +- `POST /update_weights_from_disk`: broadcasts the request to all healthy workers. |
| 125 | +- `GET|POST|PUT|DELETE /{path}`: catch-all proxy forwarding. |
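For callers who prefer Python to `curl`, the `POST /generate` endpoint listed above could be called roughly as follows, using only the standard library. The helper names `build_generate_request` and `send` are hypothetical, the payload fields mirror the Quick Start `curl` example, and the assumption that the router returns a JSON body is not confirmed by this README.

```python
import json
import urllib.request

ROUTER_URL = "http://localhost:30080"  # router address from the Quick Start


def build_generate_request(prompt, model, num_images=1, router_url=ROUTER_URL):
    """Build (but do not send) a POST /generate request (hypothetical helper)."""
    payload = {"model": model, "prompt": prompt, "num_images": num_images}
    return urllib.request.Request(
        f"{router_url}/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=600.0):
    """Send the request and parse the body, assuming the response is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage, with a router already running: `send(build_generate_request("a cute cat", "stabilityai/stable-diffusion-3-medium-diffusers"))`. Because router responses are fully buffered, a generous timeout is probably sensible for slow generations.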
| 126 | + |
| 127 | +## `update_weights_from_disk` behavior |
| 128 | + |
| 129 | +Full details: [docs/update_weights_from_disk.md](docs/update_weights_from_disk.md) |
| 130 | + |
| 131 | +- The router forwards request payloads as-is to each healthy worker. |
| 132 | +- The router does not validate payload schema; payload semantics are worker-defined. |
| 133 | +- Worker servers must implement `POST /update_weights_from_disk`. |
| 134 | + |
| 135 | +Example: |
| 136 | + |
| 137 | +```bash |
| 138 | +curl -X POST http://localhost:30080/update_weights_from_disk \ |
| 139 | + -H "Content-Type: application/json" \ |
| 140 | + -d '{"model_path": "/path/to/new/weights"}' |
| 141 | +``` |
| 142 | + |
| 143 | +Response shape: |
| 144 | + |
| 145 | +```json |
| 146 | +{ |
| 147 | + "results": [ |
| 148 | + { |
| 149 | + "worker_url": "http://localhost:30000", |
| 150 | + "status_code": 200, |
| 151 | + "body": { |
| 152 | + "ok": true |
| 153 | + } |
| 154 | + } |
| 155 | + ] |
| 156 | +} |
| 157 | +``` |
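The broadcast behavior and response shape described above can be sketched as follows. This is an illustration under stated assumptions, not the router's code: `broadcast_update`, `failed_workers`, and the injected `post` callable are hypothetical names, and treating any non-200 status as a failure is an assumption.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical transport: takes (url, payload) and returns (status_code, body).
PostFn = Callable[[str, Dict[str, Any]], Tuple[int, Any]]


def broadcast_update(
    payload: Dict[str, Any], healthy_workers: List[str], post: PostFn
) -> Dict[str, Any]:
    """Forward the payload unchanged to each healthy worker and collect results."""
    results = []
    for worker_url in healthy_workers:
        # The payload is passed through as-is; no schema validation is done.
        status_code, body = post(f"{worker_url}/update_weights_from_disk", payload)
        results.append(
            {"worker_url": worker_url, "status_code": status_code, "body": body}
        )
    return {"results": results}


def failed_workers(response: Dict[str, Any]) -> List[str]:
    """Return worker URLs whose status code is not 200 (assumed failure rule)."""
    return [r["worker_url"] for r in response["results"] if r["status_code"] != 200]
```

A client could apply `failed_workers` to the router's JSON response to decide whether a weight update needs to be retried on specific workers.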
| 158 | + |
| 159 | +## Benchmark Scripts |
| 160 | + |
| 161 | +Benchmark scripts are available under `tests/benchmarks/diffusion_router/` and are intended for manual runs. |
| 162 | +They are not part of default unit test collection (`pytest tests/unit -v`). |
| 163 | + |
| 164 | +Single benchmark: |
| 165 | + |
| 166 | +```bash |
| 167 | +python tests/benchmarks/diffusion_router/bench_router.py \ |
| 168 | + --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \ |
| 169 | + --num-workers 2 \ |
| 170 | + --num-prompts 20 \ |
| 171 | + --max-concurrency 4 |
| 172 | +``` |
| 173 | + |
| 174 | +Algorithm comparison: |
| 175 | + |
| 176 | +```bash |
| 177 | +python tests/benchmarks/diffusion_router/bench_routing_algorithms.py \ |
| 178 | + --model Wan-AI/Wan2.2-T2V-A14B-Diffusers \ |
| 179 | + --num-workers 2 \ |
| 180 | + --num-prompts 20 \ |
| 181 | + --max-concurrency 4 |
| 182 | +``` |
| 183 | + |
| 184 | +## Project Layout |
| 185 | + |
| 186 | +```text |
| 187 | +. |
| 188 | +├── docs/ |
| 189 | +│ └── update_weights_from_disk.md |
| 190 | +├── src/sglang_diffusion_routing/ |
| 191 | +│ ├── cli/ |
| 192 | +│ └── router/ |
| 193 | +├── tests/ |
| 194 | +│ ├── benchmarks/ |
| 195 | +│ │ └── diffusion_router/ |
| 196 | +│ │ ├── bench_router.py |
| 197 | +│ │ └── bench_routing_algorithms.py |
| 198 | +│ └── unit/ |
| 199 | +├── pyproject.toml |
| 200 | +└── README.md |
| 201 | +``` |
| 202 | + |
| 203 | +## Acknowledgment |
| 204 | + |
| 205 | +This project is derived from [radixark/miles#544](https://github.com/radixark/miles/pull/544). Thanks to the original authors for their work. |
| 206 | + |
| 207 | +## Notes |
| 208 | + |
| 209 | +- Quarantined workers are intentionally not auto-reintroduced. |
| 210 | +- Router responses are fully buffered; streaming passthrough is not implemented. |
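The quarantine note above can be illustrated with a small sketch. The class name `HealthTracker`, the threshold of 3, and the choice to count consecutive failures are all assumptions; the actual router may track failures differently.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class HealthTracker:
    """Tracks probe results and quarantines workers (hypothetical sketch)."""

    failure_threshold: int = 3  # assumed value, not taken from the router
    failures: Dict[str, int] = field(default_factory=dict)
    quarantined: Set[str] = field(default_factory=set)

    def record(self, url: str, healthy: bool) -> None:
        """Record one health-check result for a worker."""
        if url in self.quarantined:
            # Quarantine is sticky: a later passing probe does not restore it.
            return
        if healthy:
            # Assumption: only consecutive failures count toward quarantine.
            self.failures[url] = 0
            return
        self.failures[url] = self.failures.get(url, 0) + 1
        if self.failures[url] >= self.failure_threshold:
            self.quarantined.add(url)
```

In this sketch, a worker that fails three probes in a row is quarantined and stays quarantined even if it later responds normally, which matches the "not auto-reintroduced" behavior noted above.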