
Commit 1d9472b

[Feature] Add consistent hashing routing policy for rollout (#1588)
1 parent 0d8642a commit 1d9472b

7 files changed: +118 -5 lines changed

docs/en/advanced/slime-router.md

Lines changed: 45 additions & 0 deletions
@@ -91,3 +91,48 @@ For more details on SGLang Model Gateway, see the [official documentation](https
 - Use SlimeRouter when you need R3 or radix-tree caching
 - Use SGLang Model Gateway for everything else (recommended default)
 
+---
+
+## 4. Session-Affinity Routing for Multi-Turn Agents
+
+When using SGLang Model Gateway with consistent hashing routing policy, Slime automatically assigns each rollout session a unique session ID and uses it as the routing key to enable session affinity.
+
+### What is session affinity?
+
+Session affinity (also called sticky sessions) ensures that all requests belonging to the same conversation or agent session are routed to the same backend worker. This is beneficial for:
+
+- **Multi-turn dialogues**: Keeping the same worker improves prefix cache hit rates
+- **Multi-agent systems**: Ensures agent state consistency and better resource locality
+- **Debugging**: Makes it easier to trace and debug specific sessions
+
+### How it works
+
+When the rollout system generates samples, each sample is assigned a unique `session_id`. This ID is:
+
+1. Automatically generated using UUID for each sample
+2. Stored in `sample.session_id` field
+3. Passed as `X-SMG-Routing-Key` header when the router policy is `consistent_hashing`
+
+The SGLang Model Gateway's consistent hashing policy then uses this routing key to deterministically select the same worker for all requests with the same session ID.
+
+### Configuration
+
+To enable session-affinity routing, simply configure the router policy in Slime:
+
+```bash
+--sglang-router-policy consistent_hashing
+```
+
+Slime will automatically start SGLang Model Gateway with the consistent hashing policy.
+
+> **Note**: If you encounter an error about the `consistent_hashing` policy not being available, upgrade sglang-router:
+> ```bash
+> pip install -U sglang-router
+> ```
+
+### Notes
+
+- Each sample gets its own unique session ID
+- Different samples in the same group may be routed to different workers
+- The same sample's subsequent turns will maintain the same session ID
+- Currently, this feature is only available for SGLang Model Gateway
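
The "How it works" text above says the gateway deterministically maps a routing key to a worker. The gateway's own implementation is not part of this commit, so the following is only a generic hash-ring sketch of that idea. The `HashRing` class, the virtual-node count, and the worker URLs are hypothetical names chosen for illustration.

```python
import bisect
import hashlib
import uuid


def _hash(value: str) -> int:
    """Map a string to a stable 64-bit integer (md5 is used only for spreading keys)."""
    return int.from_bytes(hashlib.md5(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Hypothetical consistent-hash ring; not the gateway's real implementation."""

    def __init__(self, workers: list[str], vnodes: int = 64) -> None:
        # Place each worker on the ring at several virtual points to even out load.
        points = sorted((_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes))
        self._keys = [point for point, _ in points]
        self._workers = [worker for _, worker in points]

    def pick(self, routing_key: str) -> str:
        # Walk clockwise to the first virtual point at or after the key's hash,
        # wrapping around to the start of the ring when needed.
        idx = bisect.bisect_left(self._keys, _hash(routing_key)) % len(self._keys)
        return self._workers[idx]


workers = ["http://worker-0", "http://worker-1", "http://worker-2"]  # assumed URLs
ring = HashRing(workers)

session_id = str(uuid.uuid4())       # same kind of value stored in sample.session_id
first_turn = ring.pick(session_id)   # worker chosen for the first request
later_turn = ring.pick(session_id)   # a later turn with the same key
```

Because the choice depends only on the key and the current worker set, every turn that carries the same session ID lands on the same worker as long as that worker set does not change.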

docs/zh/advanced/slime-router.md

Lines changed: 45 additions & 0 deletions
@@ -91,3 +91,48 @@ SGLang Model Gateway is a high-performance Rust router for large-scale inference
 - Use SlimeRouter when you need R3 or radix-tree cache
 - Use SGLang Model Gateway in all other cases (recommended default)
 
+---
+
+## 4. Session-Affinity Routing for Multi-Turn Agents
+
+When using the consistent hashing routing policy of SGLang Model Gateway, slime automatically assigns each rollout session a unique session ID and uses it as the routing key to achieve session affinity.
+
+### What is session affinity?
+
+Session affinity (also known as sticky sessions) ensures that all requests belonging to the same conversation or agent session are routed to the same backend worker. This is beneficial in the following scenarios:
+
+- **Multi-turn dialogues**: Staying on the same worker improves the prefix cache hit rate
+- **Multi-agent systems**: Ensures agent state consistency and better resource locality
+- **Debugging**: Makes it easier to trace and debug specific sessions
+
+### How it works
+
+When the rollout system generates samples, each sample is assigned a unique `session_id`. This ID is:
+
+1. Generated automatically for each sample using UUID
+2. Stored in the `sample.session_id` field
+3. Passed as the `X-SMG-Routing-Key` header when the router policy is `consistent_hashing`
+
+The consistent hashing policy of SGLang Model Gateway uses this routing key to deterministically select the same worker for all requests with the same session ID.
+
+### Configuration
+
+To enable session-affinity routing, just configure the router policy argument in slime:
+
+```bash
+--sglang-router-policy consistent_hashing
+```
+
+Slime automatically starts SGLang Model Gateway with the consistent hashing policy.
+
+> **Note**: If you get an error that the `consistent_hashing` policy is unavailable, upgrade sglang-router:
+> ```bash
+> pip install -U sglang-router
+> ```
+
+### Notes
+
+- Each sample has its own independent session ID
+- Different samples in the same group may be routed to different workers
+- Subsequent turns of the same sample keep the same session ID
+- Currently, this feature only applies to SGLang Model Gateway

slime/backends/sglang_utils/arguments.py

Lines changed: 6 additions & 0 deletions
@@ -19,6 +19,12 @@ def add_sglang_router_arguments(parser):
         default=None,
         help="Port of the SGLang router",
     )
+    parser.add_argument(
+        "--sglang-router-policy",
+        type=str,
+        default=None,
+        help="Routing policy for the SGLang router (e.g., 'consistent_hashing', 'round_robin')",
+    )
     parser.add_argument(
         "--sglang-router-request-timeout-secs",
         type=int,
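
As a quick check of how the new flag parses, here is a minimal standalone sketch. It rebuilds only the one argument from the hunk above on a fresh `argparse` parser; the rest of Slime's argument set is omitted, and the variable names are illustrative.

```python
import argparse

# Standalone parser carrying only the argument added in this commit.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--sglang-router-policy",
    type=str,
    default=None,
    help="Routing policy for the SGLang router (e.g., 'consistent_hashing', 'round_robin')",
)

# Without the flag the attribute exists and is None.
args_default = parser.parse_args([])

# With the flag, argparse maps the dashes to underscores in the attribute name.
args_hashing = parser.parse_args(["--sglang-router-policy", "consistent_hashing"])
```

Since the default is `None`, code that reads `args.sglang_router_policy` sees a falsy value unless the user opts in.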

slime/ray/rollout.py

Lines changed: 3 additions & 0 deletions
@@ -658,6 +658,9 @@ def _start_router(args):
     router_args.log_level = "warn"
     router_args.request_timeout_secs = args.sglang_router_request_timeout_secs
 
+    if hasattr(args, "sglang_router_policy") and args.sglang_router_policy:
+        router_args.policy = args.sglang_router_policy
+
     if args.prefill_num_servers is not None:
         router_args.pd_disaggregation = True
 
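
The guard above only overrides the router's policy when the user supplied one. A small sketch of that behavior, using `SimpleNamespace` objects as stand-ins for the real `args` and `router_args`; the helper name `apply_router_policy` and the placeholder value `"library_default"` are hypothetical.

```python
from types import SimpleNamespace


def apply_router_policy(args, router_args):
    """Hypothetical helper mirroring the guard added to _start_router."""
    # getattr with a default covers the same case as the hasattr check in the diff.
    policy = getattr(args, "sglang_router_policy", None)
    if policy:
        router_args.policy = policy
    return router_args


# "library_default" is a placeholder for whatever policy the router starts with.
unset = apply_router_policy(
    SimpleNamespace(sglang_router_policy=None),
    SimpleNamespace(policy="library_default"),
)
chosen = apply_router_policy(
    SimpleNamespace(sglang_router_policy="consistent_hashing"),
    SimpleNamespace(policy="library_default"),
)
```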

slime/rollout/sglang_rollout.py

Lines changed: 12 additions & 1 deletion
@@ -2,6 +2,7 @@
 import copy
 import inspect
 import logging
+import uuid
 from argparse import Namespace
 from collections.abc import Callable
 from contextlib import contextmanager
@@ -157,7 +158,12 @@ async def generate(args: Namespace, sample: Sample, sampling_params: dict[str, A
     if not sample.tokens: # Initialize sample.tokens for the first turn
         sample.tokens = prompt_ids
 
-    output = await post(url, payload)
+    # Use session_id for consistent hashing routing if router uses consistent_hashing policy
+    headers = None
+    if args.sglang_router_policy == "consistent_hashing" and sample.session_id:
+        headers = {"X-SMG-Routing-Key": sample.session_id}
+
+    output = await post(url, payload, headers=headers)
 
     if args.use_slime_router and "RadixTreeMiddleware" in args.slime_router_middleware_paths:
         from slime.router.middleware_hub.radix_tree_middleware import postprocess_sample_with_radix_tree
@@ -272,6 +278,11 @@ async def generate_and_rm_group(
     if state.aborted:
         return group
 
+    # Generate a unique session_id for each sample in the group
+    for sample in group:
+        if sample.session_id is None:
+            sample.session_id = str(uuid.uuid4())
+
     tasks = []
     for idx, sample in enumerate(group):
         current_sampling_params = sampling_params.copy()
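
The two hunks above work together: IDs are assigned once per sample, and the header is attached only under the `consistent_hashing` policy. The sketch below isolates that logic so it can be run on its own. The `Sample` dataclass here is a one-field stand-in for `slime.utils.types.Sample`, and the helper names `assign_session_ids` and `routing_headers` are hypothetical.

```python
import uuid
from dataclasses import dataclass
from types import SimpleNamespace
from typing import Optional


@dataclass
class Sample:
    """Stand-in for the real Sample type; only the field this commit adds."""
    session_id: Optional[str] = None


def assign_session_ids(group):
    # Same loop as in generate_and_rm_group: existing IDs are left untouched.
    for sample in group:
        if sample.session_id is None:
            sample.session_id = str(uuid.uuid4())


def routing_headers(args, sample):
    # Same condition as in generate: a header is sent only for consistent_hashing.
    if args.sglang_router_policy == "consistent_hashing" and sample.session_id:
        return {"X-SMG-Routing-Key": sample.session_id}
    return None


group = [Sample(), Sample()]
assign_session_ids(group)
ids_after_first_call = [s.session_id for s in group]
assign_session_ids(group)  # a second call must not change the IDs

hashing_args = SimpleNamespace(sglang_router_policy="consistent_hashing")
default_args = SimpleNamespace(sglang_router_policy=None)
```

Keeping the `is None` check means a later turn that reuses the same `Sample` object sends the same routing key.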

slime/utils/http_utils.py

Lines changed: 4 additions & 4 deletions
@@ -162,12 +162,12 @@ def _next_actor():
     return actor
 
 
-async def _post(client, url, payload, max_retries=60):
+async def _post(client, url, payload, max_retries=60, headers=None):
     retry_count = 0
     while retry_count < max_retries:
         response = None
         try:
-            response = await client.post(url, json=payload or {})
+            response = await client.post(url, json=payload or {}, headers=headers)
             response.raise_for_status()
             content = await response.aread()
             try:
@@ -270,7 +270,7 @@ async def do_post(self, url, payload, max_retries=60):
     _post_actors = created
 
 
-async def post(url, payload, max_retries=60):
+async def post(url, payload, max_retries=60, headers=None):
     # If distributed mode is enabled and actors exist, dispatch via Ray.
     if _distributed_post_enabled and _post_actors:
         try:
@@ -285,7 +285,7 @@ async def post(url, payload, max_retries=60):
             logger.info(f"[http_utils] Distributed POST failed, falling back to local: {e} (url={url})")
             # fall through to local
 
-    return await _post(_http_client, url, payload, max_retries, headers=headers)
+    return await _post(_http_client, url, payload, max_retries, headers=headers)
 
 
 async def get(url):
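
The signature change keeps `headers` optional, so existing callers are unaffected. The sketch below illustrates that with a recording stub in place of the real HTTP client. `RecordingClient`, `post_once`, and the URL are hypothetical, and the retry loop from `_post` is deliberately left out.

```python
import asyncio


class RecordingClient:
    """Stub client that records each call instead of sending a request."""

    def __init__(self):
        self.calls = []

    async def post(self, url, json=None, headers=None):
        self.calls.append((url, json, headers))
        return {"ok": True}


async def post_once(client, url, payload, headers=None):
    # Mirrors the single client.post call inside _post, without retries.
    return await client.post(url, json=payload or {}, headers=headers)


client = RecordingClient()
url = "http://router.invalid/generate"  # placeholder URL, never contacted

# Old-style call: no headers argument, so None is forwarded.
asyncio.run(post_once(client, url, {"text": "hi"}))

# New-style call: the routing key travels as a request header.
asyncio.run(post_once(client, url, {"text": "hi"}, headers={"X-SMG-Routing-Key": "abc"}))
```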

slime/utils/types.py

Lines changed: 3 additions & 0 deletions
@@ -45,6 +45,9 @@ class Status(Enum):
     # metadata used during training, e.g., what loss to use for this sample.
     train_metadata: dict | None = None
 
+    # Session ID for consistent hashing routing (used when router policy is consistent_hashing)
+    session_id: str | None = None
+
     non_generation_time: float = 0.0 # time spent in non-generation steps
 
 @dataclass
