Add SGLang-Omni Router for V1 by Ratish1 · Pull Request #401 · sgl-project/sglang-omni

Ratish1 · 2026-05-06T18:12:53Z

Motivation

This PR adds the SGLang-Omni Router for Omni V1. The router is an external HTTP process that sits in front of complete Omni V1 server replicas and routes OpenAI-compatible traffic across worker URLs.

The immediate use case is the router side of the colocation plan in #376: one full Omni V1 replica runs per server/GPU, and all client traffic enters through a single router endpoint. This PR intentionally implements only the router. Colocated Qwen3-Omni server placement and the H20 end-to-end CI lane remain separate integration work.

RFC: Router Contract

Scope

The router owns replica-level HTTP routing:

client
  -> sgl-omni-router
    -> complete Omni V1 server replica A
    -> complete Omni V1 server replica B
    -> complete Omni V1 server replica N

Each worker is an opaque base URL. The router does not inspect the Omni V1 pipeline graph, does not route individual stages, and does not mutate request JSON for DP rank or topology-aware behavior.

Public Interface

Adds the dedicated console command sgl-omni-router.
Keeps python -m sglang_omni_router.serve usable for direct module execution.
Uses canonical router arguments such as --worker-urls, --policy, --request-timeout-secs, --max-payload-size, health thresholds, and health probe timing.
Supports underscore-only policy names: round_robin, least_request, and random.
Does not add a secondary sgl-omni router command.
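
As a rough illustration of the argument surface listed above, the following sketch builds an argparse parser for the four flags this description names. The argument shapes (space-separated worker URLs, float seconds, integer bytes) and the default timeout and payload values are assumptions made for the example, not values taken from this PR; only the flag names, the policy names, and round_robin as the default policy come from the description.

```python
import argparse

# Policy names from the PR description (underscore-only).
POLICIES = ("round_robin", "least_request", "random")


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in the description."""
    parser = argparse.ArgumentParser(prog="sgl-omni-router")
    # Assumption: worker URLs are passed as space-separated values.
    parser.add_argument("--worker-urls", nargs="+", required=True)
    # argparse rejects anything outside `choices`, so "round-robin" fails.
    parser.add_argument("--policy", choices=POLICIES, default="round_robin")
    # Assumption: placeholder defaults, not the router's real defaults.
    parser.add_argument("--request-timeout-secs", type=float, default=60.0)
    parser.add_argument("--max-payload-size", type=int, default=16 * 1024 * 1024)
    return parser
```

With this shape, a hyphenated policy name such as round-robin is rejected at parse time rather than silently accepted.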

Worker Model

Workers are complete Omni V1 HTTP server replicas. Each worker tracks:

normalized URL and stable URL-encoded worker id
optional model name
declared capability set
active request count
health state: unknown, healthy, unhealthy, or dead
manual disabled state
consecutive health success and failure counters
last health status, error, and check timestamp

Routing eligibility is:

worker.health_state == healthy and worker.disabled == false

Dead workers are quarantined from routing. Recovery is explicit through the worker update API, followed by a health probe before the worker becomes routable again.
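
A minimal sketch of the worker record and the eligibility rule above. The class and field names are hypothetical, and the normalization shown (trimming whitespace and a trailing slash, requiring an http or https scheme) is an assumption about what "normalized URL" means; the PR's actual rules may be stricter.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional
from urllib.parse import quote, urlsplit


class HealthState(str, Enum):
    UNKNOWN = "unknown"
    HEALTHY = "healthy"
    UNHEALTHY = "unhealthy"
    DEAD = "dead"


def normalize_url(url: str) -> str:
    """Assumed normalization: HTTP(S) only, no trailing slash."""
    parts = urlsplit(url.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an HTTP(S) base URL: {url!r}")
    return f"{parts.scheme}://{parts.netloc}{parts.path.rstrip('/')}"


@dataclass
class Worker:
    url: str
    model: Optional[str] = None
    active_requests: int = 0
    health_state: HealthState = HealthState.UNKNOWN
    disabled: bool = False

    def __post_init__(self) -> None:
        self.url = normalize_url(self.url)

    @property
    def worker_id(self) -> str:
        # URL-encode everything so the id is safe as a single path segment.
        return quote(self.url, safe="")

    @property
    def routable(self) -> bool:
        # Eligibility rule from the description: healthy and not disabled.
        return self.health_state is HealthState.HEALTHY and not self.disabled
```

A freshly added worker starts in the unknown state here, so it is not routable until a probe marks it healthy.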

Request Lifecycle

For model requests, the router:

receives the FastAPI request
checks Content-Length against max_payload_size
reads the body once as bytes
rejects oversized bodies
parses small JSON bodies only for route metadata
preserves the original request body bytes for upstream forwarding
infers required capabilities from endpoint and metadata
filters routable workers by required capabilities
applies the selected policy
increments the selected worker active request count
forwards the request with hop-by-hop headers stripped
relays the upstream status, headers, and body
adds router diagnostic response headers
decrements active request count during cleanup
emits route diagnostics when route logging is configured
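
The payload check, metadata parse, and header stripping steps above can be sketched with the standard library alone. All function names here are hypothetical. The hop-by-hop set is the conventional HTTP list plus any header named in Connection; the exact set the router strips is not stated in this description, so treat it as an assumption.

```python
import json
from typing import Any, Dict, Mapping, Optional

# Conventional hop-by-hop headers; the router's exact set is an assumption.
HOP_BY_HOP = frozenset({
    "connection", "keep-alive", "proxy-authenticate", "proxy-authorization",
    "te", "trailer", "transfer-encoding", "upgrade",
})


class PayloadTooLarge(Exception):
    """Raised when a request exceeds max_payload_size."""


def check_declared_length(content_length: Optional[str], max_payload_size: int) -> None:
    # Cheap early rejection based on the Content-Length header alone.
    if content_length and content_length.isdigit() and int(content_length) > max_payload_size:
        raise PayloadTooLarge(content_length)


def check_body(body: bytes, max_payload_size: int) -> None:
    # The header can be absent or wrong, so the bytes read are checked too.
    if len(body) > max_payload_size:
        raise PayloadTooLarge(str(len(body)))


def route_metadata(body: bytes) -> Dict[str, Any]:
    """Parse JSON for routing hints only; the raw bytes are what gets forwarded."""
    try:
        parsed = json.loads(body)
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return {}
    return parsed if isinstance(parsed, dict) else {}


def strip_hop_by_hop(headers: Mapping[str, str]) -> Dict[str, str]:
    connection = next((v for k, v in headers.items() if k.lower() == "connection"), "")
    listed = {tok.strip().lower() for tok in connection.split(",") if tok.strip()}
    drop = HOP_BY_HOP | listed
    return {k: v for k, v in headers.items() if k.lower() not in drop}
```

Keeping the parsed metadata separate from the forwarded bytes means a parse failure only loses routing hints and never alters what the upstream worker receives.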

Streaming responses use httpx.AsyncClient.send(..., stream=True) and relay upstream.aiter_bytes() without parsing, buffering, or synthesizing SSE frames.
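
To illustrate the relay-and-cleanup shape without depending on httpx, the sketch below uses a plain async iterator as a stand-in for upstream.aiter_bytes() and a callback as a stand-in for closing the upstream response. WorkerLoad, relay_stream, and demo are hypothetical names used only for this example.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable, List, Tuple


class WorkerLoad:
    """Hypothetical stand-in for the router's per-worker request counter."""

    def __init__(self) -> None:
        self.active_requests = 0


async def relay_stream(
    upstream: AsyncIterator[bytes],
    worker: WorkerLoad,
    close_upstream: Callable[[], Awaitable[None]],
) -> AsyncIterator[bytes]:
    """Yield upstream chunks unchanged and release the worker afterwards."""
    try:
        async for chunk in upstream:  # stand-in for upstream.aiter_bytes()
            yield chunk  # no parsing, buffering, or SSE synthesis
    finally:
        # Runs when the generator finishes or is closed.
        worker.active_requests -= 1
        await close_upstream()


async def demo() -> Tuple[List[bytes], List[int], int]:
    async def fake_upstream() -> AsyncIterator[bytes]:
        for chunk in (b"data: a\n\n", b"data: b\n\n"):
            yield chunk

    async def close_upstream() -> None:
        return None

    worker = WorkerLoad()
    worker.active_requests += 1  # incremented when the worker is selected
    chunks: List[bytes] = []
    counts: List[int] = []
    async for chunk in relay_stream(fake_upstream(), worker, close_upstream):
        chunks.append(chunk)
        counts.append(worker.active_requests)
    return chunks, counts, worker.active_requests
```

Running asyncio.run(demo()) shows the counter staying at 1 while chunks flow and dropping to 0 only after the generator exits.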

Supported Routes

The first router surface is intentionally explicit:

GET /v1/models
POST /v1/chat/completions
POST /v1/audio/speech
GET /live
GET /ready
GET /health
GET /workers
GET /workers/{worker_id}
POST /workers
PUT /workers/{worker_id}
DELETE /workers/{worker_id}

There is no catch-all proxy in this PR. New proxied routes should be added only when they correspond to validated Omni V1 backend endpoints.

Capability Routing

Default workers advertise the complete Omni V1 replica capability set:

chat
speech
streaming
image_input
audio_input
video_input
audio_output

The router infers required capabilities from the current V1 request shape. Chat requests always require chat, add streaming when stream=true, and add modality capabilities for image, audio, video, and chat audio-output fields. Speech requests require speech and add streaming for streaming speech.
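
A hedged sketch of how such inference might look. The capability names come from the list above, but the request fields inspected here are assumptions: OpenAI-style content parts (image_url, input_audio, and the non-standard audio_url and video_url), a modalities list or audio object for chat audio output, and a stream flag on speech requests. The fields the router actually reads may differ.

```python
from typing import Any, Mapping, Set

# Assumed mapping from content-part type to required capability.
PART_CAPABILITY = {
    "image_url": "image_input",
    "input_audio": "audio_input",
    "audio_url": "audio_input",
    "video_url": "video_input",
}


def chat_capabilities(body: Mapping[str, Any]) -> Set[str]:
    required = {"chat"}
    if body.get("stream") is True:
        required.add("streaming")
    for message in body.get("messages") or []:
        content = message.get("content") if isinstance(message, dict) else None
        if not isinstance(content, list):
            continue  # plain-string content carries no extra modality
        for part in content:
            kind = part.get("type") if isinstance(part, dict) else None
            capability = PART_CAPABILITY.get(kind)
            if capability:
                required.add(capability)
    # Assumption: audio output is requested via `modalities` or `audio`.
    modalities = body.get("modalities")
    wants_audio = isinstance(modalities, list) and "audio" in modalities
    if wants_audio or body.get("audio") is not None:
        required.add("audio_output")
    return required


def speech_capabilities(body: Mapping[str, Any]) -> Set[str]:
    required = {"speech"}
    if body.get("stream") is True:  # assumption: same flag name as chat
        required.add("streaming")
    return required


def supports(worker_capabilities: Set[str], required: Set[str]) -> bool:
    # A worker is a candidate only if it advertises every required capability.
    return required <= worker_capabilities
```

Since default workers advertise the full capability set, filtering only changes the candidate list when a worker has been registered with a narrower set.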

Routing Policies

round_robin: the default policy, and the best choice for the first CI lane because it deterministically exercises every eligible worker.
least_request: selects the minimum active-request group and round-robins among tied workers.
random: diagnostic policy.

Active request counts include streaming requests until the stream generator exits.
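
The three policies can be sketched as small selector classes. Class names and the select method are hypothetical; the least_request tie-breaking follows the description (round-robin among workers sharing the minimum count), while sharing one rotating counter across differently sized tie groups is a simplification made for this example.

```python
import random
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Candidate:
    """Hypothetical minimal view of a routable worker."""
    url: str
    active_requests: int = 0


class RoundRobin:
    def __init__(self) -> None:
        self._next = 0

    def select(self, workers: Sequence[Candidate]) -> Optional[Candidate]:
        if not workers:
            return None
        chosen = workers[self._next % len(workers)]
        self._next += 1
        return chosen


class LeastRequest:
    def __init__(self) -> None:
        self._tie_breaker = RoundRobin()

    def select(self, workers: Sequence[Candidate]) -> Optional[Candidate]:
        if not workers:
            return None
        lowest = min(w.active_requests for w in workers)
        tied = [w for w in workers if w.active_requests == lowest]
        # Rotate among the tied group instead of always picking the first.
        return self._tie_breaker.select(tied)


class Random:
    def select(self, workers: Sequence[Candidate]) -> Optional[Candidate]:
        return random.choice(workers) if workers else None
```

Because streaming requests stay counted until their generator exits, least_request under this model naturally steers new traffic away from workers holding long-lived streams.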

Health And Worker Management

Health uses active probes against the configured health endpoint. Consecutive failures mark a worker dead after the configured threshold. Dead workers are skipped by later health probes until explicitly recovered.
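
One possible reading of these transitions, as a small state machine. The function names and threshold defaults are hypothetical, and two details are assumptions not stated in the description: a failure below the threshold moves the worker to unhealthy, and a success threshold gates the move to healthy.

```python
from dataclasses import dataclass


@dataclass
class Health:
    state: str = "unknown"  # unknown, healthy, unhealthy, or dead
    successes: int = 0      # consecutive successful probes
    failures: int = 0       # consecutive failed probes


def record_probe(
    health: Health,
    ok: bool,
    success_threshold: int = 1,
    failure_threshold: int = 3,
) -> str:
    """Apply one probe result and return the resulting state."""
    if health.state == "dead":
        return health.state  # quarantined: probes are skipped until recovery
    if ok:
        health.successes += 1
        health.failures = 0
        if health.successes >= success_threshold:
            health.state = "healthy"
    else:
        health.failures += 1
        health.successes = 0
        # Assumption: failures below the threshold mean "unhealthy".
        health.state = "dead" if health.failures >= failure_threshold else "unhealthy"
    return health.state


def recover(health: Health) -> None:
    """Explicit recovery: clear the dead state so the next probe decides."""
    health.state = "unknown"
    health.successes = 0
    health.failures = 0
```

In this sketch a recovered worker returns to unknown rather than healthy, which matches the requirement that a health probe must pass before the worker becomes routable again.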

Worker CRUD is available for trusted internal deployments:

add new workers at runtime
inspect the worker pool
disable workers without losing health state
mark workers dead
clear dead state and immediately reprobe
delete workers from the routing pool

Observability

The router exposes pool health through /health and detailed worker state through /workers. Route diagnostics are optional and include selected worker URL/id, policy, required capabilities, worker health state, disabled state, routability, request id, status code, byte counts, duration, and streaming completion state.

Route logging is best-effort. A route-log write failure is logged and does not fail the proxied request.
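
A minimal sketch of the best-effort behavior, assuming route diagnostics are appended as JSON lines to a configured file path. The file format, the function name, and the logger name are all assumptions; the point is only that a write failure is logged and swallowed.

```python
import json
import logging
from typing import Any, Mapping, Optional

logger = logging.getLogger("router.route_log")  # hypothetical logger name


def write_route_log(path: Optional[str], record: Mapping[str, Any]) -> bool:
    """Append one diagnostic record; never raise into the request path."""
    if path is None:
        return False  # route logging not configured
    try:
        line = json.dumps(record, sort_keys=True)
        with open(path, "a", encoding="utf-8") as handle:
            handle.write(line + "\n")
        return True
    except (OSError, TypeError, ValueError) as exc:
        # Log the failure and carry on; the proxied response is unaffected.
        logger.warning("route log write failed: %s", exc)
        return False
```

The boolean return value exists only so callers and tests can observe the outcome; the proxy path would ignore it.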

Modifications

Adds the top-level sglang_omni_router package with focused modules for config validation, worker state, active health probing, worker selection, proxying, FastAPI app wiring, and serve entrypoint.
Adds sgl-omni-router as the dedicated public console command.
Removes the router path from the existing sgl-omni CLI surface so router help does not depend on unrelated Omni client imports.
Adds strict worker URL normalization and validation for HTTP(S) base URLs.
Adds worker health state, dead-worker quarantine, manual disable, active request accounting, and runtime worker CRUD.
Adds modality-aware candidate filtering while forwarding original request bytes unchanged.
Adds exact streaming byte relay for chat and speech streaming responses.
Adds /v1/models aggregation across routable workers with query/header preservation and per-worker failure details when all eligible reads fail.
Adds route diagnostics and non-fatal route-log writing for operator and CI visibility.
Adds router unit and app tests covering config validation, health lifecycle, worker CRUD, policies, modality routing, raw body forwarding, streaming relay, model aggregation, route diagnostics, and cleanup on upstream failures.

Related Issues

Related to #376.

Accuracy Test

In progress.

Benchmark & Profiling

In progress.

CI

In progress.

Ratish1 added 13 commits May 6, 2026 19:24

Add Omni router core state and selection

7168524

Add Omni router proxy app and CLI

6c46cd7

Document Omni router usage and CLI coverage

b93f7c1

Tighten Omni router test coverage

b4cffbc

Refactor Omni router core structure

04776cb

Use serve router entrypoint

1a8d946

Move Omni router server entrypoint

b217130

Rename Omni router entrypoint to serve

2e84c8e

Add dedicated Omni router command

46753cc

Add Omni router worker management

f0f6ef5

Add modality-aware Omni routing

8662fbc

Improve Omni router model diagnostics

d96895e

Harden Omni router request lifecycle

c430eec

Ratish1 changed the title from "Add SGLang-Omni Router for V1 replicas" to "Add SGLang-Omni Router for V1" on May 6, 2026

Ratish1 force-pushed the feat/omni-router-v1 branch from 8f063d6 to c430eec on May 6, 2026 18:57

Ratish1 wants to merge 13 commits into sgl-project:main from Ratish1:feat/omni-router-v1
