Release Gateway-v0.2.3
π SGLang Model Gateway - New Release!
We're excited to announce another powerful update to SGLang Model Gateway with performance improvements and expanded database support!
β¨ Headline Features
β‘ Bucket Mode Routing - 20-30% Performance Boost
Introducing our new bucket-based routing algorithm that dramatically improves performance in PD mode. See up to 20-30% improvements in TTFT (Time To First Token) and overall throughput
πΎ PostgreSQL Support for Chat History Management
Flexibility in data storage! We now support PostgreSQL alongside OracleDB and in-memory storage for chat history management.
π οΈ Enhanced Model Tool & Structured Output Support
- MinMax M2 model support!
- Structured model output for OpenAI and gRPC router
- Streaming parsing with Tool Choice in chat completions API
- Tool_choice support for Responses API
- OutputItemDone events with output item array storage for better observability
π Stability & Quality Improvements
Multiple bug fixes for model validation, streaming logic, reasoning content indexing, and CI stability enhancements.
π§ Code Quality Enhancements
Refactored builders for chat and responses, restructured modules for better maintainability, and consolidated error handling.
Try the latest version: pip install sglang-router --upgrade
What's Changed in Gateway
Gateway Changes (45 commits)
- [model-gateway] smg release 0.2.3 (#13312) by @slin1237 in #13312
- [router]Replace requests lib with openai in e2e_response_api (#13293) by @XinyueZhang369 in #13293
- fix outdated router doc (#13255) by @fzyzcjy in #13255
- [router][grpc] Refine docs in minimax_m2 to match other parsers (#13218) by @CatherineSue in #13218
- fix: display served_model_name in /v1/models (#13155) by @Sunhaihua1 in #13155
- [router] minmax-m2 xml tool parser (#13148) by @slin1237 in #13148
- [router] remove worker url requirement (#13172) by @slin1237 in #13172
- [router] Fix Flaky test_circuit_breaker_opens_and_recovers (#13164) by @XinyueZhang369 in #13164
- [router] Add comprehensive validation to Responses API (#13127) by @key4ng in #13127
- bugfix: multi-model routing for /generate api (#12979) by @SYChen123 in #12979
- [router][grpc] Support vllm backend for grpc router (#13120) by @CatherineSue in #13120
- [router] add minmax m2 reasoning parser (#13137) by @slin1237 in #13137
- [router] Support complex assistant and tool messages in /chat/completions (#12860) by @hellodanylo in #12860
- [router] move radix tree to policy crate and addreses some code styles (#13131) by @slin1237 in #13131
- [Router] use call_id instead of id for matching function calls in Responses API for Harmony (#13056) by @zhaowenzi in #13056
- Revert "fix: display served_model_name in /v1/models" (#13093) by @CatherineSue in #13093
- fix: display served_model_name in /v1/models (#13063) by @Sunhaihua1 in #13063
- [router] add postgres databases data connector (#12218) by @lengrongfu in #12218
- [router][ci] Quick Improvement to make CI more stable (#12869) by @key4ng in #12869
- [router][ci] Fix maturin build (#13012) by @key4ng in #13012
- [router] bucket policy (#11719) by @syy-hw in #11719
- [router] Switch MCP tests from DeepWiki to self-hosted Brave search server (#12849) by @key4ng in #12849
- [router][grpc] Move all error logs to their call sites (#12859) by @CatherineSue in #12859
- [router][grpc] Refactor: Add builders for chat and responses (#12852) by @CatherineSue in #12852
- [router] Support structured model output for openai and grpc router (#12431) by @key4ng in #12431
- [router][grpc] Add more mcp test cases to responses api (#12749) by @CatherineSue in #12749
- fix ci (#12760) by @key4ng in #12760
- Add timing metrics for requests (#12646) by @cicirori in #12646
- [router][ci] Disable cache (#12752) by @key4ng in #12752
- [router][grpc] Support mixin tool calls in Responses API (#12736) by @CatherineSue in #12736
- Revert "[router] web_search_preview tool basic implementation" (#12716) by @key4ng in #12716
- [router] add basic ci tests for gpt-oss model support (#12651) by @key4ng in #12651
- [router][quick fix] Add minimal option for reasoning effort in spec (#12711) by @key4ng in #12711
- [router][grpc] Make harmony parser checks recipient first before channel (#12713) by @CatherineSue in #12713
- [router][ci] speed up python binding to 1.5 min (#12673) by @key4ng in #12673
- [router] fix: validate HTTP status codes in health check (#12631) by @wyx-0203 in #12631
- [router][grpc] Support streaming parsing with Tool Choice in chat completions API (#12677) by @CatherineSue in #12677
- [router][grpc] Implement tool_choice support for Responses API (#12668) by @CatherineSue in #12668
- [router][grpc] Emit OutputItemDone event and store output item array (#12656) by @CatherineSue in #12656
- [router][grpc] Fix index issues in reasoning content and missing streaming events (#12650) by @CatherineSue in #12650
- [router][grpc] Fix model validation, tool call check, streaming logic and misc in responses (#12616) by @CatherineSue in #12616
- Support aggregating engine metrics in sgl-router (#11456) by @fzyzcjy in #11456
- [router][grpc] Restructure modules and code clean up (#12598) by @CatherineSue in #12598
- [router][grpc] Consolidate error messages build in error.rs (#12301) by @CatherineSue in #12301
- [ci] install released version router (#12410) by @key4ng in #12410
New Contributors
- @XinyueZhang369 made their first contribution in 2cdde3d46
- @Sunhaihua1 made their first contribution in a06c44f90
- @zhaowenzi made their first contribution in 7b877ab83
- @cicirori made their first contribution in 58095cb00
- @wyx-0203 made their first contribution in 3651cfbf6
- @syy-hw made their first contribution in 611a4fd08
- @SYChen123 made their first contribution in 4ef439054
- @hellodanylo made their first contribution in d28caaf60
Paths Included
sgl-routerpython/sglang/srt/grpcpython/sglang/srt/entrypoints/grpc_server.py
Full Changelog: gateway-v0.2.2...gateway-v0.2.3