Skip to content

Release Gateway-v0.2.1

Choose a tag to compare

@slin1237 slin1237 released this 17 Nov 11:13
· 2599 commits to main since this release
8a801ee

🚀 SGLang Model Gateway v0.2.1 Released!

This release focuses on stability, cleanup, and two big new performance features.

🧾 Docs & CI

  • Updated router documentation to reflect recent feature additions

🧹 Code Cleanup

  • Refactored StopSequenceDecoder for cleaner incremental decoding
  • Added spec.rs test harness under spec/ for structured unit tests

🐞 Bug Fixes

  • Fixed UTF-8 boundary in stop-sequence decoding
  • Fixed gRPC timeout configuration
  • Fixed worker filtering, tool-choice normalization, and bootstrap-port handling
  • Additional gRPC server warm-up and concurrency fixes

🌟 New Features

  • Two-Level Tokenizer Caching (L0 + L1)
  • L0: exact-match cache for repeated prompts
  • L1: prefix-aware cache at special-token boundaries
  • OpenAI-Style Classification API → new /v1/classifications endpoint, shout out to yanbo for the contribution
  • Worker Management Workflow Engine → improved async registration, worker self discovery, and health orchestration

What's Changed in Gateway

Gateway Changes (26 commits)

Paths Included

  • sgl-router
  • python/sglang/srt/grpc
  • python/sglang/srt/entrypoints/grpc_server.py

Full Changelog: gateway-v0.2.0...gateway-v0.2.1