Release Gateway-v0.3.0

@slin1237 released this 24 Dec 22:00
5454d2a

🚀 SGLang Model Gateway v0.3.0 Released!

We're thrilled to announce SGLang Model Gateway v0.3.0 – a major release with powerful new features, architectural improvements, and important breaking changes!

⚠️ Breaking Changes

📊 Metrics Architecture Redesigned

Complete overhaul with a new 6-layer metrics architecture covering protocol (HTTP/gRPC), router, worker, streaming (TTFT/TPOT), circuit breaker, and policy metrics, with unified error codes.
Action Required: Update your Prometheus dashboards and alerting rules. Metric names and structure have changed.
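
For anyone rebuilding dashboards around the streaming layer, the sketch below shows the conventional definitions of TTFT (time to first token) and TPOT (time per output token). It illustrates the usual formulas only and is not taken from the gateway; the function names and the timestamp list are assumptions made for this example.

```python
from typing import Sequence


def ttft(request_start: float, token_times: Sequence[float]) -> float:
    """Time to first token: delay between request start and the first token."""
    if not token_times:
        raise ValueError("no tokens were produced")
    return token_times[0] - request_start


def tpot(token_times: Sequence[float]) -> float:
    """Time per output token: mean gap between consecutive tokens after the first."""
    if len(token_times) < 2:
        raise ValueError("need at least two tokens to measure TPOT")
    return (token_times[-1] - token_times[0]) / (len(token_times) - 1)
```

Both functions take timestamps in seconds from the same clock. How the gateway names and buckets these metrics is not covered in these notes, so check the exported metric names before updating queries.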

🔧 UUID-Based Worker Resource Management

Workers are now identified by UUIDs instead of endpoints for cleaner resource management.
Action Required: Update any tooling or scripts that interact with the worker API.
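
Scripts that used to address workers by endpoint will likely need a lookup step that resolves an endpoint to its UUID first. The sketch below shows one way to build that lookup from a list of worker records. The `"id"` and `"url"` field names are hypothetical and are not the documented response schema; adjust them to whatever the worker API actually returns.

```python
import uuid
from typing import Dict, List


def index_workers_by_url(workers: List[dict]) -> Dict[str, str]:
    """Map each worker's endpoint URL to its UUID string.

    The "id" and "url" field names are hypothetical; check the actual
    worker API response before relying on them.
    """
    index = {}
    for worker in workers:
        worker_id = str(uuid.UUID(worker["id"]))  # raises ValueError if not a UUID
        index[worker["url"]] = worker_id
    return index
```

Validating the identifier with `uuid.UUID` catches the common migration mistake of passing an endpoint string where a UUID is now expected.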

✨ New Features

🌐 Unified Inference Gateway Mode (IGW)

Single gateway, entire fleet. IGW now supports ALL router types in a single deployment with Kubernetes service discovery:

  • gRPC router (PD and regular mode)
  • HTTP router (PD and regular mode)
  • OpenAI router

Auto-enabled with service discovery. Deploy once, route everything: handle all traffic patterns across your entire inference fleet from a single gateway instance.

🔀 Tokenize/Detokenize HTTP Endpoints

  • Direct HTTP endpoints for tokenization operations
  • Dynamic tokenizer control plane: add, list, get, and remove tokenizers on-the-fly
  • TokenizerRegistry for efficient dynamic loading
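
To make the add/list/get/remove control-plane operations concrete, here is a minimal sketch of a registry that loads tokenizers lazily on first use. It is an illustration of the general pattern, not the gateway's TokenizerRegistry; the class name, method names, and loader callback are all assumptions.

```python
import threading
from typing import Callable, Dict, List


class RegistrySketch:
    """Illustrative name-to-tokenizer registry with add/list/get/remove."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._loaders: Dict[str, Callable[[], object]] = {}
        self._loaded: Dict[str, object] = {}

    def add(self, name: str, loader: Callable[[], object]) -> None:
        with self._lock:
            if name in self._loaders:
                raise KeyError(f"tokenizer {name!r} already registered")
            self._loaders[name] = loader

    def list_names(self) -> List[str]:
        with self._lock:
            return sorted(self._loaders)

    def get(self, name: str) -> object:
        with self._lock:
            if name not in self._loaded:
                # Load lazily on first use; KeyError if the name is unknown.
                self._loaded[name] = self._loaders[name]()
            return self._loaded[name]

    def remove(self, name: str) -> None:
        with self._lock:
            del self._loaders[name]
            self._loaded.pop(name, None)
```

Deferring the load to the first `get` keeps registration cheap, at the cost of holding the lock while a tokenizer loads. A production registry would likely load outside the lock.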

🧠 Parser Endpoints

  • /parse/reasoning - Parse reasoning outputs
  • /parse/function_call - Parse function call responses
  • GLM-4 function call parser - Contributed directly by the GLM team for latest GLM models

📊 Embeddings Support

Native embeddings endpoint for gRPC router - expand beyond text generation to embedding workloads.

🔐 Server-Side TLS Support

Secure your gateway deployments with native TLS support.

🌐 Go Implementation (contributed by the iFlytek MaaS team)

Complete Go SGLang Model Gateway with OpenAI-compatible API server - bringing SGLang to the Go ecosystem!

⚡ Major Enhancements

Control Plane - Workflow Engine

Intelligent lifecycle orchestration with:

  • DAG-based parallel execution with pre-computed dependency graphs
  • Concurrent event processing for maximum throughput
  • Modular add/remove/update workflows
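
The idea behind DAG-based parallel execution can be sketched in a few lines: build the dependency graph once, then run every step whose dependencies have finished, in parallel. The example below uses Python's standard `graphlib` (3.9+) and a thread pool. It illustrates the technique only and is not the gateway's workflow engine; `run_dag` and the step names in the usage note are hypothetical.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


def run_dag(steps: Dict[str, Callable[[], None]],
            deps: Dict[str, Set[str]]) -> List[str]:
    """Run steps in dependency order, executing independent steps in parallel.

    deps maps a step name to the set of steps that must finish first.
    Returns the order in which steps completed.
    """
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in steps})
    sorter.prepare()  # builds the graph once; raises CycleError on cycles
    finished: List[str] = []
    with ThreadPoolExecutor() as pool:
        running = {}
        while sorter.is_active():
            # Submit every step whose dependencies are all done.
            for name in sorter.get_ready():
                running[pool.submit(steps[name])] = name
            done, _ = wait(list(running), return_when=FIRST_COMPLETED)
            for future in done:
                future.result()  # re-raise any step failure
                name = running.pop(future)
                finished.append(name)
                sorter.done(name)  # may unlock dependent steps
    return finished
```

For example, with steps `fetch`, `validate`, and `register`, and `deps = {"register": {"fetch", "validate"}}`, the first two run concurrently and `register` always completes last.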

Performance Optimization

  • Lock-free data structures: DashMap for policy lookups, lock-free router snapshots
  • Reduced CPU overhead: Optimized worker registry, gRPC client fetch, and worker selection
  • Optimized router management: Improved selection algorithms and state management

Resilience & Reliability

  • Retry and circuit breaker support for OpenAI and gRPC routers
  • Enhanced circuit breaker with better state management
  • Graceful shutdown for TLS and non-TLS servers
  • Unified error responses with error codes and X-SMG-Error-Code headers
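
For readers new to the pattern, a circuit breaker is a small state machine: closed while requests succeed, open after repeated failures, and half-open once a cooldown has elapsed so a trial request can probe the backend. The sketch below shows that generic pattern under assumed defaults (threshold, cooldown, class and method names); it does not describe the gateway's actual implementation or configuration.

```python
import time
from typing import Callable


class CircuitBreakerSketch:
    """Closed -> open after N consecutive failures; half-open after a cooldown."""

    def __init__(self, failure_threshold: int = 3, cooldown_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self._clock = clock
        self._failures = 0
        self._opened_at = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.cooldown_s:
            return "half_open"  # allow a trial request
        return "open"

    def allow_request(self) -> bool:
        return self.state != "open"

    def record_success(self) -> None:
        self._failures = 0
        self._opened_at = None

    def record_failure(self) -> None:
        if self.state == "half_open":
            self._opened_at = self._clock()  # trial failed: reopen, restart cooldown
            return
        self._failures += 1
        if self._failures >= self.failure_threshold:
            self._opened_at = self._clock()
```

Callers check `allow_request()` before sending, then report the outcome with `record_success()` or `record_failure()`. The injectable `clock` makes the state transitions easy to test.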

Infrastructure

  • Multi-architecture Docker builds (Linux, macOS, Windows, ARM)
  • Custom Prometheus duration buckets
  • Improved logging across all modules

🐛 Bug Fixes & Stability

  • Fixed cache-aware routing in gRPC mode
  • Resolved load metric tracking and double-decrease issues for cache-aware load balancing
  • Improved backward compatibility for GET endpoints
  • Fixed gRPC scheduler launcher issues
  • Fixed token bucket negative duration panics
  • Resolved MCP server initialization issues
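
As background on the token bucket fix, negative durations typically appear when a wait time is computed from a token surplus or from a clock reading that is earlier than the last refill. The sketch below shows the general clamping approach in Python. It illustrates the class of bug only and is not the gateway's code; all names are assumptions.

```python
class TokenBucketSketch:
    """Token bucket whose computed wait time is never negative."""

    def __init__(self, capacity: float, refill_per_s: float, now: float = 0.0) -> None:
        self.capacity = capacity
        self.refill_per_s = refill_per_s
        self.tokens = capacity
        self.last_refill = now

    def _refill(self, now: float) -> None:
        # Clamp elapsed time so a clock reading earlier than last_refill adds nothing.
        elapsed = max(0.0, now - self.last_refill)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_s)
        self.last_refill = max(self.last_refill, now)

    def wait_time(self, now: float, cost: float = 1.0) -> float:
        """Seconds until `cost` tokens are available (0.0 if available now)."""
        self._refill(now)
        deficit = cost - self.tokens
        # Without the clamp, a surplus of tokens would yield a negative duration.
        return max(0.0, deficit / self.refill_per_s)

    def try_acquire(self, now: float, cost: float = 1.0) -> bool:
        if self.wait_time(now, cost) > 0.0:
            return False
        self.tokens -= cost
        return True
```

In languages whose duration types reject negative values, the same clamp is what prevents a runtime panic.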

📚 Documentation

Major documentation update with comprehensive guides, examples, and best practices for SGLang Model Gateway.

⚠️ Migration checklist:

  • Update Prometheus dashboards for new metrics
  • Update worker API integrations for UUID-based management
  • Review new error response format
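
When reviewing the new error format, client code can start by reading the `X-SMG-Error-Code` header from failed responses. The helper below is a minimal sketch that works on any header mapping. The function name is hypothetical, and the set of possible code values is not listed in these notes, so treat the returned value as an opaque string until you have checked the documentation.

```python
from typing import Mapping, Optional


def smg_error_code(headers: Mapping[str, str]) -> Optional[str]:
    """Return the X-SMG-Error-Code header value, or None if it is absent.

    HTTP header names are case-insensitive, so compare in lowercase.
    """
    for name, value in headers.items():
        if name.lower() == "x-smg-error-code":
            return value.strip()
    return None
```

Logging this value alongside the HTTP status makes it easier to update alerting rules that previously matched on response bodies.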

⚡ Built for speed. Engineered for scale. Production-proven.

Gateway Changes (108 commits)

Full Changelog: gateway-v0.2.4...gateway-v0.3.0