Release Gateway-v0.3.1

@slin1237 slin1237 released this 09 Jan 06:18
· 1087 commits to main since this release
7460240

🚀 SMG v0.3.1 Released!

We're excited to announce SMG v0.3.1, a release that makes cache-aware routing 10-12x faster with 99% less memory and adds JWT/OIDC authentication for control plane APIs!

🌲 Radix Tree / Cache-Aware Routing: 10-12x Faster + 99% Less Memory ⚡

Complete optimization overhaul of our cache-aware routing engine with stunning performance and memory gains:

Performance Improvements

  • Our cache-aware routing can now handle over 216,000 cache insertions per second (up from 18,900), with latency dropping from 52.9 microseconds to just 4.6 microseconds per operation.
  • For prefix matching across 10,000 tree entries, throughput jumped from 41,000 to 124,000 operations per second.
  • Under concurrent load with 64 threads, the system processes 474,000 operations per second – a 7.9x improvement over the previous 59,000 ops/sec.
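To make the insert and prefix-match operations above concrete, here is a minimal sketch of cache-aware routing over a prefix tree. It is an illustration only, not SMG's implementation: it uses an uncompressed, character-level trie instead of a radix tree, it is single-threaded, and the names `Node`, `PrefixTree`, `insert`, and `match` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # Child nodes keyed by the next character of the prompt.
    children: dict = field(default_factory=dict)
    # Workers known to have cached a prompt that passes through this node.
    workers: set = field(default_factory=set)


class PrefixTree:
    """Simplified stand-in for a radix tree (hypothetical, one character per edge)."""

    def __init__(self):
        self.root = Node()

    def insert(self, text, worker):
        """Record that `worker` has cached `text` and every prefix of it."""
        node = self.root
        for ch in text:
            child = node.children.get(ch)
            if child is None:
                child = Node()
                node.children[ch] = child
            child.workers.add(worker)
            node = child

    def match(self, text):
        """Return (matched_length, workers) for the longest cached prefix of `text`."""
        node, matched = self.root, 0
        for ch in text:
            child = node.children.get(ch)
            if child is None:
                break
            node, matched = child, matched + 1
        return matched, set(node.workers)


tree = PrefixTree()
tree.insert("You are a helpful assistant. Translate:", "worker-1")
tree.insert("You are a code reviewer.", "worker-2")

# The new prompt shares its system-prompt prefix with worker-1's cached entry.
matched, workers = tree.match("You are a helpful assistant. Summarize:")
print(matched, workers)
```

A router built on this idea would send the request to one of the returned workers when the matched prefix is long enough, and fall back to another policy (for example, least loaded) when it is not.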

Data Processing

  • INSERT operations now process 440 MB/s (up from 38 MB/s).
  • MATCH operations handle 253 MB/s (up from 83 MB/s).
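As a quick sanity check, the speedup for each benchmark can be recomputed from the before/after figures quoted above. The snippet below only does arithmetic on those published numbers; the labels and the `speedup` helper are ours. Because the inputs are rounded, the computed ratios can differ slightly from the quoted multipliers.

```python
# (label, before, after, lower_is_better) copied from the figures quoted above.
figures = [
    ("insert ops/sec", 18_900, 216_000, False),
    ("insert latency (microseconds)", 52.9, 4.6, True),
    ("prefix match ops/sec", 41_000, 124_000, False),
    ("concurrent ops/sec (64 threads)", 59_000, 474_000, False),
    ("INSERT MB/s", 38, 440, False),
    ("MATCH MB/s", 83, 253, False),
]


def speedup(before, after, lower_is_better=False):
    """Return how many times better `after` is than `before`."""
    return before / after if lower_is_better else after / before


ratios = {label: speedup(b, a, lower) for label, b, a, lower in figures}
for label, ratio in ratios.items():
    print(f"{label}: {ratio:.1f}x")
```

The insert path lands in the 10-12x range, while prefix matching improves by roughly 3x and the 64-thread benchmark by roughly 8x.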

Memory Improvements:

  • ~99% memory reduction per tree node:
      • Before: ~180 KB per node (DashMap default config on 170-core machines)
      • After: ~1.4 KB per node
  • Result: Deploy 100x more cache entries in the same memory footprint! For a typical deployment with 10,000 cached prefixes, memory usage drops from ~1.8 GB to just ~14 MB – freeing up resources for actual inference workloads.

Impact: Cache-aware routing is now 10-12x faster and uses 99% less memory. This is critical for large-scale multi-tenant deployments.
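The memory figures follow directly from the per-node sizes. The short calculation below reproduces them, assuming decimal units (1 KB = 1,000 bytes) and one tree node per cached prefix; both assumptions are ours, and `tree_bytes` is a hypothetical helper.

```python
KB = 1_000  # assumption: decimal units, 1 KB = 1,000 bytes


def tree_bytes(nodes, kb_per_node):
    """Approximate tree footprint in bytes for `nodes` nodes."""
    return nodes * kb_per_node * KB


nodes = 10_000  # assumption: one node per cached prefix
before = tree_bytes(nodes, 180)  # ~180 KB per node before the change
after = tree_bytes(nodes, 1.4)   # ~1.4 KB per node after the change

reduction = 1 - after / before
entries_ratio = before / after

print(f"before: {before / 1e9:.1f} GB")
print(f"after: {after / 1e6:.0f} MB")
print(f"reduction: {reduction:.1%}")
print(f"entries in the same footprint: {entries_ratio:.0f}x")
```

The per-node ratio works out to a little under 130x, so "100x more cache entries" is a conservative rounding.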

🔐 JWT/OIDC Authentication

Production-grade security for control plane APIs, with native support for industry-standard OIDC providers: Google, Azure, Oracle, GitHub, and more. Protect tokenizer management, worker registration, and admin endpoints with the enterprise authentication infrastructure you already use, so SMG integrates into your existing identity and access management systems.
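For readers new to OIDC, the sketch below shows the kind of claim checks a gateway typically applies to a bearer token before allowing a control plane call. It is not SMG's code. It assumes the token's signature has already been verified against the provider's published keys by a vetted JWT library (that step is deliberately omitted), and the function name, issuer URL, and audience value are placeholders.

```python
import time


def claim_errors(claims, issuer, audience, now=None):
    """Return a list of problems found in already-verified JWT claims."""
    now = time.time() if now is None else now
    errors = []

    # "iss" must match the identity provider the gateway trusts.
    if claims.get("iss") != issuer:
        errors.append("unexpected issuer")

    # "aud" may be a single string or a list of strings (RFC 7519).
    aud = claims.get("aud")
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if audience not in audiences:
        errors.append("audience mismatch")

    # "exp" is a NumericDate: seconds since the Unix epoch.
    exp = claims.get("exp")
    if not isinstance(exp, (int, float)) or now >= exp:
        errors.append("missing or expired exp")

    return errors


# Placeholder claims; a real token would come from the OIDC provider.
claims = {"iss": "https://issuer.example", "aud": ["smg-admin"], "exp": 2_000}
valid = claim_errors(claims, "https://issuer.example", "smg-admin", now=1_000)
expired = claim_errors(claims, "https://issuer.example", "smg-admin", now=3_000)
print(valid, expired)
```

An empty list means the claims passed these checks; anything else should result in the request being rejected.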

📊 Classification API Support

Native support for classification workloads! Deploy and serve classification models alongside your existing inference fleet with dedicated pipeline stages and protocol types.

✨ Additional Features

  • PrefixHash Load Balancing: New KV cache-aware load balancing policy using prefix hashing for improved cache hit rates in multi-tenant environments.
  • Nemotron Nano V3 Parser
  • In-Flight Request Age Metrics: Track the age of in-flight requests for better observability and SLA monitoring.
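The general idea behind prefix-hash load balancing can be sketched in a few lines: hash the leading tokens of a request and use the hash to pick a worker, so requests that share a prefix land on the same worker and can reuse its KV cache. This is a generic illustration under our own assumptions, not SMG's PrefixHash policy; `pick_worker`, the default `prefix_len`, and the modulo mapping are all hypothetical choices.

```python
import hashlib


def pick_worker(token_ids, workers, prefix_len=8):
    """Map the first `prefix_len` token IDs to one worker, deterministically."""
    prefix = ",".join(str(t) for t in token_ids[:prefix_len])
    digest = hashlib.sha256(prefix.encode("utf-8")).digest()
    # Interpret the first 8 bytes of the digest as an integer bucket.
    return workers[int.from_bytes(digest[:8], "big") % len(workers)]


workers = ["worker-0", "worker-1", "worker-2"]

# Two requests that share their first 8 tokens but differ afterwards.
first = pick_worker([1, 2, 3, 4, 5, 6, 7, 8, 100], workers)
second = pick_worker([1, 2, 3, 4, 5, 6, 7, 8, 200, 300], workers)
print(first, second)
```

Plain modulo mapping reshuffles most prefixes whenever the worker set changes; a consistent-hashing ring is a common way to limit that, at the cost of a little more bookkeeping.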

🛠️ Enhancements

Developer Experience:

  • Organized CLI arguments into logical groups
  • Shortened logging targets (sgl_model_gateway → smg)
  • Comprehensive embedding correctness tests against HuggingFace
  • Auto-generate protobuf files during wheel build

Reliability:

  • Fix IGW routing for external OpenAI workers
  • Work around orphan process problems
  • Prevent potential hangs in subprocess handling
  • Use 504 Gateway Timeout for upstream timeouts (proper HTTP semantics)
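The 504 change reflects standard HTTP semantics: a gateway that gives up waiting on an upstream server should answer 504 Gateway Timeout rather than a generic 5xx. A minimal sketch of that mapping follows. It is hypothetical and not taken from SMG's code; the 502 and 500 branches in particular are our own assumptions about sensible defaults.

```python
from http import HTTPStatus


def status_for_upstream_failure(exc):
    """Choose the status a gateway returns when a call to a worker fails."""
    if isinstance(exc, TimeoutError):
        # The upstream worker did not answer in time.
        return HTTPStatus.GATEWAY_TIMEOUT  # 504
    if isinstance(exc, ConnectionError):
        # Assumption: connection failures are reported as 502 Bad Gateway.
        return HTTPStatus.BAD_GATEWAY
    # Assumption: anything else is treated as an internal error.
    return HTTPStatus.INTERNAL_SERVER_ERROR


print(int(status_for_upstream_failure(TimeoutError("worker timed out"))))
```

Returning 504 lets clients and monitoring distinguish a slow upstream from a fault in the gateway itself.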

πŸ› Bug Fixes

  • Fixed embedding worker health check crash
  • Fixed tokenizer to match transformers special token handling
  • Fixed age bucket rendering issue
  • Fixed non-PD router HTTP header whitelist
  • Fixed duplicate classify prefix in response ID
  • Fixed WASM test errors on machines with many cores

⚡ Built for speed. Engineered for scale. Production-proven.

Gateway Changes (120 commits)

Full Changelog: gateway-v0.3.0...gateway-v0.3.1