feat: expose quantization and kv_cache_dtype in server args builder by ZhitongGuo · Pull Request #246 · sgl-project/sglang-omni

ZhitongGuo · 2026-03-31T23:06:02Z

Motivation

SGLang natively supports model quantization (INT8, FP8, AWQ, GPTQ) and KV cache dtype configuration via ServerArgs, but sglang-omni's helper functions (build_sglang_server_args and create_sglang_tts_engine_executor) did not expose these parameters as named arguments, making them undiscoverable for users.

Modifications

server_args_builder.py: Added quantization and kv_cache_dtype as explicit named parameters to build_sglang_server_args(). They are conditionally inserted into the kwargs dict before **overrides, preserving backward compatibility.
stages.py: Added the same two parameters to create_sglang_tts_engine_executor() for the S2-Pro TTS pipeline, with conditional passthrough to ServerArgs.
tests/test_quantization_passthrough.py: Added 4 unit tests verifying argument passthrough for both named params and the existing **overrides path.
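
The conditional-insertion pattern described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `FakeServerArgs` is a hypothetical stand-in for SGLang's `ServerArgs` (its fields and defaults here are assumptions), and `build_server_args_sketch` is a hypothetical name for a simplified builder.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeServerArgs:
    """Hypothetical stand-in for ServerArgs; fields and defaults are assumed."""

    model_path: str
    quantization: Optional[str] = None
    kv_cache_dtype: str = "auto"
    context_length: Optional[int] = None


def build_server_args_sketch(
    model_path: str,
    quantization: Optional[str] = None,
    kv_cache_dtype: Optional[str] = None,
    **overrides: Any,
) -> FakeServerArgs:
    """Simplified builder showing named parameters plus an overrides path."""
    kwargs = {"model_path": model_path}
    # Forward a value only when the caller set it, so the defaults of the
    # args class remain in effect when the new parameters are left as None.
    if quantization is not None:
        kwargs["quantization"] = quantization
    if kv_cache_dtype is not None:
        kwargs["kv_cache_dtype"] = kv_cache_dtype
    # Overrides are merged last, so existing callers that rely on them
    # see the same behavior as before.
    kwargs.update(overrides)
    return FakeServerArgs(**kwargs)
```

Under these assumptions, `build_server_args_sketch("some/model", quantization="awq")` sets only `quantization`, while a call that omits both new parameters produces the same arguments it would have without them.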

Accuracy Test

All 3 modified/created files pass syntax validation.
Unit tests verify: named quantization="awq" passthrough, default None, kv_cache_dtype="fp8_e5m2" passthrough, and **overrides path for quantization="gptq".

Benchmark & Profiling

No performance impact: this change only passes configuration through to ServerArgs. Actual quantization performance depends on the underlying SGLang engine and model.

SGLang natively supports quantization (INT8, FP8, AWQ, GPTQ) via ServerArgs, but sglang-omni's helper functions did not expose these params, making them undiscoverable.

- Add `quantization` and `kv_cache_dtype` named parameters to `build_sglang_server_args()` so callers can set them without resorting to **overrides
- Add `quantization` parameter to S2-Pro TTS engine factory (`create_sglang_tts_engine_executor`) and pass through to ServerArgs
- Add unit tests for all passthrough paths

ZhitongGuo and others added 2 commits March 31, 2026 15:33

Merge branch 'main' into feat/quantization-passthrough

cfae3dc

ZhitongGuo marked this pull request as ready for review March 31, 2026 23:18

ZhitongGuo requested a review from shuaills as a code owner March 31, 2026 23:18

ZhitongGuo wants to merge 2 commits into sgl-project:main from ZhitongGuo:feat/quantization-passthrough
