
feat: expose quantization and kv_cache_dtype in server args builder #246

Open
ZhitongGuo wants to merge 2 commits into sgl-project:main from ZhitongGuo:feat/quantization-passthrough


ZhitongGuo commented Mar 31, 2026

Motivation

SGLang natively supports model quantization (INT8, FP8, AWQ, GPTQ) and KV cache dtype configuration via ServerArgs, but sglang-omni's helper functions (build_sglang_server_args and create_sglang_tts_engine_executor) did not expose these parameters as named arguments. As a result, users could not discover these options from the helpers' signatures.

Modifications

  • server_args_builder.py: Added quantization and kv_cache_dtype as explicit named parameters to build_sglang_server_args(). They are conditionally inserted into the kwargs dict before **overrides, preserving backward compatibility.
  • stages.py: Added the same two parameters to create_sglang_tts_engine_executor() for the S2-Pro TTS pipeline, with conditional passthrough to ServerArgs.
  • tests/test_quantization_passthrough.py: Added 4 unit tests verifying argument passthrough for both named params and the existing **overrides path.
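The conditional insertion described in the first bullet can be sketched as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from this PR: `FakeServerArgs` is a hypothetical stand-in for SGLang's `ServerArgs` (only the fields used here, with defaults chosen for the example), `build_server_args_sketch` is a hypothetical name, and the builder default `mem_fraction_static=0.85` is invented purely to show that `**overrides` is merged last.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FakeServerArgs:
    """Hypothetical stand-in for SGLang's ServerArgs (only the fields used here)."""

    model_path: str
    quantization: Optional[str] = None
    kv_cache_dtype: str = "auto"
    mem_fraction_static: Optional[float] = None


def build_server_args_sketch(
    model_path: str,
    quantization: Optional[str] = None,
    kv_cache_dtype: Optional[str] = None,
    **overrides: Any,
) -> FakeServerArgs:
    """Hypothetical builder mirroring the pattern described in this PR."""
    # Builder-level defaults (hypothetical value, for illustration only).
    kwargs: dict[str, Any] = {"model_path": model_path, "mem_fraction_static": 0.85}
    # Forward the new options only when the caller set them, so the
    # dataclass defaults apply otherwise and existing callers are unaffected.
    if quantization is not None:
        kwargs["quantization"] = quantization
    if kv_cache_dtype is not None:
        kwargs["kv_cache_dtype"] = kv_cache_dtype
    # Overrides are merged last, so they win over the builder defaults above.
    kwargs.update(overrides)
    return FakeServerArgs(**kwargs)


if __name__ == "__main__":
    args = build_server_args_sketch("my-model", quantization="awq", mem_fraction_static=0.7)
    print(args.quantization, args.kv_cache_dtype, args.mem_fraction_static)  # awq auto 0.7
```

Forwarding only non-None values leaves the args class in charge of its own defaults, which is one straightforward way to keep the new parameters backward compatible.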

Accuracy Test

  • All 3 modified/created files pass syntax validation.
  • Unit tests verify: named quantization="awq" passthrough, default None, kv_cache_dtype="fp8_e5m2" passthrough, and **overrides path for quantization="gptq".

Benchmark & Profiling

No performance impact — this is a configuration passthrough change. Actual quantization performance depends on the underlying SGLang engine and model.

ZhitongGuo and others added 2 commits March 31, 2026 15:33
SGLang natively supports quantization (INT8, FP8, AWQ, GPTQ) via
ServerArgs, but sglang-omni's helper functions did not expose these
params, making them undiscoverable.

- Add `quantization` and `kv_cache_dtype` named parameters to
  `build_sglang_server_args()` so callers can set them without
  resorting to **overrides
- Add `quantization` parameter to S2-Pro TTS engine factory
  (`create_sglang_tts_engine_executor`) and pass through to ServerArgs
- Add unit tests for all passthrough paths
ZhitongGuo marked this pull request as ready for review March 31, 2026 23:18
ZhitongGuo requested a review from shuaills as a code owner March 31, 2026 23:18