[CLI] Expose --cpu-offload-gb, --tp-size, and --mem-fraction-static on sgl-omni by edwingao28 · Pull Request #308 · sgl-project/sglang-omni

edwingao28 · 2026-04-17T04:47:53Z

Motivation

sgl-omni serve only exposed high-level options; common SGLang runtime settings like cpu_offload_gb, mem_fraction_static, and tp_size required editing pipeline config YAML by hand. This blocks Ming-flash-omni-2.0 launches, where the generated defaults (cpu_offload_gb=0, mem_fraction_static=0.7) are often too tight, and blocks multi-GPU runs that need tp_size>1.

Update 4.22: This PR is on hold because the codebase is undergoing a major refactor. The changes will be applied once the refactored code is pushed to the main branch.

Modifications

sglang_omni/cli/serve.py: Add --cpu-offload-gb, --mem-fraction-static, and --tp-size; pass as server_args_overrides via ConfigManager.from_model_path or --config.
sglang_omni/config/schema.py: Add server_args_overrides and _apply_server_args_overrides to PipelineConfig; route via primary_sglang_stage.
sglang_omni/config/manager.py: Plumb server_args_overrides through ConfigManager.from_model_path.
sglang_omni/models/ming_omni/config.py: Set primary_sglang_stage = THINKER_STAGE; auto-inject disable_custom_all_reduce=True when tp_size>1 (Ming custom all-reduce kernel hangs under TP); remove legacy local override logic.
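
The flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the helper names (`build_parser`, `collect_overrides`, `apply_overrides`), the dict-based stage representation, the `"thinker"` value of `THINKER_STAGE`, and the flag types are all assumptions made for the example. Only the three flag names, `server_args_overrides`, `primary_sglang_stage`, and the `disable_custom_all_reduce` rule come from the description.

```python
import argparse
from typing import Any, Dict

# Placeholder value; the real THINKER_STAGE constant is defined in the repo.
THINKER_STAGE = "thinker"


def build_parser() -> argparse.ArgumentParser:
    """Parser exposing the three runtime flags; None means 'not passed'."""
    parser = argparse.ArgumentParser(prog="sgl-omni serve")
    # Flag types are assumptions for this sketch.
    parser.add_argument("--cpu-offload-gb", type=int, default=None)
    parser.add_argument("--mem-fraction-static", type=float, default=None)
    parser.add_argument("--tp-size", type=int, default=None)
    return parser


def collect_overrides(args: argparse.Namespace) -> Dict[str, Any]:
    """Keep only flags the user set, so generated defaults stay untouched."""
    names = ("cpu_offload_gb", "mem_fraction_static", "tp_size")
    return {n: getattr(args, n) for n in names if getattr(args, n) is not None}


def apply_overrides(
    stages: Dict[str, Dict[str, Any]],
    primary_stage: str,
    overrides: Dict[str, Any],
    ming_tp_workaround: bool = False,
) -> None:
    """Merge overrides into the server args of the primary SGLang stage only."""
    merged = dict(overrides)
    # Ming-specific rule from the description: with tp_size > 1,
    # the custom all-reduce kernel is disabled.
    if ming_tp_workaround and merged.get("tp_size", 1) > 1:
        merged["disable_custom_all_reduce"] = True
    if not merged:
        return
    if primary_stage not in stages:
        raise ValueError(f"unknown primary stage: {primary_stage!r}")
    stages[primary_stage].update(merged)


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--tp-size", "2", "--cpu-offload-gb", "0", "--mem-fraction-static", "0.80"]
    )
    # Hypothetical generated defaults for two stages.
    stages = {
        THINKER_STAGE: {"cpu_offload_gb": 0, "mem_fraction_static": 0.7},
        "talker": {},
    }
    apply_overrides(stages, THINKER_STAGE, collect_overrides(args), ming_tp_workaround=True)
    print(stages[THINKER_STAGE])
```

Defaulting each flag to `None` lets the sketch distinguish "not passed" from an explicit `--cpu-offload-gb 0`, so an explicit zero still overrides whatever the generated config contains.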

Example

TP=1:

CUDA_VISIBLE_DEVICES=0 \
sgl-omni serve \
  --model-path inclusionAI/Ming-flash-omni-2.0 \
  --port 8000 \
  --model-name ming-omni \
  --cpu-offload-gb 80 \
  --mem-fraction-static 0.92

TP=2 (Ming-Omni):

CUDA_VISIBLE_DEVICES=0,1 \
sgl-omni serve \
  --model-path inclusionAI/Ming-flash-omni-2.0 \
  --port 8000 \
  --model-name ming-omni \
  --tp-size 2 \
  --cpu-offload-gb 0 \
  --mem-fraction-static 0.80

Related Issues

Closes #296

Checklist

Format your code according to pre-commit.
Add unit tests.
Update documentation / docstrings / example tutorials as needed.
Provide throughput / latency benchmark results and accuracy evaluation results as needed.
For reviewers: If you haven't made any contributions to this PR and are only assisting with merging the main branch, please remove yourself as a co-author when merging the PR.


edwingao28 changed the title ~~[CLI] Expose --cpu-offload-gb and --mem-fraction-static on sgl-omni s…~~ [CLI] Expose --cpu-offload-gb and --mem-fraction-static on sgl-omni Apr 17, 2026

edwingao28 force-pushed the feat/cli-runtime-overrides-a branch from 16f795f to ee90710 Compare April 21, 2026 00:32

edwingao28 marked this pull request as ready for review April 21, 2026 00:33

edwingao28 requested review from FrankLeeeee and shuaills as code owners April 21, 2026 00:33

edwingao28 force-pushed the feat/cli-runtime-overrides-a branch from ee90710 to 53a612b Compare April 21, 2026 00:39

Jayon02 mentioned this pull request Apr 21, 2026 in [Benchmark] Add Video-MME for Qwen3-Omni #327 (merged)

edwingao28 changed the title ~~[CLI] Expose --cpu-offload-gb and --mem-fraction-static on sgl-omni~~ [CLI] Expose --cpu-offload-gb, --tp-size, and --mem-fraction-static on sgl-omni Apr 21, 2026

edwingao28 force-pushed the feat/cli-runtime-overrides-a branch from 53a612b to 6ae047c Compare April 22, 2026 01:26

[CLI] Expose --cpu-offload-gb, --tp-size, and --disable-cuda-graph on sgl-omni serve (commit f2703c2)

edwingao28 force-pushed the feat/cli-runtime-overrides-a branch from 6ae047c to f2703c2 Compare April 22, 2026 01:27

edwingao28 wants to merge 1 commit into sgl-project:main from edwingao28:feat/cli-runtime-overrides-a