docs(moss-tts): document MOSS-TTS-Local variant in cookbook by xinlij · Pull Request #782 · sgl-project/sglang-omni

xinlij · 2026-06-14T05:25:59Z

Summary

Adds the MOSS-TTS-Local-Transformer-v1.5 variant to the MOSS-TTS cookbook (docs/cookbook/moss_tts.md), which previously only documented the delay model (MOSS-TTS-v1.5).

Prerequisites

Distinguishes the two checkpoints (single-GPU delay model vs. two-GPU Local variant).
Adds the hf download OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5 command.
Notes the codec (MOSS-Audio-Tokenizer-v2, 48 kHz output) is loaded via the checkpoint's remote code, so trust_remote_code must be enabled.
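
For scripted setups, the download step can be wrapped in a few lines of Python. This is a minimal sketch: the repo ID and the `hf download` command come from the description above, while the helper name `hf_download_argv` is hypothetical and not part of sglang-omni or the Hugging Face CLI.

```python
import shlex

# Repo ID taken from the PR description.
LOCAL_REPO = "OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5"


def hf_download_argv(repo_id: str) -> list[str]:
    """Build the argv for `hf download <repo_id>` (hypothetical helper)."""
    return ["hf", "download", repo_id]


# Print the command as it would be typed in a shell.
print(shlex.join(hf_download_argv(LOCAL_REPO)))
```

To actually run the download, the returned list can be passed to `subprocess.run(argv, check=True)`, assuming the `hf` CLI is installed and on `PATH`.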

Server Configuration

Splits into two subsections, one per checkpoint.
Documents the Local launch with examples/configs/moss_tts_local.yaml and its two-GPU default (AR engine on cuda:0, codec on cuda:1).
Notes the single-GPU option via config_cls: MossTTSLocalColocatedPipelineConfig.
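
The difference between the two launch modes comes down to where the codec is placed. The sketch below only illustrates that: `PlacementSketch` is a hypothetical stand-in, not the sglang-omni config class, and the device strings are the ones quoted in this description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlacementSketch:
    """Illustrative stand-in for a pipeline config; not the real class."""

    ar_device: str = "cuda:0"     # AR engine device, per the PR description
    codec_device: str = "cuda:1"  # two-GPU default, per the PR description

    def gpus_needed(self) -> int:
        # Count distinct devices across the two stages.
        return len({self.ar_device, self.codec_device})


two_gpu = PlacementSketch()
colocated = PlacementSketch(codec_device="cuda:0")  # single-GPU option

print(two_gpu.gpus_needed(), colocated.gpus_needed())
```

Under these assumptions the default placement needs two GPUs and the colocated placement needs one, which matches the hardware split described in Prerequisites.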

The request shape and generation parameters are identical to the delay model, so the existing synthesis examples apply to both variants. Note that the output sample rate differs: the Local variant produces 48 kHz audio, while the delay model produces 24 kHz.
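
Client code that handles raw samples needs to account for the sample-rate difference. A minimal sketch, assuming mono audio; the variant keys `"local"` and `"delay"` and the helper `duration_seconds` are hypothetical names used only for illustration, while the two rates are the ones stated above.

```python
# Output sample rates stated in the PR description.
SAMPLE_RATE_HZ = {"local": 48_000, "delay": 24_000}


def duration_seconds(num_samples: int, variant: str) -> float:
    """Duration of a mono clip given its sample count and model variant."""
    return num_samples / SAMPLE_RATE_HZ[variant]


# The same number of samples is half as long at 48 kHz as at 24 kHz.
print(duration_seconds(96_000, "local"))  # 2.0
print(duration_seconds(96_000, "delay"))  # 4.0
```

Writing Local output with a hard-coded 24 kHz header would make playback run at half speed, so the rate is best read from the response or looked up per variant.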

Verified against the codebase

Architecture (36-layer Qwen3 backbone + 1-layer frame-local transformer): sglang_omni/models/moss_tts_local/sglang_model.py.
Two-GPU default (codec_device="cuda:1") vs. colocated (codec_device="cuda:0"): sglang_omni/models/moss_tts_local/config.py.
config_cls: MossTTSLocalColocatedPipelineConfig resolves via the Variants lookup in sglang_omni/models/registry.py (get_config_cls_by_name).
48 kHz output: sglang_omni/models/moss_tts_local/payload_types.py (sample_rate = 48000).
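
For readers unfamiliar with name-based config resolution, the idea behind resolving a `config_cls` string can be sketched as a dictionary lookup. This is not the code in sglang_omni/models/registry.py; every name below (`BaseConfigSketch`, `ColocatedConfigSketch`, `VARIANTS_SKETCH`, `lookup_config_cls`) is hypothetical.

```python
class BaseConfigSketch:
    """Hypothetical default pipeline config."""


class ColocatedConfigSketch(BaseConfigSketch):
    """Hypothetical single-GPU variant of the default config."""


# Map each class name to its class so a YAML string can select it.
VARIANTS_SKETCH = {
    cls.__name__: cls for cls in (BaseConfigSketch, ColocatedConfigSketch)
}


def lookup_config_cls(name: str) -> type:
    """Resolve a config class from its name, failing loudly on typos."""
    try:
        return VARIANTS_SKETCH[name]
    except KeyError:
        known = ", ".join(sorted(VARIANTS_SKETCH))
        raise ValueError(f"unknown config_cls {name!r}; known: {known}") from None


print(lookup_config_cls("ColocatedConfigSketch").__name__)
```

Listing the known names in the error message is a design choice in this sketch: a misspelled `config_cls` value then fails at startup with the valid options shown.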

Add the MOSS-TTS-Local-Transformer-v1.5 variant to the Prerequisites and Server Configuration sections of the MOSS-TTS cookbook: download command, the remote-code codec note (MOSS-Audio-Tokenizer-v2, 48 kHz output), the two-GPU default launch (AR engine on cuda:0, codec on cuda:1), and the single-GPU colocated config option.

gaoyang07 · 2026-06-14T06:04:31Z

@@ -1,2 +1,2 @@
 # MOSS-TTS



For readability and discoverability, we might consider splitting this section into two dedicated subsections from the beginning: one for the Delay Pattern model and one for the Local Transformer model. Each subsection could describe the corresponding architecture, serving pipeline, hardware assumptions, and usage examples. This would make the tradeoff between the two token modeling patterns more explicit and avoid mixing model-specific details in a single flow.

xinlij requested review from yuan-luo and zhaochenyang20 June 14, 2026 05:30

xinlij force-pushed the docs/moss-tts-local-cookbook branch from 18542bf to 93fe10c Compare June 14, 2026 05:41

xinlij removed the request for review from yuan-luo June 14, 2026 05:50

gaoyang07 reviewed Jun 14, 2026

xinlij wants to merge 1 commit into main from docs/moss-tts-local-cookbook
