docs(moss-tts): document MOSS-TTS-Local variant in cookbook #782
Open
xinlij wants to merge 1 commit into
Add the MOSS-TTS-Local-Transformer-v1.5 variant to the Prerequisites and Server Configuration sections of the MOSS-TTS cookbook: download command, the remote-code codec note (MOSS-Audio-Tokenizer-v2, 48 kHz output), the two-GPU default launch (AR engine on cuda:0, codec on cuda:1), and the single-GPU colocated config option.
18542bf to 93fe10c
gaoyang07 reviewed Jun 14, 2026
@@ -1,2 +1,2 @@
# MOSS-TTS
Collaborator
For readability and discoverability, we might consider splitting this section into two dedicated subsections from the beginning: one for the Delay Pattern model and one for the Local Transformer model. Each subsection could describe the corresponding architecture, serving pipeline, hardware assumptions, and usage examples. This would make the tradeoff between the two token modeling patterns more explicit and avoid mixing model-specific details in a single flow.
Summary
Adds the MOSS-TTS-Local-Transformer-v1.5 variant to the MOSS-TTS cookbook (`docs/cookbook/moss_tts.md`), which previously only documented the delay model (MOSS-TTS-v1.5).

Prerequisites
- The `hf download OpenMOSS-Team/MOSS-TTS-Local-Transformer-v1.5` command.
- A note that the codec (`MOSS-Audio-Tokenizer-v2`, 48 kHz output) is loaded via the checkpoint's remote code, so `trust_remote_code` must be enabled.

Server Configuration
- `examples/configs/moss_tts_local.yaml` and its two-GPU default (AR engine on `cuda:0`, codec on `cuda:1`).
- The single-GPU colocated option, `config_cls: MossTTSLocalColocatedPipelineConfig`.

The request shape and generation parameters are identical to the delay model, so the existing synthesis examples apply to both variants. Note that the output sample rate differs: 48 kHz for the Local variant vs. 24 kHz for the delay model.
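Because the two variants share a request shape but differ in output sample rate, client code that assumes 24 kHz would misread audio from the Local variant. The following is a minimal sketch of a client-side check, assuming the synthesized audio is available as WAV bytes (the PR does not state the response format). The helper names and the `"local"` / `"delay"` labels are illustrative, not part of sglang_omni.

```python
import io
import wave

# Sample rates as stated in the PR summary; the keys are illustrative labels.
EXPECTED_SAMPLE_RATE = {"local": 48_000, "delay": 24_000}


def wav_sample_rate(wav_bytes: bytes) -> int:
    """Read the sample rate from a WAV header held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getframerate()


def check_sample_rate(wav_bytes: bytes, variant: str) -> int:
    """Return the sample rate, or raise if it does not match the variant."""
    actual = wav_sample_rate(wav_bytes)
    expected = EXPECTED_SAMPLE_RATE[variant]
    if actual != expected:
        raise ValueError(
            f"{variant} variant should produce {expected} Hz audio, got {actual} Hz"
        )
    return actual


def make_silent_wav(sample_rate: int, seconds: float = 0.1) -> bytes:
    """Build a short mono 16-bit silent WAV, used here only as test input."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(sample_rate * seconds))
    return buf.getvalue()
```

A caller would pass the response bytes and the variant label to `check_sample_rate` before handing the audio to any downstream step that hard-codes a rate.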
Verified against the codebase
- `sglang_omni/models/moss_tts_local/sglang_model.py`.
- Two-GPU default (`codec_device="cuda:1"`) vs. colocated (`codec_device="cuda:0"`): `sglang_omni/models/moss_tts_local/config.py`.
- `config_cls: MossTTSLocalColocatedPipelineConfig` resolves via the `Variants` lookup in `sglang_omni/models/registry.py` (`get_config_cls_by_name`).
- `sglang_omni/models/moss_tts_local/payload_types.py` (`sample_rate = 48000`).
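To illustrate how a `config_cls` string can select between the two-GPU default and the colocated layout, here is a small self-contained sketch of a name-based lookup. It is not the sglang_omni implementation: the classes, the `engine_device` field, the `"TwoGpuDefault"` key, and `lookup_config_cls` are hypothetical stand-ins. Only the `codec_device` values and the `MossTTSLocalColocatedPipelineConfig` name come from the PR.

```python
from dataclasses import dataclass


@dataclass
class SketchTwoGpuConfig:
    """Stand-in for the two-GPU default: AR engine and codec on separate GPUs."""

    engine_device: str = "cuda:0"  # hypothetical field name
    codec_device: str = "cuda:1"   # value taken from the PR


@dataclass
class SketchColocatedConfig(SketchTwoGpuConfig):
    """Stand-in for the colocated variant: codec shares the engine's GPU."""

    codec_device: str = "cuda:0"   # value taken from the PR


# Hypothetical registry keyed by the string a YAML config would carry.
SKETCH_VARIANTS = {
    "TwoGpuDefault": SketchTwoGpuConfig,  # hypothetical key
    "MossTTSLocalColocatedPipelineConfig": SketchColocatedConfig,
}


def lookup_config_cls(name: str) -> type:
    """Resolve a config class by name, failing loudly on unknown names."""
    try:
        return SKETCH_VARIANTS[name]
    except KeyError:
        known = ", ".join(sorted(SKETCH_VARIANTS))
        raise ValueError(f"unknown config_cls {name!r}; known: {known}") from None
```

Resolving by name keeps the YAML file declarative: switching to the single-GPU layout changes one string rather than individual device fields.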