Strategy for "1-Hour Ultra-long Speech" & Best Practices for Max Training Sequence Length #75

@omarabb315

Hello MOSS-TTS team,

I am currently fully fine-tuning MOSS-TTS-8B on a 5,000-hour Arabic dataset. My goal is to create a foundational, highly fluent Arabic TTS model with robust zero-shot cloning.

My Journey & Problem:
Initially, I trained on 2 to 30-second clips, but I encountered the classic issue where the model would clip early and stop generating before finishing longer text prompts.
To fix this, I stitched my dataset into longer paragraphs (randomized from 2 seconds up to 200 seconds). I am training with batch_size=1, gradient_accumulation_steps=128, and truncating max_seq_len at 10,000 tokens to avoid OOM on the H200.

This partially solved the early clipping, but generation still cuts off at around 50 seconds.
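For readers who want to reproduce the data preparation, the stitching step described above might look roughly like the sketch below. It is an illustration only, not the exact pipeline used here: the `Clip` type and `stitch_clips` function are hypothetical names, and the sketch assumes that the clips arrive in an order where neighbours may sensibly be concatenated (for example, same speaker) and that no single clip is longer than `max_s`.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Clip:
    """Hypothetical record for one short training utterance."""
    text: str
    duration_s: float


def stitch_clips(clips: Sequence[Clip], min_s: float = 2.0,
                 max_s: float = 200.0, seed: int = 0) -> List[List[Clip]]:
    """Group consecutive clips into samples whose target length is
    drawn uniformly from [min_s, max_s] seconds."""
    rng = random.Random(seed)
    samples: List[List[Clip]] = []
    current: List[Clip] = []
    current_s = 0.0
    target_s = rng.uniform(min_s, max_s)

    for clip in clips:
        # Close the running sample first if this clip would exceed the hard cap.
        if current and current_s + clip.duration_s > max_s:
            samples.append(current)
            current, current_s = [], 0.0
            target_s = rng.uniform(min_s, max_s)

        current.append(clip)
        current_s += clip.duration_s

        # Close the sample once the randomly drawn target is reached.
        if current_s >= target_s:
            samples.append(current)
            current, current_s = [], 0.0
            target_s = rng.uniform(min_s, max_s)

    if current:  # keep the final partial sample
        samples.append(current)
    return samples
```

Each returned group would then be concatenated (audio and text) into one training example. With batch_size=1 and gradient_accumulation_steps=128, each optimizer step sees 128 such examples per GPU.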

My Questions regarding your "1 Hour" generation claim:
In the README, it states the model supports "continuous long-form speech generation for up to one hour."

  1. Training Sequence Length: Did you actually train on ultra-long sequences to achieve this (and if so, how did you get around the immense VRAM requirements)? Or was the model trained on shorter chunks (e.g., 10 to 30 seconds), relying on Qwen3's RoPE/context extrapolation at inference time?
  2. Inference Strategy for Long Speech: Is the 1-hour generation achieved by passing the entire massive text prompt at once, or do you recommend an inference-time chunking strategy (e.g., generating paragraph-by-paragraph and using the trailing audio as the reference for the next chunk)?
  3. Silent Rubbish Loops: Have you encountered the model hallucinating endless silence/noise on very long generations? Do you recommend strict repetition penalties, or forcing EOS via specific text prompt structures?
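To make question 2 concrete, the inference-time chunking strategy I have in mind is sketched below. This is not MOSS-TTS code: `synthesize` is a hypothetical callable standing in for whatever generation entry point the model exposes, audio is represented as a plain list of samples, and the 24 kHz sample rate and 10-second tail are placeholder assumptions.

```python
from typing import Callable, List, Sequence

Audio = List[float]  # placeholder: mono samples as floats


def split_paragraphs(text: str) -> List[str]:
    """Split on blank lines and drop empty pieces."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def generate_long_form(
    text: str,
    synthesize: Callable[[str, Sequence[float]], Audio],  # hypothetical TTS call
    speaker_ref: Sequence[float],
    sample_rate: int = 24000,    # assumed, not taken from the model card
    tail_seconds: float = 10.0,  # assumed length of the carried-over reference
) -> Audio:
    """Generate paragraph by paragraph, using the trailing audio produced
    so far as the reference for the next paragraph."""
    output: Audio = []
    reference: Sequence[float] = list(speaker_ref)
    tail_len = int(tail_seconds * sample_rate)

    for paragraph in split_paragraphs(text):
        chunk = synthesize(paragraph, reference)
        output.extend(chunk)
        if chunk:
            # Condition the next paragraph on the most recent audio.
            reference = output[-tail_len:]
    return output
```

One trade-off with this approach is that any drift in voice quality is carried forward from chunk to chunk, which is part of why I am asking whether you recommend it over a single long prompt.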

Thank you for open-sourcing this. Any insights into your training length distribution vs. inference strategy would be massively helpful!
