Inconsistent padding_bonus effect: TTS sometimes speaks fast, sometimes at default speed #150

@Isam-tfares

Due diligence

  • I have done my due diligence in trying to find the answer myself.

Topic

The paper

Question

Summary

I'm using the moshi TTS (Rust server + Python module) and trying to control the speech speed using the padding_bonus parameter in the config.toml. I’ve set:

padding_bonus = -2

I expected this to make the TTS speak faster consistently. Sometimes it does, but in many cases the TTS still speaks at the normal/default speed, as if the parameter were ignored.


My config.toml:

static_dir = "./static/"
log_dir = "/tmp/unmute_logs"
instance_name = "tts"
authorized_ids = ["public_token"]

[modules.tts_py]
type = "Py"
path = "/api/tts_streaming"
text_tokenizer_file = "hf://kyutai/tts-1.6b-en_fr/tokenizer_spm_8k_en_fr_audio.model"
# A higher batch size allows you to serve more users at once, but with a higher latency and memory usage.
batch_size = 4
text_bos_token = 1

[modules.tts_py.py]
log_folder = "/tmp/unmute_logs"
# We could replace **/*.safetensors with unmute-prod-website/*.safetensors
# to only get the voices used in Unmute, but we are using the TTS for the demo
# on the project page too and for that we want to load the other voices as well
voice_folder = "hf-snapshot://kyutai/tts-voices/**/*.safetensors"
default_voice = "cml-tts/fr/10087_11650_000028-0002.wav"
cfg_is_no_text = true
n_q = 24
padding_bonus = -2
cfg_coef = 2.0

Behavior

  • Sometimes: the voice is clearly faster (as expected).
  • Other times (even during the same runtime or request type): the voice plays at the normal/default speed.
  • I'm using moshi with the Rust server and calling the /api/tts_streaming endpoint.
  • This seems non-deterministic. The parameter padding_bonus does not always apply.
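
To put numbers on "fast" versus "default" speed, the speaking rate of each response can be estimated as input characters per second of audio. The sketch below assumes the audio from each request has been saved as PCM WAV; the function names are hypothetical, and `make_silent_wav` (with its 24 kHz default) is only a stand-in used to exercise the helper, not real TTS output.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Duration of a PCM WAV payload, computed from its header."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        return wf.getnframes() / wf.getframerate()


def chars_per_second(text: str, wav_bytes: bytes) -> float:
    """Rough speaking-rate proxy: input characters per second of audio."""
    duration = wav_duration_seconds(wav_bytes)
    if duration <= 0:
        raise ValueError("audio is empty")
    return len(text) / duration


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> bytes:
    """Build a silent mono 16-bit WAV; a placeholder for a saved TTS response."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(sample_rate)
        wf.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


# Same text, two fake responses of different length: the shorter one
# yields a higher characters-per-second value.
text = "a" * 20
slow = chars_per_second(text, make_silent_wav(2.0))
fast = chars_per_second(text, make_silent_wav(1.0))
print(slow, fast)
```

Running the same sentence several times with identical settings and comparing the resulting rates would show whether the variation is real or just an impression from different input texts.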

Questions

  1. Is there any known condition where padding_bonus is ignored (e.g., short input, voice model fallback, etc.)?
  2. Are there internal voice/model parameters that override it?
  3. Should I set this parameter differently (e.g., via API call instead of config.toml)?

Any clarification would be appreciated. I’d be happy to test or debug if needed.
Thanks!
