Due diligence
- I have done my due diligence in trying to find the answer myself.
Topic
The paper
Question
Summary
I'm using the moshi TTS (Rust server + Python module) and trying to control the speech speed with the padding_bonus parameter in config.toml. I’ve set:
padding_bonus = -2
I expected this to consistently make the TTS speak faster, and sometimes it does. But in many cases the TTS still speaks at the normal/default speed, as if the parameter were ignored.
My config.toml:
static_dir = "./static/"
log_dir = "/tmp/unmute_logs"
instance_name = "tts"
authorized_ids = ["public_token"]
[modules.tts_py]
type = "Py"
path = "/api/tts_streaming"
text_tokenizer_file = "hf://kyutai/tts-1.6b-en_fr/tokenizer_spm_8k_en_fr_audio.model"
# A higher batch size allows you to serve more users at once, but with a higher latency and memory usage.
batch_size = 4
text_bos_token = 1
[modules.tts_py.py]
log_folder = "/tmp/unmute_logs"
# We could replace **/*.safetensors with unmute-prod-website/*.safetensors
# to only get the voices used in Unmute, but we are using the TTS for the demo
# on the project page too and for that we want to load the other voices as well
voice_folder = "hf-snapshot://kyutai/tts-voices/**/*.safetensors"
default_voice = "cml-tts/fr/10087_11650_000028-0002.wav"
cfg_is_no_text = true
n_q = 24
padding_bonus = -2
cfg_coef = 2.0
Behavior
- Sometimes: the voice is clearly faster (as expected).
- Other times (even during the same runtime or request type): the voice plays at the normal/default speed.
- I'm using moshi with the Rust server and calling the /api/tts_streaming endpoint.
- This seems non-deterministic. The padding_bonus parameter does not always apply.
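To make "faster" versus "normal" measurable instead of judged by ear, the speaking rate of each generated clip can be estimated as words per second. The following is a rough sketch, assuming the streamed audio has been saved to a WAV file. The function names are hypothetical, and leading or trailing silence in the clip will lower the estimate.

```python
import wave


def wav_duration_seconds(source):
    """Duration of a WAV file (path or binary file object) in seconds."""
    with wave.open(source, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def words_per_second(text, source):
    """Rough speaking rate: whitespace-separated words divided by clip duration."""
    duration = wav_duration_seconds(source)
    if duration <= 0:
        raise ValueError("audio clip is empty")
    return len(text.split()) / duration
```

Running the same sentence several times with padding_bonus = -2 and with the default value, then comparing the resulting rates, should show whether the setting is really being dropped on some requests or whether the variation comes from the sampling itself.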
Questions
- Is there any known condition where padding_bonus is ignored (e.g., short input, voice model fallback, etc.)?
- Are there internal voice/model parameters that override it?
- Should I set this parameter differently (e.g., via API call instead of config.toml)?
Any clarification would be appreciated. I’d be happy to test or debug if needed.
Thanks!