Inconsistent padding_bonus effect: TTS sometimes speaks fast, sometimes at default speed

### Due diligence

- [x] I have done my due diligence in trying to find the answer myself.

### Topic

The paper

### Question

### Summary

I'm using the `moshi` TTS (Rust server + Python module) and trying to control the speech speed with the `padding_bonus` parameter in `config.toml`. I've set:

```toml
padding_bonus = -2
```

I expected this to make the TTS speak faster on every request. Sometimes it does, but in many cases the TTS still speaks at the normal/default speed, as if the parameter were ignored.

---

### My config.toml

```toml
static_dir = "./static/"
log_dir = "/tmp/unmute_logs"
instance_name = "tts"
authorized_ids = ["public_token"]

[modules.tts_py]
type = "Py"
path = "/api/tts_streaming"
text_tokenizer_file = "hf://kyutai/tts-1.6b-en_fr/tokenizer_spm_8k_en_fr_audio.model"
# A higher batch size allows you to serve more users at once, but with a higher latency and memory usage.
batch_size = 4
text_bos_token = 1

[modules.tts_py.py]
log_folder = "/tmp/unmute_logs"
# We could replace **/*.safetensors with unmute-prod-website/*.safetensors
# to only get the voices used in Unmute, but we are using the TTS for the demo
# on the project page too and for that we want to load the other voices as well
voice_folder = "hf-snapshot://kyutai/tts-voices/**/*.safetensors"
default_voice = "cml-tts/fr/10087_11650_000028-0002.wav"
cfg_is_no_text = true
n_q = 24
padding_bonus = -2
cfg_coef = 2.0

```

---

### Behavior

* **Sometimes**: the voice is clearly faster (as expected).
* **Other times** (even within the same server run and for the same kind of request): the voice plays at the normal/default speed.
* I'm using `moshi` with the Rust server and calling the `/api/tts_streaming` endpoint.
* The effect seems non-deterministic: `padding_bonus` does not appear to be applied on every request.
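
To make "faster" versus "default speed" measurable rather than a listening impression, the speaking rate of each output can be compared in words per second. The following is a rough sketch under stated assumptions: the audio has already been saved as PCM WAV (the format returned by the streaming endpoint is not covered here), and `wav_duration_seconds`, `words_per_second`, and `make_silent_wav` are hypothetical helpers, not part of `moshi`. Leading or trailing silence inflates the duration, so the numbers are only comparable across runs of the same text.

```python
import io
import wave


def wav_duration_seconds(wav_bytes: bytes) -> float:
    """Return the duration of a PCM WAV payload in seconds."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def words_per_second(text: str, wav_bytes: bytes) -> float:
    """Whitespace-separated words in `text` divided by the audio duration."""
    duration = wav_duration_seconds(wav_bytes)
    if duration <= 0:
        raise ValueError("audio is empty")
    return len(text.split()) / duration


def make_silent_wav(seconds: float, sample_rate: int = 24000) -> bytes:
    """Build a silent 16-bit mono WAV; used only to demo the helpers."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * int(seconds * sample_rate))
    return buf.getvalue()


if __name__ == "__main__":
    # Stand-in for a real TTS output: 2 seconds of audio for 4 words.
    audio = make_silent_wav(2.0)
    print(words_per_second("one two three four", audio))
```

Running the same sentence several times and logging this rate per request would show whether the outputs fall into two distinct groups (fast and default) or vary continuously.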

---

### Questions

1. Is there any known condition where `padding_bonus` is ignored (e.g., short input, voice model fallback, etc.)?
2. Are there internal voice/model parameters that override it?
3. Should I set this parameter differently (e.g., via API call instead of `config.toml`)?

---

Any clarification would be appreciated. I’d be happy to test or debug if needed.
Thanks!

