Summary
Add a "silent mode" option to POST /api/short-video that produces the video without TTS narration, returning visuals + captions + optional background music only. Users who want spoken audio record their own voiceover externally and mux it in post.
Proposed API shape (either works):
{
"config": { "voice": "none" }
}
or
{
"config": { "tts": false }
}
When set, the render skips the Kokoro TTS step entirely; timing comes from durationInSeconds per scene (or the sum of paddingBack values).
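To make the two proposed shapes concrete, here is a minimal sketch of how a server could detect silent mode from the request config. The RenderConfig interface and the isSilentMode helper are hypothetical names used for illustration only; just the voice and tts fields come from this proposal.

```typescript
// Hypothetical config shape: only `voice` and `tts` are taken from the proposal.
interface RenderConfig {
  voice?: string;
  tts?: boolean;
}

// Returns true when either proposed flag asks for a render without TTS.
function isSilentMode(config: RenderConfig): boolean {
  return config.voice === "none" || config.tts === false;
}
```

Accepting both flags in one check would let the maintainers pick either shape later without breaking early adopters.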
Motivation
Non-English content is blocked today
Kokoro only ships English voices (28 in af_*, am_*, bf_*, bm_*). Submitting non-English text either produces phonetically mangled pronunciation or crashes the render process. I hit a pod restart today after submitting Brazilian Portuguese text, which lost all in-memory video state for that session. The current workarounds are unpleasant:
- Use English TTS and live with robotic/mismatched narration for non-EN audiences (not viable for brand work).
- Render with placeholder English text, then strip audio with ffmpeg, then mux in a human voiceover: three manual steps per video.
A silent mode would collapse that to one render + one mux (or zero mux if music-only is fine).
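For reference, the single remaining mux step could look roughly like the sketch below, which builds an ffmpeg argument list that copies the video stream from the silent render and takes the audio from an externally recorded voiceover. The buildMuxArgs helper and the file names are hypothetical; the ffmpeg options themselves (-i, -map, -c:v, -c:a, -shortest) are standard. Note that this version replaces any audio already in the render, so it only fits the case where no background music was requested.

```typescript
// Builds ffmpeg arguments that keep the video stream of the silent render
// and use the voiceover file as the only audio track.
function buildMuxArgs(
  videoPath: string,
  voiceoverPath: string,
  outputPath: string
): string[] {
  return [
    "-i", videoPath,      // input 0: silent render
    "-i", voiceoverPath,  // input 1: human voiceover
    "-map", "0:v:0",      // first video stream of input 0
    "-map", "1:a:0",      // first audio stream of input 1
    "-c:v", "copy",       // do not re-encode the video
    "-c:a", "aac",        // encode the voiceover as AAC for MP4
    "-shortest",          // stop at the shorter of the two streams
    outputPath,
  ];
}

// Example with placeholder file names:
const muxArgs = buildMuxArgs("render.mp4", "voiceover.wav", "final.mp4");
console.log("ffmpeg " + muxArgs.join(" "));
```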
Beyond non-English users
Silent mode is also valuable for creators who want:
- Human voice for trust — founder narration, customer testimonial audio, recorded interview clips. TTS can't substitute for a known voice.
- Multi-language distribution — render visuals once, overlay different VO tracks per locale.
- Higher production quality on a budget — self-recorded VO on a decent mic beats Kokoro for brand content, at zero API cost.
- Integration with other TTS providers — users who already pay for ElevenLabs/Azure/Google TTS in another pipeline can feed output into the mux step.
Proposed behavior
- If config.voice === "none" (or config.tts === false) is set, skip TTS entirely.
- Captions are still rendered on-screen if config.captionPosition or config.captionStyle is set; they are visual elements, not audio-derived.
- Scene timing:
  - If durationInSeconds is set (global or per-scene), honor it.
  - Otherwise, use the sum of paddingBack values, or a sensible default (e.g., 3s per scene).
- Output: MP4 with video track + optional music track + no voice track.
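One possible reading of the scene timing rules above, as a sketch rather than a spec: a per-scene durationInSeconds wins, then a global one, then the scene's paddingBack, then the default. The SilentScene type and function names are hypothetical, and treating paddingBack as milliseconds is an assumption on my part.

```typescript
// Hypothetical scene shape for illustration.
interface SilentScene {
  durationInSeconds?: number;
  paddingBack?: number; // ASSUMPTION: expressed in milliseconds
}

// Resolves how long one scene should last when there is no TTS audio to time it.
function resolveSceneSeconds(
  scene: SilentScene,
  globalDurationInSeconds?: number,
  defaultSeconds: number = 3
): number {
  if (scene.durationInSeconds !== undefined) return scene.durationInSeconds;
  if (globalDurationInSeconds !== undefined) return globalDurationInSeconds;
  if (scene.paddingBack !== undefined && scene.paddingBack > 0) {
    return scene.paddingBack / 1000;
  }
  return defaultSeconds;
}

// Total video length is the sum of the resolved scene durations.
function totalSilentSeconds(
  scenes: SilentScene[],
  globalDurationInSeconds?: number
): number {
  return scenes.reduce(
    (sum, scene) => sum + resolveSceneSeconds(scene, globalDurationInSeconds),
    0
  );
}
```

If the maintainers prefer a different precedence (for example, global duration as a total to split across scenes), only resolveSceneSeconds would need to change.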
GET /api/voices should include "none" in the list (or null) for clients enumerating options.
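As a small illustration of the voices change, assuming the endpoint returns a flat array of voice names (the actual response shape may differ), the handler could append the sentinel like this. The listVoiceOptions name is hypothetical.

```typescript
// Appends the "none" sentinel to the list of TTS voices, without duplicating it.
function listVoiceOptions(ttsVoices: readonly string[]): string[] {
  return ttsVoices.indexOf("none") === -1
    ? [...ttsVoices, "none"]
    : [...ttsVoices];
}
```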
Related
- ... ttsProvider switch once the "none" case exists.
Context
Using the service via https://remotion.abckx.com.br in production for a small Brazilian-Portuguese content holding. Happy to test a PR or provide real-world payload examples if that helps. Thanks for the work on the project — the REST API is really clean once you get past the language limit.