Feature request: silent mode / no-TTS option (bring-your-own voiceover)

## Summary

Add a "silent mode" option to `POST /api/short-video` that renders the video **without TTS narration**: visuals, captions, and optional background music only. Users who want spoken audio can record their own voiceover externally and mux it in afterwards.

Proposed API shape (either works):

```json
{
  "config": { "voice": "none" }
}
```

or

```json
{
  "config": { "tts": false }
}
```

When set, the render skips the Kokoro TTS step entirely; timing comes from `durationInSeconds` per scene (or the sum of `paddingBack` values).
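Since either payload shape is on the table, the check on the server side could accept both. The following is a minimal sketch, not the project's actual code: `RenderConfig` and `isSilentMode` are hypothetical names, and only the `voice` and `tts` fields come from the proposal above.

```typescript
// Hypothetical config shape; only `voice` and `tts` come from the proposal.
interface RenderConfig {
  voice?: string;
  tts?: boolean;
}

// True when either proposed flag asks for a render without narration.
function isSilentMode(config: RenderConfig): boolean {
  return config.voice === "none" || config.tts === false;
}

console.log(isSilentMode({ voice: "none" })); // true
console.log(isSilentMode({ tts: false }));    // true
console.log(isSilentMode({}));                // false (TTS stays on by default)
```

Treating an absent flag as "TTS on" keeps existing requests working unchanged.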

## Motivation

### Non-English content is blocked today

Kokoro only ships English voices (28 across `af_*`, `am_*`, `bf_*`, `bm_*`). Submitting non-English text either produces phonetically mangled pronunciation or crashes the render process. I hit a pod restart today after submitting Brazilian Portuguese text, which lost all in-memory video state for that session. The current workarounds are unpleasant:

1. Use English TTS and live with robotic/mismatched narration for non-EN audiences (not viable for brand work).
2. Render with placeholder English text, then strip audio with `ffmpeg`, then mux in a human voiceover — three manual steps per video.

A silent mode would collapse that to one render + one mux (or zero mux if music-only is fine).
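For reference, the single remaining mux step could look roughly like this. It is a sketch under assumptions: `buildMuxArgs` is a hypothetical helper and the file names are placeholders. It only assembles standard `ffmpeg` arguments and does not run anything.

```typescript
// Builds ffmpeg arguments that keep the video stream of `videoPath`
// and take the audio from `voiceoverPath`. Hypothetical helper.
function buildMuxArgs(
  videoPath: string,
  voiceoverPath: string,
  outputPath: string
): string[] {
  return [
    "-i", videoPath,
    "-i", voiceoverPath,
    "-map", "0:v:0", // video from the first input
    "-map", "1:a:0", // audio from the second input
    "-c:v", "copy",  // copy the video stream without re-encoding
    "-c:a", "aac",   // encode the voiceover as AAC for the MP4 container
    "-shortest",     // stop at the end of the shorter stream
    outputPath,
  ];
}

// Placeholder file names, printed as a command line for inspection.
const muxArgs = buildMuxArgs("silent.mp4", "voiceover.wav", "final.mp4");
console.log(["ffmpeg"].concat(muxArgs).join(" "));
```

Note that this mapping replaces the audio outright, so any background music in the render would be dropped; keeping both would need an audio mix step instead.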

### Beyond non-English users

Silent mode is also valuable for creators who want:

- **Human voice for trust** — founder narration, customer testimonial audio, recorded interview clips. TTS can't substitute for a known voice.
- **Multi-language distribution** — render visuals once, overlay different VO tracks per locale.
- **Higher production quality on a budget** — self-recorded VO on a decent mic beats Kokoro for brand content, at zero API cost.
- **Integration with other TTS providers** — users who already pay for ElevenLabs/Azure/Google TTS in another pipeline can feed output into the mux step.

## Proposed behavior

- If `config.voice` is `"none"` (or `config.tts` is `false`), skip TTS entirely.
- Captions are still rendered on-screen if `config.captionPosition` or `config.captionStyle` is set — they're visual elements, not audio-derived.
- Scene timing:
  - If `durationInSeconds` is set (global or per-scene), honor it.
  - Otherwise, use the sum of `paddingBack` values, or a sensible default (e.g., 3s per scene).
- Output: MP4 with video track + optional music track + no voice track.
- `GET /api/voices` should include `"none"` in the list (or `null`) for clients enumerating options.
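The scene-timing fallback above could be resolved along these lines. This is a sketch with several assumptions, none of which are confirmed by the codebase: the type and function names are hypothetical, a global `durationInSeconds` is read as a per-scene default, "the sum of `paddingBack` values" is read as each scene contributing its own `paddingBack`, and `paddingBack` is assumed to be in milliseconds.

```typescript
// Hypothetical shapes; field names follow this issue, the rest is assumed.
interface SceneInput {
  durationInSeconds?: number;
  paddingBack?: number; // ASSUMPTION: milliseconds
}

interface TimingConfig {
  durationInSeconds?: number; // ASSUMPTION: applies to each scene
}

const DEFAULT_SCENE_SECONDS = 3; // the "sensible default" suggested above

// Precedence: per-scene duration, global duration, paddingBack, default.
function resolveSceneSeconds(scene: SceneInput, config: TimingConfig): number {
  if (scene.durationInSeconds !== undefined) return scene.durationInSeconds;
  if (config.durationInSeconds !== undefined) return config.durationInSeconds;
  if (scene.paddingBack !== undefined && scene.paddingBack > 0) {
    return scene.paddingBack / 1000;
  }
  return DEFAULT_SCENE_SECONDS;
}

// Total video length is the sum of the resolved scene durations.
function totalSeconds(scenes: SceneInput[], config: TimingConfig): number {
  return scenes.reduce(
    (sum, scene) => sum + resolveSceneSeconds(scene, config),
    0
  );
}

const exampleScenes: SceneInput[] = [
  { durationInSeconds: 5 },
  { paddingBack: 1500 },
  {},
];
console.log(totalSeconds(exampleScenes, {})); // 9.5
```

If the maintainers read the global `durationInSeconds` as a total for the whole video instead, the second branch would need to divide it across scenes.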

## Related

- #42 — voice-only (TTS MP3) is the opposite shape of this request; the two are complementary.
- #24 (closed) — disabling music was a similar "turn off one stream" ask; the pattern is already in the codebase.
- #46 — alternative TTS providers (Gemini) would fit naturally under the same `ttsProvider` switch once the `"none"` case exists.

## Context

Using the service via https://remotion.abckx.com.br in production for a small Brazilian-Portuguese content holding. Happy to test a PR or provide real-world payload examples if that helps. Thanks for the work on the project — the REST API is really clean once you get past the language limit.

