Context
src/commands/tts.rs rejects --stream + --with-timestamps as incompatible, but the OpenAPI spec has a dedicated endpoint for exactly this combination: POST /v1/text-to-speech/{voice_id}/stream/with-timestamps (operationId text_to_speech_stream_with_timestamps).
The dialogue command already handles the streaming + timestamps combination via NDJSON; TTS should too.
Use case
Low-latency karaoke / live subtitle generation. Users currently have to choose between low latency (--stream) and per-character alignment (--with-timestamps); they cannot have both.
What to do
- When both flags are set, route to /v1/text-to-speech/{voice_id}/stream/with-timestamps.
- Parse the NDJSON response (one chunk per audio frame, each with alignment), writing audio bytes to --output and alignment to --save-timestamps.
- Update TTS_HELP + agent-info to reflect that the combo now works.
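
A minimal, std-only sketch of the routing and NDJSON framing steps above. The names tts_path and NdjsonLines are hypothetical and do not come from the existing code; the /stream and /with-timestamps paths for the single-flag cases are assumed from the public API rather than from this issue. Decoding each line's JSON (audio plus alignment) is left to whatever JSON handling src/commands/tts.rs already uses.

```rust
/// Hypothetical helper: pick the TTS endpoint path from the two flags.
/// Only the (true, true) path is confirmed by this issue; the single-flag
/// paths are an assumption about the existing routing.
fn tts_path(voice_id: &str, stream: bool, with_timestamps: bool) -> String {
    let base = format!("/v1/text-to-speech/{voice_id}");
    match (stream, with_timestamps) {
        (true, true) => format!("{base}/stream/with-timestamps"),
        (true, false) => format!("{base}/stream"),
        (false, true) => format!("{base}/with-timestamps"),
        (false, false) => base,
    }
}

/// Hypothetical NDJSON framer: network chunks can end mid-line, so bytes are
/// buffered until a newline completes a JSON object.
#[derive(Default)]
struct NdjsonLines {
    buf: Vec<u8>,
}

impl NdjsonLines {
    /// Feed one network chunk; return every complete, non-empty line it finishes.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line (including its newline) from the front of the buffer.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line).trim().to_string();
            if !text.is_empty() {
                lines.push(text);
            }
        }
        lines
    }

    /// Call at end of stream; returns trailing data that had no final newline.
    fn finish(self) -> Option<String> {
        let text = String::from_utf8_lossy(&self.buf).trim().to_string();
        (!text.is_empty()).then_some(text)
    }
}
```

Each string returned by push is one JSON object; the command would decode its audio and append it to --output, then write the alignment portion as one line of --save-timestamps. Buffering on newlines rather than parsing per network chunk keeps the parser correct when a chunk boundary falls in the middle of an object.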
Files
src/commands/tts.rs
src/help.rs (TTS_HELP)
src/commands/agent_info.rs
Acceptance
- tts "hello" --stream --with-timestamps -o out.mp3 --save-timestamps t.jsonl writes both files.
- Integration test mocks the NDJSON endpoint.