tts --stream --with-timestamps: route to /stream/with-timestamps instead of hard-erroring #8

@longevityboris

Context

src/commands/tts.rs rejects --stream + --with-timestamps as incompatible, but the OpenAPI spec has a dedicated endpoint for exactly this combination: POST /v1/text-to-speech/{voice_id}/stream/with-timestamps (operationId text_to_speech_stream_with_timestamps).

The dialogue command already handles the streaming + timestamps combination via NDJSON; TTS should too.
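The routing decision can be sketched as a small pure function. This is only an illustration: `tts_path` is a hypothetical helper, not something that exists in src/commands/tts.rs today, and only the combined `/stream/with-timestamps` path is taken from this issue. The other three paths are assumed from the public API layout and should be checked against the OpenAPI spec.

```rust
/// Hypothetical helper (not in the codebase): picks the TTS endpoint path
/// from the two CLI flags instead of rejecting the combination.
pub fn tts_path(voice_id: &str, stream: bool, with_timestamps: bool) -> String {
    let base = format!("/v1/text-to-speech/{voice_id}");
    match (stream, with_timestamps) {
        // Assumed from the public API layout; verify against the spec.
        (false, false) => base,
        (true, false) => format!("{base}/stream"),
        (false, true) => format!("{base}/with-timestamps"),
        // The endpoint named in this issue.
        (true, true) => format!("{base}/stream/with-timestamps"),
    }
}
```

Matching on the tuple keeps all four flag combinations visible in one place, so the compiler flags any case that is left unhandled.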

Use case

Low-latency karaoke / live subtitle generation. Users currently have to choose between low latency (--stream) and per-character alignment (--with-timestamps); they cannot get both in one request.

What to do

  • When both flags are set, route to /v1/text-to-speech/{voice_id}/stream/with-timestamps.
  • Parse the NDJSON response (one chunk per audio frame, each with alignment), writing audio bytes to --output and alignment to --save-timestamps.
  • Update TTS_HELP + agent-info to reflect that the combo now works.
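One detail worth handling in the NDJSON step: network chunks do not necessarily end on line boundaries, so a JSON object can arrive split across two reads. A minimal sketch of the line framing, using only the standard library, might look like this. `NdjsonLines` is a hypothetical name, and the per-line JSON decoding is left out because the chunk schema is not spelled out in this issue.

```rust
/// Hypothetical buffer: accumulates raw response bytes and yields
/// complete NDJSON lines, regardless of how the bytes were chunked.
#[derive(Default)]
pub struct NdjsonLines {
    buf: Vec<u8>,
}

impl NdjsonLines {
    /// Feed one network chunk; returns every non-empty line it completed.
    pub fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut lines = Vec::new();
        while let Some(pos) = self.buf.iter().position(|&b| b == b'\n') {
            // Remove the line (including its '\n') from the front of the buffer.
            let line: Vec<u8> = self.buf.drain(..=pos).collect();
            let text = String::from_utf8_lossy(&line[..line.len() - 1]);
            let text = text.trim(); // also drops a trailing '\r'
            if !text.is_empty() {
                lines.push(text.to_string());
            }
        }
        lines
    }

    /// Call at end of stream: returns a final line that had no trailing newline.
    pub fn finish(&mut self) -> Option<String> {
        let rest = std::mem::take(&mut self.buf);
        let text = String::from_utf8_lossy(&rest).trim().to_string();
        if text.is_empty() { None } else { Some(text) }
    }
}
```

Each returned line would then be deserialized into one chunk, with its audio appended to --output and its alignment appended as a line to --save-timestamps. If the dialogue command already has equivalent framing code, reusing that is likely preferable to adding a second copy.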

Files

  • src/commands/tts.rs
  • src/help.rs (TTS_HELP)
  • src/commands/agent_info.rs

Acceptance

  • tts "hello" --stream --with-timestamps -o out.mp3 --save-timestamps t.jsonl writes both files.
  • Integration test mocks the NDJSON endpoint.

Labels: enhancement (New feature or request)