feat(elevenlabs-say): add ElevenLabs say-style TTS CLI (#328)

andrewgazelka · web-flow · commit 1a36e6fae94e · 2026-05-29T15:57:25.000-07:00
## What

A new uv-packaged Python CLI at `packages/elevenlabs-say/` that mirrors
macOS `say`, backed by the ElevenLabs text-to-speech API. Run it with
`nix run .#elevenlabs-say`.

Behavior:
- Text source precedence: positional `TEXT`, then `--file/-f`, then
stdin (when stdin is not a TTY). The CLI exits with a clear error when no
source provides text, or when the resolved text is empty after stripping.
- Default action plays audio through the speakers with `ffplay` (from
`ffmpeg`, put on PATH by the Nix wrapper). `--output/-o PATH` saves the
audio instead.
- `--voice NAME|ID` resolves a name to an id via `voices.search`, using a
case-insensitive exact match on the voice name; with no match, the value
is used as a literal id. Default voice is Rachel
(`21m00Tcm4TlvDq8ikWAM`), a premade voice present on every account.
- `--model` defaults to `eleven_flash_v2_5`; `--format` defaults to
`mp3_44100_128`.
- The API key comes from `ELEVENLABS_API_KEY`. If unset, the CLI prints
an actionable error and exits non-zero. No embedded key, no silent
fallback.
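
The text source precedence above can be sketched in isolation. This is a minimal illustration, not the shipped code: `pick_text` is a hypothetical helper, and it assumes the caller passes `None` for `stdin_text` when stdin is a TTY (the real CLI checks `sys.stdin.isatty()` itself and raises its own `SayError` rather than `ValueError`).

```python
from __future__ import annotations


def pick_text(
    positional: str | None,
    file_text: str | None,
    stdin_text: str | None,
) -> str:
    """Return the first available source: positional, then file, then stdin.

    Hypothetical helper for illustration. A source counts as "available"
    when it is not None; the chosen source is stripped and must be non-empty.
    """
    for candidate in (positional, file_text, stdin_text):
        if candidate is not None:
            text = candidate.strip()
            if not text:
                raise ValueError("no text to speak: the resolved text is empty")
            return text
    raise ValueError(
        "no text to speak: pass TEXT, use --file PATH, or pipe text on stdin"
    )
```

Note that an earlier source wins even when it is empty: an empty positional argument produces the "resolved text is empty" error instead of falling through to `--file` or stdin.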

## Why Python

ElevenLabs ships an official, typed Python SDK (`elevenlabs`) and has no
official Rust SDK, so a thin CLI over the SDK is the lowest-maintenance
option. This repo already has first-class uv packaging through
`ix.buildUvApplication` and a worked example
(`examples/python-daily-scraper`), but no standalone TS CLI precedent.
The only runtime dependency is the ElevenLabs SDK; everything else
(`argparse`, `subprocess`, `tempfile`) is stdlib.

## Files

- `package.nix`: discovery metadata (`packageSet`/`flake`).
- `pyproject.toml`: `elevenlabs>=2.50.0,<3.0.0`, `requires-python =
">=3.13"`, `uv_build` backend, `elevenlabs-say` console script.
- `uv.lock`: committed for a pure Nix build.
- `src/elevenlabs_say/__init__.py`: the CLI, fully type-annotated for
`ty` standard mode.
- `default.nix`: `ix.buildUvApplication` then a `runCommand` +
`makeWrapper` that puts `ffmpeg` on PATH (the `packages/run` pattern),
with a `passthru.tests.printsHelp` smoke test that asserts `--help`
exits 0 and prints usage, with no network and no key.
- `README.md`: task-first setup and usage.
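
The `printsHelp` smoke test lives in Nix, but the same two assertions (exit status 0, usage text present) can be sketched in Python against a bare `argparse` parser. This is an illustrative stand-in only: `build_parser`, `help_prints_usage`, and `help_exit_code` are hypothetical names, and the parser here carries just the positional argument instead of the CLI's full flag set.

```python
from __future__ import annotations

import argparse
import contextlib
import io


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical, trimmed-down parser; only `prog` matters for this check.
    parser = argparse.ArgumentParser(prog="elevenlabs-say")
    parser.add_argument("text", nargs="?", default=None)
    return parser


def help_prints_usage(parser: argparse.ArgumentParser) -> bool:
    """True when the rendered help contains the expected usage banner."""
    return "usage: elevenlabs-say" in parser.format_help()


def help_exit_code(parser: argparse.ArgumentParser) -> int:
    """Run ``--help`` with stdout captured and return the exit status."""
    try:
        with contextlib.redirect_stdout(io.StringIO()):
            parser.parse_args(["--help"])
    except SystemExit as exc:
        return int(exc.code or 0)
    return -1  # argparse did not exit, which would be unexpected for --help
```

Neither helper touches the network or the environment, which is the same property the Nix test relies on.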

## Example usage

```sh
export ELEVENLABS_API_KEY=sk_...
nix run .#elevenlabs-say -- "the first move sets everything in motion"
echo "hello from index" | nix run .#elevenlabs-say
nix run .#elevenlabs-say -- "save me" --output /tmp/out.mp3
nix run .#elevenlabs-say -- "different voice" --voice Adam
```
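
The `--voice Adam` call relies on name-to-id resolution. A minimal sketch of that matching rule, assuming the search results have already been fetched: `Voice` is a hypothetical stand-in for the SDK's voice objects, and the ids used with it are placeholders, not real ElevenLabs ids.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Voice:
    # Hypothetical stand-in: only the two fields the matching rule reads.
    name: str | None
    voice_id: str


def resolve_voice_id(candidates: list[Voice], requested: str) -> str:
    """Return the id of the voice whose name matches, else ``requested`` as-is.

    Matching is exact but case-insensitive (``casefold``). Unnamed voices
    are skipped. With no match, the value is treated as a literal voice id.
    """
    wanted = requested.casefold()
    for voice in candidates:
        if voice.name is not None and voice.name.casefold() == wanted:
            return voice.voice_id
    return requested
```

Because the fallback returns the input unchanged, a mistyped name is sent to the API as if it were an id, and the failure surfaces at synthesis time.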

## Validation

- `nix build .#elevenlabs-say` succeeds, including the default `ty` type
check ("All checks passed!") with no type-check knobs needed.
- `./result/bin/elevenlabs-say --help` prints usage and exits 0.
- `nix build .#elevenlabs-say.tests.printsHelp` succeeds (offline smoke
test).
- No-key path: with `ELEVENLABS_API_KEY` unset, the CLI prints a clear error
and exits 1.
- `nix run .#lint` passes for the new files.
- Live synth was skipped: no ElevenLabs API key exists in Vaultwarden
(`ix-infra`), so there was no key to exercise a real conversion.
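
The no-key path can be exercised without a live key. The sketch below is an assumption-laden illustration, not the shipped code: `require_api_key` is a hypothetical helper that raises `SystemExit` with a message directly, whereas the CLI raises `SayError` and converts it to exit status 1 in `main`. Passing a string to `SystemExit` makes the interpreter print it to stderr and exit with status 1.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

API_KEY_ENV = "ELEVENLABS_API_KEY"


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the API key, or exit with an actionable message.

    An empty value is treated the same as an unset variable.
    """
    key = env.get(API_KEY_ENV)
    if not key:
        raise SystemExit(
            f"elevenlabs-say: {API_KEY_ENV} is not set; "
            f"export your ElevenLabs API key, for example: export {API_KEY_ENV}=sk_..."
        )
    return key
```

Taking the environment as a parameter keeps the check testable without mutating `os.environ`.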
diff --git a/packages/elevenlabs-say/README.md b/packages/elevenlabs-say/README.md
@@ -0,0 +1,57 @@
+# elevenlabs-say
+
+A `say`-style command-line tool that speaks text with the [ElevenLabs](https://elevenlabs.io)
+text-to-speech API. It reads text from an argument, a file, or stdin, then plays
+the audio through your speakers or writes it to a file.
+
+## Setup
+
+Set your ElevenLabs API key in the environment. The CLI reads it from
+`ELEVENLABS_API_KEY` and exits with an error if it is unset.
+
+```sh
+export ELEVENLABS_API_KEY=sk_...
+```
+
+## Usage
+
+```sh
+# Speak a string through the speakers.
+nix run .#elevenlabs-say -- "the first move sets everything in motion"
+
+# Speak the contents of a file.
+nix run .#elevenlabs-say -- --file notes.txt
+
+# Speak text piped on stdin.
+echo "hello from index" | nix run .#elevenlabs-say
+
+# Save audio instead of playing it.
+nix run .#elevenlabs-say -- "save me" --output /tmp/out.mp3
+
+# Pick a voice by name or id, and override the model or format.
+nix run .#elevenlabs-say -- "different voice" --voice Adam
+nix run .#elevenlabs-say -- "slower model" --model eleven_multilingual_v2 --format mp3_44100_192
+```
+
+Text source precedence is positional argument, then `--file`, then stdin.
+
+## Defaults
+
+- Voice: Rachel (`21m00Tcm4TlvDq8ikWAM`), a premade voice on every account. A
+  `--voice` value that matches a voice name is resolved to its id; otherwise it
+  is used as a literal id.
+- Model: `eleven_flash_v2_5`, chosen for low latency.
+- Format: `mp3_44100_128`.
+
+## Playback
+
+Playback shells out to `ffplay` from `ffmpeg`, which the Nix wrapper puts on
+PATH. `--output` skips playback and writes the audio bytes directly.
+
+## Known limitations
+
+- Playback needs a working audio device. On a headless host use `--output` to
+  capture the audio instead.
+- A name that collides with a 20-character voice id would resolve as a name
+  first. ElevenLabs voice ids are opaque tokens, so this does not happen in
+  practice.
diff --git a/packages/elevenlabs-say/default.nix b/packages/elevenlabs-say/default.nix
@@ -0,0 +1,78 @@
+{
+  ix,
+  lib,
+  pkgs,
+}:
+
+let
+  fs = lib.fileset;
+  src = fs.toSource {
+    root = ./.;
+    fileset = fs.unions [
+      ./pyproject.toml
+      ./src
+      ./uv.lock
+    ];
+  };
+
+  unwrapped = ix.buildUvApplication pkgs {
+    pname = "elevenlabs-say";
+    version = "0.1.0";
+    inherit src;
+    mainProgram = "elevenlabs-say";
+    # pydantic-core and websockets ship binary wheels that dlopen libstdc++ at
+    # import time on Linux, the same constraint the daily-scraper example handles.
+    runtimeLibraryInputs = [ pkgs.stdenv.cc.cc.lib ];
+    meta = {
+      description = "A say-style ElevenLabs text-to-speech CLI";
+      license = lib.licenses.mit;
+      mainProgram = "elevenlabs-say";
+    };
+  };
+
+  package =
+    pkgs.runCommand "elevenlabs-say"
+      {
+        nativeBuildInputs = [ pkgs.makeWrapper ];
+        strictDeps = true;
+        meta = {
+          description = "A say-style ElevenLabs text-to-speech CLI";
+          license = lib.licenses.mit;
+          mainProgram = "elevenlabs-say";
+        };
+      }
+      ''
+        mkdir -p $out/bin
+        # ffmpeg supplies ffplay, which playback shells out to. afplay is
+        # macOS-only and absent from nixpkgs, so ffplay is the portable choice.
+        makeWrapper ${lib.getExe unwrapped} $out/bin/elevenlabs-say \
+          --prefix PATH : ${lib.makeBinPath [ pkgs.ffmpeg ]}
+      '';
+
+  printsHelp =
+    pkgs.runCommand "elevenlabs-say-prints-help"
+      {
+        nativeBuildInputs = [ package ];
+        strictDeps = true;
+      }
+      ''
+        # No network and no API key: --help must exit 0 and print usage.
+        help=$(elevenlabs-say --help)
+        case "$help" in
+          *"usage: elevenlabs-say"*) ;;
+          *)
+            echo "elevenlabs-say --help did not print usage" >&2
+            printf '%s\n' "$help" >&2
+            exit 1
+            ;;
+        esac
+        mkdir -p "$out"
+      '';
+in
+package.overrideAttrs (old: {
+  passthru = (old.passthru or { }) // {
+    tests = {
+      inherit printsHelp;
+    };
+  };
+})
diff --git a/packages/elevenlabs-say/package.nix b/packages/elevenlabs-say/package.nix
@@ -0,0 +1,5 @@
+{
+  id = "elevenlabs-say";
+  packageSet = true;
+  flake = true;
+}
diff --git a/packages/elevenlabs-say/pyproject.toml b/packages/elevenlabs-say/pyproject.toml
@@ -0,0 +1,15 @@
+[project]
+name = "elevenlabs-say"
+version = "0.1.0"
+description = "A say-style ElevenLabs text-to-speech CLI"
+requires-python = ">=3.13"
+dependencies = [
+    "elevenlabs>=2.50.0,<3.0.0",
+]
+
+[project.scripts]
+elevenlabs-say = "elevenlabs_say:main"
+
+[build-system]
+requires = ["uv_build>=0.11.0,<0.12.0"]
+build-backend = "uv_build"
diff --git a/packages/elevenlabs-say/src/elevenlabs_say/__init__.py b/packages/elevenlabs-say/src/elevenlabs_say/__init__.py
@@ -0,0 +1,237 @@
+"""A say-style ElevenLabs text-to-speech CLI.
+
+Reads text from a positional argument, a file, or stdin, synthesizes speech with
+the ElevenLabs API, and either plays it through the speakers with ``ffplay`` or
+writes the audio to a file. The API key comes from ``ELEVENLABS_API_KEY``; there
+is no embedded key and no silent fallback.
+"""
+
+from __future__ import annotations
+
+import argparse
+import os
+import subprocess
+import sys
+import tempfile
+from dataclasses import dataclass
+from pathlib import Path
+
+from elevenlabs import ElevenLabs
+from elevenlabs.core import ApiError
+
+# Rachel is a stable ElevenLabs premade voice that is available on every account,
+# so it is a safe default for a `say` replacement.
+# https://elevenlabs.io/docs/api-reference/voices/get
+DEFAULT_VOICE_ID = "21m00Tcm4TlvDq8ikWAM"
+DEFAULT_MODEL_ID = "eleven_flash_v2_5"
+DEFAULT_OUTPUT_FORMAT = "mp3_44100_128"
+
+API_KEY_ENV = "ELEVENLABS_API_KEY"
+
+
+class SayError(Exception):
+    """An operator-facing failure with an actionable message."""
+
+
+@dataclass(frozen=True)
+class CliArgs:
+    text: str | None
+    file: Path | None
+    output: Path | None
+    voice: str
+    model: str
+    output_format: str
+
+
+def parse_args(argv: list[str] | None = None) -> CliArgs:
+    parser = argparse.ArgumentParser(
+        prog="elevenlabs-say",
+        description="Synthesize speech with ElevenLabs and play it or save it to a file.",
+    )
+    _ = parser.add_argument(
+        "text",
+        nargs="?",
+        default=None,
+        help="Text to speak. Omit to read from --file or stdin.",
+    )
+    _ = parser.add_argument(
+        "-f",
+        "--file",
+        type=Path,
+        default=None,
+        help="Read text from this file instead of the positional argument.",
+    )
+    _ = parser.add_argument(
+        "-o",
+        "--output",
+        type=Path,
+        default=None,
+        help="Write audio to this file instead of playing it.",
+    )
+    _ = parser.add_argument(
+        "--voice",
+        default=DEFAULT_VOICE_ID,
+        help=(
+            "Voice name or id. A value that matches a voice name is resolved to "
+            f"its id; otherwise it is used verbatim. Defaults to Rachel ({DEFAULT_VOICE_ID})."
+        ),
+    )
+    _ = parser.add_argument(
+        "--model",
+        default=DEFAULT_MODEL_ID,
+        help=f"Model id. Defaults to {DEFAULT_MODEL_ID}.",
+    )
+    _ = parser.add_argument(
+        "--format",
+        dest="output_format",
+        default=DEFAULT_OUTPUT_FORMAT,
+        help=f"Output audio format. Defaults to {DEFAULT_OUTPUT_FORMAT}.",
+    )
+    namespace = parser.parse_args(argv)
+
+    text: str | None = namespace.text
+    file: Path | None = namespace.file
+    output: Path | None = namespace.output
+    voice: str = namespace.voice
+    model: str = namespace.model
+    output_format: str = namespace.output_format
+
+    return CliArgs(
+        text=text,
+        file=file,
+        output=output,
+        voice=voice,
+        model=model,
+        output_format=output_format,
+    )
+
+
+def read_text(args: CliArgs) -> str:
+    """Resolve the text to speak: positional arg, then --file, then stdin."""
+    if args.text is not None:
+        source = args.text
+    elif args.file is not None:
+        try:
+            source = args.file.read_text(encoding="utf-8")
+        except OSError as exc:
+            raise SayError(f"cannot read text file {args.file}: {exc}") from exc
+    elif not sys.stdin.isatty():
+        source = sys.stdin.read()
+    else:
+        raise SayError(
+            "no text to speak: pass TEXT, use --file PATH, or pipe text on stdin"
+        )
+
+    text = source.strip()
+    if not text:
+        raise SayError("no text to speak: the resolved text is empty")
+    return text
+
+
+def make_client() -> ElevenLabs:
+    if not os.environ.get(API_KEY_ENV):
+        raise SayError(
+            f"{API_KEY_ENV} is not set; export your ElevenLabs API key, "
+            f"for example: export {API_KEY_ENV}=sk_..."
+        )
+    return ElevenLabs()
+
+
+def resolve_voice_id(client: ElevenLabs, voice: str) -> str:
+    """Treat ``voice`` as a name first; fall back to using it as an id verbatim.
+
+    ElevenLabs voice ids are opaque 20-character tokens, so a human-typed name
+    almost never collides with an id. Searching by name keeps the CLI usable with
+    friendly voice names while still accepting a raw id.
+    """
+    try:
+        response = client.voices.search(search=voice)
+    except ApiError as exc:
+        raise SayError(format_api_error("resolve voice", exc)) from exc
+
+    for candidate in response.voices:
+        if candidate.name is not None and candidate.name.casefold() == voice.casefold():
+            return candidate.voice_id
+
+    # No name match: use the supplied value as a literal voice id.
+    return voice
+
+
+def synthesize(client: ElevenLabs, text: str, args: CliArgs, voice_id: str) -> bytes:
+    try:
+        chunks = client.text_to_speech.convert(
+            voice_id=voice_id,
+            text=text,
+            model_id=args.model,
+            output_format=args.output_format,
+        )
+        return b"".join(chunks)
+    except ApiError as exc:
+        raise SayError(format_api_error("synthesize speech", exc)) from exc
+
+
+def format_api_error(action: str, exc: ApiError) -> str:
+    if exc.status_code is not None:
+        return f"failed to {action}: ElevenLabs API returned status {exc.status_code}: {exc.body}"
+    return f"failed to {action}: {exc.body}"
+
+
+def write_output(audio: bytes, output: Path) -> None:
+    try:
+        _ = output.write_bytes(audio)
+    except OSError as exc:
+        raise SayError(f"cannot write audio to {output}: {exc}") from exc
+
+
+def play(audio: bytes) -> None:
+    """Play MP3 bytes through the speakers with ``ffplay``.
+
+    ``ffplay`` is provided by ``ffmpeg``, which the Nix wrapper puts on PATH. It
+    is the cross-platform, Nix-pinnable counterpart to macOS ``afplay``.
+    """
+    with tempfile.NamedTemporaryFile(suffix=".mp3", delete=False) as handle:
+        temp_path = Path(handle.name)
+        _ = handle.write(audio)
+    try:
+        completed = subprocess.run(
+            [
+                "ffplay",
+                "-nodisp",
+                "-autoexit",
+                "-loglevel",
+                "error",
+                str(temp_path),
+            ],
+            check=False,
+        )
+        if completed.returncode != 0:
+            raise SayError(f"ffplay exited with status {completed.returncode}")
+    except FileNotFoundError as exc:
+        raise SayError(
+            "ffplay was not found on PATH; install ffmpeg to play audio, "
+            "or use --output PATH to save the audio instead"
+        ) from exc
+    finally:
+        temp_path.unlink(missing_ok=True)
+
+
+def run(args: CliArgs) -> None:
+    text = read_text(args)
+    client = make_client()
+    voice_id = resolve_voice_id(client, args.voice)
+    audio = synthesize(client, text, args, voice_id)
+
+    if args.output is not None:
+        write_output(audio, args.output)
+        print(f"wrote {args.output}", file=sys.stderr)
+    else:
+        play(audio)
+
+
+def main() -> None:
+    args = parse_args()
+    try:
+        run(args)
+    except SayError as exc:
+        print(f"elevenlabs-say: {exc}", file=sys.stderr)
+        raise SystemExit(1) from exc
diff --git a/packages/elevenlabs-say/uv.lock b/packages/elevenlabs-say/uv.lock