GitHub - haizelabs/spoken: a single interface around speech-to-speech foundation models

.... .- .. --.. . .-.. .- -... ... ✨🎤 pip install spoken 🎤✨ .. - .----. ... / .- / -... .- -.. / -.. .- -.--

spoken provides a single abstraction for a variety of audio foundation models. It is primarily designed for large-scale evaluation/benchmarking of realtime speech-to-speech models, but it can also be used as a drop-in inference library.

# os.environ['LOG_LEVEL'] = 'DEBUG' # detailed client/server state management logging
import spoken

model = spoken("gpt-4o-realtime-preview-2024-12-17", "examples/scooby.wav")
input_asr, output_asr, output_audio = await model.run()

output_asr                   # "That's quite the story..."
len(output_audio)            # 8549ms
model.output_audio_tokens    # 254

Large audio models operate on audio tokens rather than transcribed text. This enables low-latency, streaming conversational audio agents that generate audio directly, end-to-end. These models are promising, but using them requires non-trivial configuration and state management because the major providers differ significantly in their interfaces.

As far as we know, spoken supports every speech-to-speech model offered by the major providers:

OpenAI Realtime
- gpt-4o-realtime-preview-2024-12-17
- gpt-4o-mini-audio-preview-2024-12-17 [coming soon, not part of realtime API]
Gemini Multimodal Live
- gemini-2.5-flash-preview-native-audio-dialog
- gemini-2.5-flash-exp-native-audio-thinking-dialog
Amazon Nova Sonic (pip install spoken[nova])
- amazon.nova-sonic-v1:0
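
Since spoken is aimed at large-scale evaluation, a natural pattern is to run the same audio file through several of the models listed above. The following is a minimal sketch of that pattern, not part of the library. It assumes only the interface shown in the quickstart: a callable that takes a model name and an audio path, and an async run() that returns (input_asr, output_asr, output_audio). The helpers run_one and run_all and the FakeModel stand-in are hypothetical names introduced here so the sketch runs without API keys.

```python
import asyncio

# Model identifiers copied from the list above (the "coming soon" entry is omitted).
SUPPORTED_MODELS = [
    "gpt-4o-realtime-preview-2024-12-17",
    "gemini-2.5-flash-preview-native-audio-dialog",
    "gemini-2.5-flash-exp-native-audio-thinking-dialog",
    "amazon.nova-sonic-v1:0",
]


async def run_one(make_model, model_name, audio_path):
    """Run one model on one audio file and collect its outputs."""
    model = make_model(model_name, audio_path)
    # Assumption: run() returns (input_asr, output_asr, output_audio),
    # as in the quickstart snippet.
    input_asr, output_asr, output_audio = await model.run()
    return {
        "model": model_name,
        "input_asr": input_asr,
        "output_asr": output_asr,
        "output_audio": output_audio,
    }


async def run_all(make_model, model_names, audio_path):
    """Run several models concurrently; failures are returned, not raised."""
    tasks = [run_one(make_model, name, audio_path) for name in model_names]
    return await asyncio.gather(*tasks, return_exceptions=True)


class FakeModel:
    """Hypothetical stand-in for a real model; it returns canned values."""

    def __init__(self, name, audio_path):
        self.name = name
        self.audio_path = audio_path

    async def run(self):
        await asyncio.sleep(0)  # yield to the event loop like a real network call
        return "hello", f"reply from {self.name}", b"\x00\x00"


# asyncio.run() supplies the event loop that a plain script lacks.
results = asyncio.run(run_all(FakeModel, SUPPORTED_MODELS, "examples/scooby.wav"))
```

To try this against real models, pass spoken in place of FakeModel, assuming it accepts the same two arguments as in the quickstart. Whether the library is safe to run as several concurrent sessions is not stated here, so treat the asyncio.gather call as an assumption and fall back to a sequential loop if needed.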

Examples

Benchmarking TTFT (Time-To-First-Token) Latency
OpenAI System Prompt
more interesting things coming soon...
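
The TTFT example itself is not reproduced here. As a rough illustration of the idea, the sketch below times how long an async stream takes to yield its first chunk. It is a generic measurement over any async iterator and does not use spoken's API: measure_ttft and fake_audio_stream are hypothetical names, the delays are made-up values, and "first chunk" stands in for "first token".

```python
import asyncio
import time


async def measure_ttft(stream):
    """Return (seconds to first chunk, total seconds, number of chunks)."""
    start = time.perf_counter()
    first = None
    count = 0
    async for _chunk in stream:
        if first is None:
            # Latency from the start of iteration to the first chunk.
            first = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return first, total, count


async def fake_audio_stream(first_delay=0.05, chunk_delay=0.01, n_chunks=5):
    """Hypothetical stand-in for a model's streamed audio output."""
    await asyncio.sleep(first_delay)  # simulated time before the first chunk
    for i in range(n_chunks):
        if i:
            await asyncio.sleep(chunk_delay)  # simulated gap between chunks
        yield b"\x00" * 320


ttft, total, n_chunks = asyncio.run(measure_ttft(fake_audio_stream()))
print(f"TTFT: {ttft * 1000:.1f} ms, total: {total * 1000:.1f} ms, chunks: {n_chunks}")
```

The timer starts when iteration begins, so with a real model the request should be sent inside the stream rather than before it. Otherwise connection setup is excluded from the measurement.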

Installation

Run pip install spoken
- Amazon Nova Sonic support additionally requires Python 3.12+, pip install spoken[nova], and the portaudio.h header (on OS X: brew install portaudio)
