
Conversation


@caseymanos caseymanos commented Nov 25, 2025

First OSS contribution! I had Opus 4.5 help a bit, and I tried to follow all the contribution guidelines and code checks I could find.

Resolves #90

This implements text-to-speech conversion support for OpenAI, enabling the
SDK to convert text input to audio output using OpenAI's TTS API.

Changes

  • Add AbstractOpenAiCompatibleTextToSpeechConversionModel base class for
    OpenAI-compatible TTS providers
  • Add OpenAiTextToSpeechConversionModel concrete implementation for OpenAI
  • Update OpenAiProvider::createModel() to instantiate TTS models
  • Add unit tests

Supported Features

  • Models: tts-1, tts-1-hd, gpt-4o-mini-tts
  • Voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage,
    shimmer, verse
  • Output formats: mp3, opus, aac, flac, wav, pcm
  • Custom options: speed (0.25-4.0), instructions (gpt-4o-mini-tts only)
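For illustration only (a hypothetical helper in Python, not the SDK's actual PHP code), the options above map onto the request body for OpenAI's POST /v1/audio/speech endpoint roughly like this:

```python
def build_tts_payload(text, model="gpt-4o-mini-tts", voice="alloy",
                      response_format="mp3", speed=1.0, instructions=None):
    # Sketch of the JSON body sent to POST /v1/audio/speech; the SDK's
    # real request construction lives in the abstract model class.
    payload = {
        "model": model,          # tts-1, tts-1-hd, or gpt-4o-mini-tts
        "input": text,
        "voice": voice,          # e.g. alloy, echo, nova, shimmer
        "response_format": response_format,  # mp3, opus, aac, flac, wav, pcm
        "speed": speed,          # 0.25 to 4.0
    }
    if instructions is not None:
        # Only honored by gpt-4o-mini-tts
        payload["instructions"] = instructions
    return payload
```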

Implementation Notes

The abstract base class follows the same pattern as
AbstractOpenAiCompatibleImageGenerationModel, making it reusable for other
providers that implement OpenAI-compatible TTS endpoints.

Required API parameters such as voice must be explicitly configured; if one is omitted, the API returns a clear validation error. This keeps the abstract class clean for other OpenAI-compatible providers that may have different voice options.
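A minimal sketch of that division of labor (Python, hypothetical function names): the SDK checks structure and types only, forwards options unchanged, and injects no default voice, so a missing or unknown voice surfaces as the API's own validation error:

```python
def prepare_options(options):
    # Structural validation only: check types locally, leave business
    # rules (e.g. whether a voice actually exists) to the API.
    if "voice" in options and not isinstance(options["voice"], str):
        raise TypeError("voice must be a string")
    # Pass everything through unchanged; do not inject a default voice.
    return dict(options)
```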

The TTS API returns binary audio data directly. The implementation handles this by base64-encoding the response and wrapping it in a File object within the standard GenerativeAiResult structure. I would particularly like review on this approach.
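To sketch the approach under review (Python, with a plain dict standing in for the SDK's File object): the raw audio bytes from the response body are base64-encoded and embedded in the result structure, which keeps the result JSON-serializable:

```python
import base64

def wrap_audio_response(audio_bytes, mime_type="audio/mpeg"):
    # Sketch: the TTS endpoint returns raw binary audio, so encode the
    # bytes as base64 and wrap them in a File-like structure (a stand-in
    # for the File object inside GenerativeAiResult).
    return {
        "mimeType": mime_type,
        "base64Data": base64.b64encode(audio_bytes).decode("ascii"),
    }
```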

@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: caseymanos <[email protected]>
Co-authored-by: rollybueno <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Move voice parameter handling to follow the codebase philosophy: the SDK validates
structure/format, API validates business rules. The abstract class no longer
sets a default voice, allowing the OpenAI API to return a clear error if
voice is not configured. This makes the TTS implementation consistent with
text generation and image generation models.


Successfully merging this pull request may close these issues.

Implement OpenAI Text-to-Speech Conversion Model
