
Conversation


@caseymanos caseymanos commented Nov 25, 2025

First OSS contribution! I had Opus 4.5 help a bit, and I tried to follow all the contribution guidelines and code checks I could find.

Resolves #90

This implements text-to-speech conversion support for OpenAI, enabling the
SDK to convert text input to audio output using OpenAI's TTS API.

Changes

  • Add AbstractOpenAiCompatibleTextToSpeechConversionModel base class for
    OpenAI-compatible TTS providers
  • Add OpenAiTextToSpeechConversionModel concrete implementation for OpenAI
  • Update OpenAiProvider::createModel() to instantiate TTS models
  • Add unit tests

Supported Features

  • Models: tts-1, tts-1-hd, gpt-4o-mini-tts
  • Voices: alloy, ash, ballad, coral, echo, fable, onyx, nova, sage,
    shimmer, verse
  • Output formats: mp3, opus, aac, flac, wav, pcm
  • Custom options: speed (0.25-4.0), instructions (gpt-4o-mini-tts only)
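For illustration only (a hypothetical helper in Python, not the SDK's actual PHP code), the options above map onto the request body for OpenAI's POST /v1/audio/speech endpoint roughly like this:

```python
def build_tts_payload(text, model="gpt-4o-mini-tts", voice="alloy",
                      response_format="mp3", speed=1.0, instructions=None):
    # Sketch of the JSON body sent to POST /v1/audio/speech; the SDK's
    # real request construction lives in the abstract model class.
    payload = {
        "model": model,          # tts-1, tts-1-hd, or gpt-4o-mini-tts
        "input": text,
        "voice": voice,          # e.g. alloy, echo, nova, shimmer
        "response_format": response_format,  # mp3, opus, aac, flac, wav, pcm
        "speed": speed,          # 0.25 to 4.0
    }
    if instructions is not None:
        # Only honored by gpt-4o-mini-tts
        payload["instructions"] = instructions
    return payload
```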

Implementation Notes

The abstract base class follows the same pattern as
AbstractOpenAiCompatibleImageGenerationModel, making it reusable for other
providers that implement OpenAI-compatible TTS endpoints.

Required API parameters such as voice must be explicitly configured; if one is omitted, the API returns a clear validation error. This keeps the abstract class clean for other OpenAI-compatible providers that may have different voice options.
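A minimal sketch of that division of labor (Python, hypothetical function names): the SDK checks structure and types only, forwards options unchanged, and injects no default voice, so a missing or unknown voice surfaces as the API's own validation error:

```python
def prepare_options(options):
    # Structural validation only: check types locally, leave business
    # rules (e.g. whether a voice actually exists) to the API.
    if "voice" in options and not isinstance(options["voice"], str):
        raise TypeError("voice must be a string")
    # Pass everything through unchanged; do not inject a default voice.
    return dict(options)
```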

The TTS API returns binary audio data directly. The implementation handles this by base64-encoding the response and wrapping it in a File object within the standard GenerativeAiResult structure. I would particularly like review on this approach.
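To sketch the approach under review (Python, with a plain dict standing in for the SDK's File object): the raw audio bytes from the response body are base64-encoded and embedded in the result structure, which keeps the result JSON-serializable:

```python
import base64

def wrap_audio_response(audio_bytes, mime_type="audio/mpeg"):
    # Sketch: the TTS endpoint returns raw binary audio, so encode the
    # bytes as base64 and wrap them in a File-like structure (a stand-in
    # for the File object inside GenerativeAiResult).
    return {
        "mimeType": mime_type,
        "base64Data": base64.b64encode(audio_bytes).decode("ascii"),
    }
```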

@github-actions

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

If you're merging code through a pull request on GitHub, copy and paste the following into the bottom of the merge commit message.

Co-authored-by: caseymanos <[email protected]>
Co-authored-by: rollybueno <[email protected]>

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Move voice parameter handling to follow the codebase philosophy: the SDK validates
structure/format, API validates business rules. The abstract class no longer
sets a default voice, allowing the OpenAI API to return a clear error if
voice is not configured. This makes the TTS implementation consistent with
text generation and image generation models.


Successfully merging this pull request may close these issues.

Implement OpenAI Text-to-Speech Conversion Model
