Implement OpenAI Text-to-Speech Conversion Model #126
First OSS contribution! Had Opus 4.5 help a bit, tried to follow all the contribution guidelines and code checks I could find.
Resolves #90
This implements text-to-speech conversion support for OpenAI, enabling the
SDK to convert text input to audio output using OpenAI's TTS API.
Changes
OpenAI-compatible TTS providers
Supported Features
shimmer, verse
Implementation Notes
The abstract base class follows the same pattern as
AbstractOpenAiCompatibleImageGenerationModel, making it reusable for other
providers that implement OpenAI-compatible TTS endpoints.
Required API parameters such as
voice must be configured explicitly; no default is applied, and the API
returns a clear validation error if one is omitted. This keeps the abstract
class clean for other OpenAI-compatible providers that may have different
voice options.
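To illustrate the no-default-voice decision, here is a rough, language-agnostic sketch in Python. It is not the code in this PR: the class names `AbstractCompatibleTtsModel` and `OpenAiTtsModel`, the `build_payload` method, and the config shape are all hypothetical. Only the `/v1/audio/speech` endpoint and the `model`, `input`, and `voice` request fields come from OpenAI's public API.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class AbstractCompatibleTtsModel(ABC):
    """Hypothetical base class: builds the request, knows nothing about voices."""

    def __init__(self, model_id: str, config: Dict[str, Any]) -> None:
        self.model_id = model_id
        self.config = dict(config)

    @abstractmethod
    def endpoint(self) -> str:
        """Each provider supplies its own speech endpoint URL."""

    def build_payload(self, text: str) -> Dict[str, Any]:
        # Start from the caller's configuration. No default voice is injected,
        # so a missing `voice` is left for the provider's API to reject.
        payload = dict(self.config)
        payload["model"] = self.model_id
        payload["input"] = text
        return payload


class OpenAiTtsModel(AbstractCompatibleTtsModel):
    """Hypothetical OpenAI subclass: only the endpoint is provider-specific."""

    def endpoint(self) -> str:
        return "https://api.openai.com/v1/audio/speech"


model = OpenAiTtsModel("tts-1", {"voice": "shimmer"})
payload = model.build_payload("Hello from the SDK")
```

Because the base class never names a voice, a provider with a different voice catalogue would only need to supply its own endpoint.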
The TTS API returns binary audio
data directly. The implementation handles this by base64-encoding the
response and wrapping it in a File object within the standard
GenerativeAiResult structure. Would like more review on this approach in
particular.
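
For reviewers weighing the base64 approach, a minimal sketch of the idea in Python follows. It is an illustration only: `AudioFile` is a hypothetical stand-in for the SDK's File object, `wrap_audio_response` is an invented helper name, and the `audio/mpeg` default is an assumption about the response format.

```python
import base64
from dataclasses import dataclass


@dataclass(frozen=True)
class AudioFile:
    """Hypothetical stand-in for the SDK's File object."""

    mime_type: str
    base64_data: str

    def to_bytes(self) -> bytes:
        # Decode back to the raw audio bytes the API originally returned.
        return base64.b64decode(self.base64_data)


def wrap_audio_response(body: bytes, mime_type: str = "audio/mpeg") -> AudioFile:
    """Base64-encode raw audio bytes so they fit in a text-based result."""
    if not body:
        raise ValueError("TTS response body is empty")
    encoded = base64.b64encode(body).decode("ascii")
    return AudioFile(mime_type=mime_type, base64_data=encoded)


# Round trip: the decoded bytes match the original response body.
raw = b"\xff\xfb\x90\x00"
audio = wrap_audio_response(raw)
assert audio.to_bytes() == raw
```

One trade-off worth weighing: base64 inflates the payload by roughly a third, so long audio clips cost more memory than holding the raw bytes would.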