
feat(providers): add Google Gemini TTS provider #1828

Open
YTLLL wants to merge 3 commits into moeru-ai:main from YTLLL:feat/google-gemini-tts-provider-clean

Conversation

Contributor

YTLLL commented May 13, 2026

Description

This PR replaces #1824 with a cleaner branch history.

It adds a Google Gemini API text-to-speech provider to AIRI's speech provider list.

Included:

  • Google Gemini TTS provider metadata
  • Static Gemini TTS model list
  • Static Gemini prebuilt voice list
  • API key/base URL validation
  • Gemini generateContent TTS request handling
  • PCM-to-WAV conversion for playback compatibility
  • Related provider/settings layout fixes discovered during review/testing

The settings layout fixes are included here because they affect the same provider settings surface needed to configure and use the new Gemini TTS provider.
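
For readers unfamiliar with the PCM-to-WAV step listed above, the following is a minimal sketch of the general technique, not the code from this PR. It assumes the audio arrives as raw mono 16-bit little-endian PCM at 24 kHz; the names `PcmFormat`, `ASSUMED_FORMAT`, and `pcmToWav` are hypothetical and the format values should be checked against the actual API response.

```typescript
// Hypothetical helper for illustration; the PR's real implementation may differ.
interface PcmFormat {
  sampleRate: number
  channels: number
  bitsPerSample: number
}

// Assumption: mono, 16-bit, 24 kHz PCM. Adjust if the response says otherwise.
const ASSUMED_FORMAT: PcmFormat = { sampleRate: 24000, channels: 1, bitsPerSample: 16 }

function pcmToWav(pcm: Uint8Array, format: PcmFormat = ASSUMED_FORMAT): Uint8Array {
  const { sampleRate, channels, bitsPerSample } = format
  const blockAlign = channels * (bitsPerSample / 8) // bytes per sample frame
  const byteRate = sampleRate * blockAlign // bytes per second

  const header = new ArrayBuffer(44)
  const view = new DataView(header)
  const writeAscii = (offset: number, text: string): void => {
    for (let i = 0; i < text.length; i++)
      view.setUint8(offset + i, text.charCodeAt(i))
  }

  writeAscii(0, 'RIFF')
  view.setUint32(4, 36 + pcm.byteLength, true) // file size minus the first 8 bytes
  writeAscii(8, 'WAVE')
  writeAscii(12, 'fmt ')
  view.setUint32(16, 16, true) // size of the fmt chunk for plain PCM
  view.setUint16(20, 1, true) // audio format 1 = uncompressed PCM
  view.setUint16(22, channels, true)
  view.setUint32(24, sampleRate, true)
  view.setUint32(28, byteRate, true)
  view.setUint16(32, blockAlign, true)
  view.setUint16(34, bitsPerSample, true)
  writeAscii(36, 'data')
  view.setUint32(40, pcm.byteLength, true) // size of the sample data

  // Prepend the 44-byte header to the untouched PCM samples.
  const wav = new Uint8Array(44 + pcm.byteLength)
  wav.set(new Uint8Array(header), 0)
  wav.set(pcm, 44)
  return wav
}
```

The samples themselves are not modified; the conversion only prepends a 44-byte RIFF header so that players expecting a container format can decode the stream.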

Linked Issues

Replaces #1824

Additional Context

Tested with:

  • pnpm lint
  • pnpm typecheck

Contributor

github-actions Bot commented May 13, 2026

⏳ Approval required for deploying to Cloudflare Workers (Preview) for stage-web.

Name: 🔭 Waiting for approval
Link: For maintainers, approve here

Hey, maintainers, kindly take some time to review and approve this deployment when you are available. Thank you! 🙏

github-actions Bot added the following labels on May 13, 2026:

  • apps/stage-web (Web App: PWA & Browser)
  • feature (Related to feature)
  • scope/audio-output (Scope related to audio output (TTS, Voice cloning, etc.))
  • scope/i18n
  • scope/providers (Scope related to providers we support)
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment


Code Review

This pull request introduces the Google Gemini Speech provider, adding a new settings UI, store registration, and a core implementation that includes a custom fetch adapter to handle Gemini's text-to-speech API and PCM-to-WAV conversion. The changes also include comprehensive unit tests and locale updates, alongside a minor refactor to standardize CSS classes by replacing 'of-x-auto' with 'overflow-x-auto' across several settings modules. Feedback was provided regarding an inconsistency in base URL normalization where the current logic removes trailing slashes, potentially conflicting with the validator's expectations.
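
To make the request handling mentioned in the summary more concrete, here is a rough sketch of how a `generateContent` TTS call could be assembled and its audio payload extracted. It is based on the publicly documented Gemini REST request shape as I understand it, not on this PR's code; `buildTtsRequest`, `buildTtsUrl`, `extractAudioBase64`, and both interfaces are hypothetical names, and the base URL is assumed to already end with a slash.

```typescript
// Hypothetical types for illustration; field names follow the public Gemini REST API.
interface GeminiTtsRequest {
  contents: { parts: { text: string }[] }[]
  generationConfig: {
    responseModalities: string[]
    speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName: string } } }
  }
}

interface GeminiTtsResponse {
  candidates?: {
    content?: { parts?: { inlineData?: { mimeType?: string, data?: string } }[] }
  }[]
}

// Build the JSON body asking for audio output with a prebuilt voice.
function buildTtsRequest(text: string, voiceName: string): GeminiTtsRequest {
  return {
    contents: [{ parts: [{ text }] }],
    generationConfig: {
      responseModalities: ['AUDIO'],
      speechConfig: { voiceConfig: { prebuiltVoiceConfig: { voiceName } } },
    },
  }
}

// Assumption: baseUrl already ends with "/" (the convention discussed in this review).
function buildTtsUrl(baseUrl: string, model: string): string {
  return `${baseUrl}models/${encodeURIComponent(model)}:generateContent`
}

// Return the base64-encoded audio from the first part that carries inline data.
function extractAudioBase64(response: GeminiTtsResponse): string {
  const parts = response.candidates?.[0]?.content?.parts ?? []
  for (const part of parts) {
    if (part.inlineData?.data)
      return part.inlineData.data
  }
  throw new Error('No inline audio data found in the response')
}
```

The extracted string would still need base64 decoding before any PCM-to-WAV conversion; that step is left out here to keep the sketch free of runtime-specific APIs.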

Comment thread packages/stage-ui/src/stores/providers/google-gemini-speech.ts
The previous implementation stripped trailing slashes from the base URL,
inconsistent with the baseUrlValidator (which requires trailing slashes)
and all other providers (openai-compatible-builder, openrouter/audio-speech).
Align normalizeBaseUrl, createAudioFetch, and createSpeechProvider with
the project convention.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
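
As an illustration of the convention this commit message describes, the sketch below normalizes a base URL so that it always ends with a single trailing slash. The name `normalizeBaseUrl` appears in the commit message, but the body shown here is an assumption and not the PR's actual code; `joinEndpoint` is a hypothetical helper added only to show the effect.

```typescript
// Assumed behaviour: ensure the base URL ends with exactly one trailing slash.
function normalizeBaseUrl(baseUrl: string): string {
  const trimmed = baseUrl.trim().replace(/\/+$/, '')
  return `${trimmed}/`
}

// Hypothetical helper: join a relative path onto a normalized base URL
// without producing a double slash.
function joinEndpoint(baseUrl: string, path: string): string {
  return `${normalizeBaseUrl(baseUrl)}${path.replace(/^\/+/, '')}`
}
```

One reason a trailing slash is a sensible convention: when a relative path is resolved against a base with the standard `URL` constructor, a base of `https://host/v1beta` drops the last segment and yields `https://host/models`, while `https://host/v1beta/` yields `https://host/v1beta/models`.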
