Fine tune anime-voice TTS for local inference

We're looking for alternatives that replace all API calls to ElevenLabs and other external voice services with local inference models, if possible. There are quite a few options for regular TTS models, but anime-voice-specific ones are still hard to find as of this writing.

> If you know of any, please suggest them. We'd be happy to make the entire extension run locally!

*Note: I know that Gemma3n and other local multimodal LLMs can already take audio + text as input and produce text, and there are many speech-to-text options, but reliable local TTS is far more important.*

[UPDATE] Unsloth supports fine-tuning TTS models, which might be really helpful. Docs: https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning

We need some brave contributors to try this out: find a dataset, fine-tune a model, and try running it locally to replace ElevenLabs!

Fine tune anime-voice TTS for local inference #5