Fine tune anime-voice TTS for local inference #5

@supreme-gg-gg

We're looking for local inference models to replace all API calls to ElevenLabs and other external voice services, if possible. There seem to be quite a few options for regular TTS models, but anime-voice-specific ones are still difficult to find at the time of first writing.

If you know of any, please suggest them. We'd be happy to make the entire extension run locally!

*I know that Gemma3n and other local multimodal LLMs can already process audio + text to text, and there are many speech-to-text options, but a reliable local TTS is far more important.

[UPDATE] Unsloth supports fine-tuning TTS models, which might be really helpful: https://docs.unsloth.ai/basics/text-to-speech-tts-fine-tuning We need some brave contributors to try this out: find a dataset, fine-tune a model, and try running it locally to replace ElevenLabs!
