pcherkashin
Crafting AI powered future

Stars

Voice AI

15 repositories

SOTA Open Source TTS

Python 19,488 1,509 Updated Feb 18, 2025

Instant voice cloning by MIT and MyShell. Audio foundation model.

Python 31,059 3,123 Updated Jan 7, 2025

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"

Python 9,862 1,331 Updated Feb 24, 2025

mahilo: Multi-Agent Human-in-the-Loop Framework is a flexible framework for creating multi-agent systems that can each interact with humans while sharing relevant context internally.

Python 166 12 Updated Feb 9, 2025

Local realtime voice AI

Python 2,233 123 Updated Feb 25, 2025

AI-powered dictation tool

TypeScript 350 19 Updated Nov 22, 2024

An open source implementation of Microsoft's VALL-E X zero-shot TTS model. A demo is available at https://plachtaa.github.io/vallex/

Python 7,803 774 Updated Feb 11, 2024

AI Meeting Minutes analysis App built with NextJS, Langflow, Groq, and OpenAI

TypeScript 417 75 Updated Dec 25, 2024

🐸💬 - a deep learning toolkit for Text-to-Speech, battle-tested in research and production

Python 37,962 4,738 Updated Aug 16, 2024

Gradio WebUI for audio processing, powered by Whisper (OpenAI-Whisper, Faster-Whisper, Whisper-Timestamped). Features Voice Changer (RVC), zero-shot Voice Cloning (E2, F5-TTS, CosyVoice), YouTube do…

Python 3,352 250 Updated Feb 24, 2025

LLaSA: Scaling Train-time and Inference-time Compute for LLaMA-based Speech Synthesis

Python 352 26 Updated Feb 14, 2025

Gemini Multimodal Live + WebRTC in a single `app.ts`

Python 183 23 Updated Dec 22, 2024

YuE: Open Full-song Music Generation Foundation Model, something similar to Suno.ai but open

Python 3,995 427 Updated Feb 20, 2025

YuE: Open Full-song Generation Foundation for the GPU Poor

Python 296 26 Updated Feb 14, 2025