A comprehensive ComfyUI integration for Microsoft's VibeVoice text-to-speech model, enabling high-quality single and multi-speaker voice synthesis directly within your ComfyUI workflows.
Updated Feb 18, 2026 · Python
A ComfyUI custom node integration for multi-engine, multi-language text-to-speech and voice conversion. Supports RVC, Echo-TTS, Qwen3-TTS, CosyVoice 3, Step Audio EditX, IndexTTS-2, Chatterbox (classic and multilingual, 23 languages), F5-TTS, Higgs Audio 2 and VibeVoice, with unlimited text length, SRT timing, character support, and many audio tools.
Soundstorm is an AI-powered audio manipulation application designed to give sound designers, algorithmic composers, and experimental audio enthusiasts a rich yet simple experience. Its features range from sample pack creation and algorithmic composition to AI text-to-audio and an on-screen ChatGPT assistant.
Real-Time Deepfake Pipeline
Local-first CLI that turns Markdown scripts into multi-speaker podcast-style audio using Coqui XTTS v2.
Music Generation Using Deep Learning🎶🎵
AI Voice Agents: Exploring the Next Generation of Human-Machine Interaction! 🎙️🤖🎧
AudioInsight is a web application that processes audio, generates transcriptions, and lets users ask questions about the audio.
A local-first EPUB reader with high-fidelity neural text-to-speech and word-level synchronization, built on a Next.js/FastAPI/ONNX stack.
An approach to Andrej Karpathy's LLM challenge, as outlined here: https://twitter.com/karpathy/status/1760740503614836917
Community list of AI tools for audio and music
Professional Yocto BSP layer for Dynamic Devices edge computing platforms: AI audio processing, E-Ink displays, power management, wireless connectivity, and i.MX8MM/i.MX93 support
Maya Voice AI is an open-source project that demonstrates the Maya1 model, which generates realistic voice audio from text input with rich emotional and descriptive control. This repository provides a text-to-speech demo built on language models and the SNAC codec, producing high-quality audio at 24 kHz.
AI Audio Framework 🎵
IntelliMix is an AI-powered web app for transforming and editing audio. Create mashups with a single prompt, trim audio, batch process files, and download media, all in one streamlined interface. Built with React and Flask, with integrated AI tools.
High-performance KittenTTS API server with a built-in web UI, OpenAI-compatible routes, long-form text support, and optional CUDA acceleration.
Official Unity SDK for VARCO Voice API. High-quality AI text-to-speech, real-time LipSync, and 80+ professional DSP presets for game characters.
Acoustic Space Analyzer AI Pro is a professional acoustic analysis tool that uses AI to generate optimized DSP chains for any acoustic environment. It combines real-time spectral analysis, 3D spatial scanning, and AI-powered audio processing to deliver precise acoustic corrections.
Kokoro analyzes real-world audio to uncover hidden patterns: sentiment shifts, controversial topics, consensus levels, and action items that emerge during discussions.
ComfyUI custom nodes for the Dia2 TTS model — generate speech, timestamps, and captions directly inside ComfyUI.