An audio-to-text (speech transcription) application for podcast creators and listeners. It generates transcripts from audio files and stores them, supports speaker diarization (segmenting audio by speaker voice), is built on FastAPI, supports asynchronous processing, and is easy to deploy.
As a podcast enthusiast, I find that search on Xiaoyuzhou does not meet my everyday needs: it cannot search for keywords within audio, and it does not give podcast creators topics or text transcripts. This project combines the Whisper model with an LLM to provide a high-quality search and text generation workflow, with voice cloning planned for later.
- 🚀 Easy deployment: change the configuration and deploy through a YAML file.
- 🎙️ Built-in speaker diarization (segmentation by speaker voice).
- 📚 Built on FastAPI: all features are accessible through the API documentation interface; a simple web UI is planned.
- 🔄 Speaker diarization and text processing run in parallel.
- 🔗 Asynchronous endpoints for submitting tasks.
- 📃 Audio file caching, and extraction of specified time ranges from audio files.
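The time-range extraction mentioned in the last item can be illustrated with a minimal sketch. It is not the project's implementation: it assumes uncompressed WAV input and uses only Python's standard `wave` module, and the function name `extract_clip` and its parameters are hypothetical.

```python
import wave


def extract_clip(src_path: str, dst_path: str, start_s: float, end_s: float) -> int:
    """Copy the audio between start_s and end_s (in seconds) into a new WAV file.

    Returns the number of frames written. Assumes uncompressed WAV input.
    """
    if start_s < 0 or end_s <= start_s:
        raise ValueError("expected 0 <= start_s < end_s")

    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        total = src.getnframes()
        # Convert seconds to frame indices and clamp them to the file length.
        start = min(int(start_s * rate), total)
        end = min(int(end_s * rate), total)
        src.setpos(start)
        frames = src.readframes(end - start)

    with wave.open(dst_path, "wb") as dst:
        # Reuse channel count, sample width and sample rate from the source;
        # the frame count in the header is corrected when the file is closed.
        dst.setparams(params)
        dst.writeframes(frames)

    return end - start
```

For example, `extract_clip("episode.wav", "clip.wav", 60.0, 90.0)` would write the second half of the second minute to `clip.wav`. Compressed formats such as MP3 would need a decoder first.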
python>=3.8
pip install -r requirements.txt
Edit the confid/deploy.yaml file and set ${hf_token} to your Hugging Face access token (see details here), then accept the access conditions for pyannote/speaker-diarization-3.1.
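If you prefer not to edit the file by hand, the token can be filled in with a short script. The sketch below assumes the YAML file contains the literal placeholder text `${hf_token}`; the helper names `fill_hf_token` and `update_config` are hypothetical and not part of this project.

```python
from string import Template


def fill_hf_token(yaml_text: str, hf_token: str) -> str:
    """Replace ${hf_token} in the text and leave any other ${...} placeholder untouched."""
    return Template(yaml_text).safe_substitute(hf_token=hf_token)


def update_config(path: str, hf_token: str) -> None:
    """Rewrite the config file at `path` in place with the token filled in."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write(fill_hf_token(text, hf_token))
```

Calling `update_config("confid/deploy.yaml", token)` with your own token would then update the file. Treat the token as a secret and avoid committing the filled-in file.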
Run the command
python main.py
Visit http://0.0.0.0:8080/docs to view the API documentation.
Endpoints whose paths end with async/ let you submit tasks. You can check a task's status or cancel it through /status/{task_id} and /cancel/{task_id}, and view all historical tasks through /tasks.
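A client for this task workflow might look like the following sketch. Only the paths /status/{task_id}, /cancel/{task_id} and /tasks come from this README; the HTTP method (GET), the JSON field `status`, and the state names in `DONE_STATES` are assumptions, so check the generated API documentation for the real schema. The helper names are hypothetical.

```python
import json
import time
import urllib.request
from typing import Callable

BASE_URL = "http://localhost:8080"  # same port as the docs URL; adjust to your deployment

# Assumed terminal states; the real service may use different names.
DONE_STATES = ("completed", "failed", "cancelled")


def status_url(task_id: str, base: str = BASE_URL) -> str:
    return f"{base}/status/{task_id}"


def cancel_url(task_id: str, base: str = BASE_URL) -> str:
    return f"{base}/cancel/{task_id}"


def tasks_url(base: str = BASE_URL) -> str:
    return f"{base}/tasks"


def http_get_json(url: str) -> dict:
    """Fetch a URL with GET and decode the JSON body (assumes a JSON response)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(
    task_id: str,
    fetch: Callable[[str], dict] = http_get_json,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll the status endpoint until the task reaches an assumed terminal state."""
    deadline = time.monotonic() + timeout
    while True:
        info = fetch(status_url(task_id))
        if info.get("status") in DONE_STATES:
            return info
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_id} did not finish within {timeout}s")
        time.sleep(interval)
```

The `fetch` parameter is injectable so the polling logic can be exercised without a running server. The task ID itself would come from the response of whichever async/ endpoint you submitted to.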
- Web UI page
- Support sending transcript files to the LLM as knowledge/history records.
- Support a voiceprint library: identify speaker names through the voiceprint library and write them into the vector store.
- Support one-click voice cloning with the SenseVoice and CosyVoice models.
- Support LoRA fine-tuning of speaking style for selected characters.
- Support real-time speech recognition.
- Support ONNX model acceleration.