PodcaSphere

An audio-to-text transcription application for podcast authors and listeners. It generates transcripts from audio files and stores them, supports speaker segmentation by voice timbre (speaker diarization), is implemented with FastAPI, supports asynchronous processing, and is easy to deploy.

Motivation

As a podcast enthusiast, I find that search on Xiaoyuzhou does not meet everyday needs: it cannot search for keywords within audio, and it does not give podcast creators topics or transcripts. PodcaSphere combines the Whisper model with an LLM to provide a high-quality search workflow and text generation, with voice cloning planned for later.

Main Features

🚀 Easy deployment: change the configuration in a single YAML file and deploy.
🎙️ Built-in speaker segmentation by voice timbre (speaker diarization).
📚 Built on FastAPI: every feature can be reached through the interactive API documentation, and a simple web UI is planned.
🔄 Speaker segmentation and text processing are optimized to run in parallel.
🔗 Asynchronous endpoints are provided.
📃 Audio file caching, and extraction of specified time ranges from audio files.

Usage Plan

Environment Setup

python>=3.8
pip install -r requirements.txt

Running the Service

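Edit the configs/deploy.yaml file and set ${hf_token} to your Hugging Face access token. Then accept the user conditions for pyannote/speaker-diarization-3.1 on Hugging Face so that the token is authorized to download the model.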
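If you prefer not to paste the token by hand, the placeholder can be filled programmatically. The following is a minimal sketch, assuming the deploy YAML file contains the literal placeholder ${hf_token}; the helper name fill_placeholders and the sample keys are hypothetical and not part of the project.

```python
from string import Template


def fill_placeholders(yaml_text: str, **values: str) -> str:
    """Replace ${name} placeholders in the YAML text with the given values.

    safe_substitute leaves any placeholder without a supplied value untouched,
    so unrelated ${...} entries in the file are preserved.
    """
    return Template(yaml_text).safe_substitute(**values)


# Hypothetical sample content; the real keys in the deploy file may differ.
SAMPLE = "hf_token: ${hf_token}\nport: ${port}\n"
```

A typical use would be to read the file, call fill_placeholders(text, hf_token=os.environ["HF_TOKEN"]), and write the result back, which keeps the token out of version control.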

Run the command

python main.py

API Documentation

Visit http://0.0.0.0:8080/docs to view the API documentation. If your browser does not resolve 0.0.0.0, try http://localhost:8080/docs on the machine running the service.

Endpoints whose paths end with async/ let you submit tasks for background processing. You can check a task's status through /status/{task_id}, cancel it through /cancel/{task_id}, and view all historical tasks through /tasks.
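The task endpoints can be polled from a small client. The following is a sketch only, using the Python standard library. The paths /status/{task_id}, /cancel/{task_id}, and /tasks come from the description above; the base URL, the use of GET, the "status" field, and its terminal values are assumptions, so check the /docs page for the actual schema.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:8080"  # assumption: service reachable locally on port 8080


def status_url(task_id: str, base: str = BASE_URL) -> str:
    return f"{base}/status/{task_id}"


def cancel_url(task_id: str, base: str = BASE_URL) -> str:
    return f"{base}/cancel/{task_id}"


def tasks_url(base: str = BASE_URL) -> str:
    return f"{base}/tasks"


def get_json(url: str) -> dict:
    """Fetch a URL with GET and decode the JSON body (GET is an assumption)."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def wait_for_task(task_id, fetch=get_json, interval=2.0, max_polls=30,
                  done=("completed", "failed", "cancelled")):
    """Poll the status endpoint until the task reaches a terminal state.

    The "status" key and the values in `done` are hypothetical; adjust them
    to match the response shown in the API documentation.
    """
    for _ in range(max_polls):
        payload = fetch(status_url(task_id))
        if payload.get("status") in done:
            return payload
        time.sleep(interval)
    raise TimeoutError(f"task {task_id} did not finish after {max_polls} polls")
```

Passing the fetch function as a parameter keeps the polling loop testable without a running server; in normal use the default get_json is enough.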

To Do

Web UI page
Support sending transcript files to the LLM as knowledge or history records.
Support a voiceprint library: identify speaker names through the voiceprint library and write them into the vector store.
Support one-click voice cloning with the SenseVoice and CosyVoice models.
Support LoRA fine-tuning of speaking style for a selected character.
Support real-time speech recognition.
Support ONNX model acceleration.
