Author: Suryateja Duvvuri
Listening to a podcast is like sitting through a 90-minute lecture in which the content is delivered only by the host or professor. This is one way to deliver content, but it does not always engage the audience, especially in a podcast, where listeners cannot interact with the host directly. With AI, we can make the podcast interactive: listeners ask questions and give feedback, and the AI responds in real time, which improves engagement with the content. The application also lets the user explore a topic in context for as long as they want, just as in any conversation.
The application offers a simple, intuitive user interface that prompts the user to enter a YouTube podcast link. The backend processes the link and converts it to an audio file that the user can play. Meanwhile, it transcribes the audio with speech-to-text and uses Ollama to customize the AI, supplying the podcast's content so that the model answers as if it were the host. Once the audio player appears, the user can ask a question by clicking the "Start Recording" button. When they finish asking, the question is sent to the AI as a prompt, and the AI's response is delivered as speech through the ElevenLabs text-to-speech API.
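Supplying the podcast's content so the model answers as the host comes down to prompt construction. The following is a minimal sketch, not the project's actual code: the `PodcastPrompt` class, the character cap, and the prompt wording are all hypothetical, and the real backend wires the model through Spring AI.

```java
/** Hypothetical helper that builds a host-persona prompt from a transcript. */
final class PodcastPrompt {

    // Assumed cap so the transcript fits in the model's context window.
    static final int MAX_TRANSCRIPT_CHARS = 12_000;

    /** Returns a system prompt asking the model to answer as the podcast host. */
    static String systemPrompt(String podcastTitle, String transcript) {
        String trimmed = transcript.strip();
        if (trimmed.length() > MAX_TRANSCRIPT_CHARS) {
            // Simplification: keep only the beginning of a long transcript.
            trimmed = trimmed.substring(0, MAX_TRANSCRIPT_CHARS);
        }
        return "You are the host of the podcast \"" + podcastTitle + "\". "
             + "Answer listener questions in the first person, using only the transcript below. "
             + "If the transcript does not cover the question, say so.\n\n"
             + "Transcript:\n" + trimmed;
    }
}
```

The transcribed question from the "Start Recording" step would then be sent as the user message alongside this system prompt. Truncating from the start is the simplest choice; a real implementation might instead summarize or select the passages closest to the question.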
- Frontend: React.js, Tailwind CSS, Radix UI (for customized components)
- Backend: Spring Boot, Spring MVC, Spring Web, Spring AI (Ollama LLM), local Whisper.cpp (for speech-to-text), ElevenLabs API (for text-to-speech)
- API: YouTube Data API (for extracting audio from a link), REST API for communication between frontend and backend
- Users can provide a YouTube link, which is then converted into an audio file locally.
- Users can get AI-generated responses based on the context of the podcast.
- Speech-to-text converts the user's voice or the podcast's audio to text using Whisper.
- Text-to-speech converts the AI's textual response into voice using ElevenLabs.
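Before a link is handed to the audio conversion step, it helps to confirm that it really points at a YouTube video. The sketch below shows one way to do that, assuming a hypothetical `YouTubeLinks` helper that is not part of the project; it handles only the common `youtube.com/watch?v=...` and `youtu.be/...` forms.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Optional;

/** Hypothetical helper that extracts the video ID from a YouTube link. */
final class YouTubeLinks {

    /** Returns the video ID, or an empty Optional if the link is not recognized. */
    static Optional<String> videoId(String link) {
        if (link == null) {
            return Optional.empty();
        }
        URI uri;
        try {
            uri = URI.create(link.trim());
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // not a syntactically valid URI
        }
        String host = uri.getHost();
        if (host == null) {
            return Optional.empty();
        }
        host = host.toLowerCase(Locale.ROOT);
        if (host.startsWith("www.")) {
            host = host.substring(4);
        }
        if (host.equals("youtu.be")) {
            String path = uri.getPath(); // short form: "/VIDEO_ID"
            return (path != null && path.length() > 1)
                    ? Optional.of(path.substring(1))
                    : Optional.empty();
        }
        if (host.equals("youtube.com") || host.equals("m.youtube.com")) {
            String query = uri.getQuery(); // long form: "v=VIDEO_ID&t=10s"
            if (query == null) {
                return Optional.empty();
            }
            for (String pair : query.split("&")) {
                if (pair.startsWith("v=") && pair.length() > 2) {
                    return Optional.of(pair.substring(2));
                }
            }
        }
        return Optional.empty();
    }
}
```

Returning an `Optional` lets the caller reject an unsupported link with a clear message instead of failing later during audio extraction.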
Before you can start running this project, make sure you have the following tools installed:
- Java 11 or higher (for backend)
- Node.js/NPM and React.js (for frontend development)
- Python (for diarization and speech-to-text)
First, clone the project repository to your local machine:
git clone https://github.com/SuryatejaDuvvuri/podnexus.git
- Run the backend and the frontend in separate terminals.
- Backend:
cd PodnexusBackend
mvn clean install
mvn spring-boot:run
- Frontend:
cd podnexus
npm install
npm run start
- Refer to the Ollama documentation for how to install Ollama and run Llama 3.2: https://github.com/ollama/ollama
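Once Ollama is running locally, it can be reached over its REST API. The project itself talks to the model through Spring AI, so the following is only an illustrative sketch of the underlying HTTP request: the `OllamaChatRequest` class is hypothetical, and the default port `11434` and the model tag `llama3.2` are assumed to match a standard local install.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper that builds (but does not send) an Ollama chat request. */
final class OllamaChatRequest {

    // Assumed default endpoint of a local Ollama server.
    static final String CHAT_URL = "http://localhost:11434/api/chat";

    /** Encodes a Java string as a JSON string literal. */
    static String jsonString(String s) {
        StringBuilder out = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.append('"').toString();
    }

    /** Builds the JSON body: one system message and one user message, no streaming. */
    static String body(String model, String systemPrompt, String question) {
        return "{\"model\":" + jsonString(model)
             + ",\"stream\":false"
             + ",\"messages\":["
             + "{\"role\":\"system\",\"content\":" + jsonString(systemPrompt) + "},"
             + "{\"role\":\"user\",\"content\":" + jsonString(question) + "}]}";
    }

    /** Builds the POST request; send it with java.net.http.HttpClient. */
    static HttpRequest build(String model, String systemPrompt, String question) {
        return HttpRequest.newBuilder(URI.create(CHAT_URL))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body(model, systemPrompt, question)))
                .build();
    }
}
```

Sending the request with `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` returns a JSON response whose message content would then be passed to the text-to-speech step.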
