A full-stack application that lets you record or upload audio, transcribe it via OpenAI’s Whisper model, and generate structured summaries using Google’s Gemini API. Transcriptions and summaries are managed by a FastAPI backend and presented through a modern React frontend.
- Record Audio: Capture system audio via a loopback device (e.g., “Stereo Mix” on Windows).
- Upload Audio: Upload local audio files (.wav, .mp3, etc.) for transcription.
- Automatic Transcription: Speech-to-text powered by OpenAI Whisper.
- Translation: Non-English audio is auto-translated into English before transcription.
- Session-Scoped History: Only show and summarize transcriptions from your current session.
- Summarization: Generate a concise, structured summary of the transcription using Google’s Gemini API.
- Download Summary: Export your summary as a PDF.
- Search: Full-text search within your current session’s transcripts.
- Modern Frontend: Built with React for a smooth, responsive UI.
- Robust Backend: Powered by FastAPI and SQLite for quick development and easy deployment.
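To illustrate the transcription and translation features, here is a minimal sketch using the `openai-whisper` package. It is not taken from this project's backend: the function names `choose_task` and `transcribe_to_english`, the `"base"` model size, and the two-pass approach are all assumptions made for the example.

```python
def choose_task(detected_language: str) -> str:
    """Pick the Whisper task: 'transcribe' for English, 'translate' otherwise."""
    if detected_language.lower() in ("en", "english"):
        return "transcribe"
    return "translate"


def transcribe_to_english(audio_path: str, model_name: str = "base") -> str:
    """Return English text for an audio file (hypothetical helper)."""
    # Imported lazily so choose_task() can be used without the package.
    import whisper  # pip install openai-whisper (also requires ffmpeg)

    model = whisper.load_model(model_name)  # weights are downloaded on first use

    # First pass: transcribe and let Whisper report the detected language.
    first = model.transcribe(audio_path)
    if choose_task(first["language"]) == "transcribe":
        return first["text"].strip()

    # Second pass: ask Whisper to translate the speech into English.
    translated = model.transcribe(audio_path, task="translate")
    return translated["text"].strip()
```

Running two passes keeps the example short but doubles the work for non-English audio; the actual backend may detect the language differently.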
- Backend: FastAPI, SQLite, Uvicorn
- Frontend: React (Create React App)
- Transcription: OpenAI Whisper
- Summarization: Google Gemini API
- Languages: Python 3.8+, Node.js 16+
git clone https://github.com/yourusername/soundcard_testing.git
cd soundcard_testing
cd backend
python -m venv venv
# Activate the venv
# Windows
venv\Scripts\activate
# macOS/Linux
source venv/bin/activate
pip install -r requirements.txt
Create a .env file in backend/ with your Gemini API key:
GEMINI_API_KEY=your_gemini_api_key_here
Start the backend server:
uvicorn main:app --reload
The backend will be available at:
http://127.0.0.1:8000
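For reference, the .env step in the backend setup could be consumed roughly as follows. This is a standard-library sketch only; the project may well use a package such as python-dotenv instead, and the helper names `parse_env_text` and `get_gemini_key` are made up for the example.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Trim whitespace and optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def get_gemini_key(env_path: str = ".env") -> str:
    """Prefer the real environment, then fall back to the .env file."""
    key = os.environ.get("GEMINI_API_KEY")
    path = Path(env_path)
    if not key and path.exists():
        key = parse_env_text(path.read_text(encoding="utf-8")).get("GEMINI_API_KEY")
    if not key:
        raise RuntimeError("GEMINI_API_KEY is not set; add it to backend/.env")
    return key
```

Failing fast with a clear error at startup is usually easier to debug than a rejected API call later on.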
Open a new terminal and run:
cd frontend
npm install
npm start
The frontend will be available at:
http://localhost:3000
- Start the Backend:
uvicorn main:app --reload
- Start the Frontend:
npm start
- Open the App:
Navigate to http://localhost:3000 in your browser.
- Interact:
  - Click Record to capture system audio.
  - Or Upload any audio file.
  - View live transcription (auto-translated if non-English), then generate a summary.
  - Download the summary PDF or search within your session’s transcripts.
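The upload step above can also be scripted instead of using the browser. The sketch below posts a file as multipart form data using only the standard library. The `/upload` route and the `file` field name are hypothetical, since this README does not list the backend's routes; FastAPI serves interactive docs at `/docs` by default, which should show the real ones.

```python
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field_name, filename, data, content_type="application/octet-stream"):
    """Return (body, Content-Type header) for a single-file multipart form."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field_name}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_audio(path, url="http://127.0.0.1:8000/upload"):
    """POST an audio file to the backend (route and field name are assumptions)."""
    audio = Path(path)
    body, content_type = build_multipart("file", audio.name, audio.read_bytes(), "audio/wav")
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With the backend running, `upload_audio("meeting.wav")` would return the raw response body, provided the route and field name match what the backend actually expects.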
- Python 3.8 or higher
- Node.js 16 or higher
- A system audio loopback device (e.g., “Stereo Mix” on Windows)
- A valid Google Gemini API key
- Internet access (to download the Whisper model on first run)
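A quick way to confirm most of the requirements above before installing anything is a small check script. This is an illustrative sketch, not part of the repository; it cannot detect a loopback device, and it only looks for the API key in the environment, not in backend/.env.

```python
import os
import shutil
import subprocess
import sys


def parse_version(text):
    """Turn 'v16.14.2' or '3.8.10' into a tuple of ints."""
    parts = text.strip().lstrip("v").split(".")
    return tuple(int(p) for p in parts if p.isdigit())


def check_prerequisites():
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if sys.version_info < (3, 8):
        problems.append("Python 3.8 or higher is required")

    node = shutil.which("node")
    if node is None:
        problems.append("Node.js was not found on PATH")
    else:
        result = subprocess.run([node, "--version"], capture_output=True, text=True)
        if parse_version(result.stdout) < (16,):
            problems.append("Node.js 16 or higher is required")

    if not os.environ.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set in the environment")
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```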
- Session-only Data: Only transcriptions made during the current server run are shown. Restarting the server clears the displayed history; deleting the SQLite file removes the stored records entirely.
- Persistence: The SQLite database file (e.g., db.sqlite3) lives in backend/ and persists between runs unless manually deleted.
- Loopback Audio: Ensure your OS has a loopback/mix device enabled if you want to record system audio.
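One way to reconcile session-only history with a persistent database file is to tag each row with an ID generated at server start and filter on it. The sketch below shows that idea with the standard `sqlite3` module. The table name, column names, and helper functions are hypothetical, and `LIKE` stands in for whatever search the backend actually implements.

```python
import sqlite3
import uuid

# A fresh ID per process: rows from earlier runs stay on disk but are filtered out.
SESSION_ID = uuid.uuid4().hex


def init_db(conn):
    conn.execute(
        """CREATE TABLE IF NOT EXISTS transcriptions (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               session_id TEXT NOT NULL,
               text TEXT NOT NULL
           )"""
    )


def add_transcription(conn, text, session_id=SESSION_ID):
    conn.execute(
        "INSERT INTO transcriptions (session_id, text) VALUES (?, ?)",
        (session_id, text),
    )
    conn.commit()


def search_session(conn, query, session_id=SESSION_ID):
    """Substring search limited to one session (LIKE wildcards are not escaped)."""
    rows = conn.execute(
        "SELECT text FROM transcriptions "
        "WHERE session_id = ? AND text LIKE ? ORDER BY id",
        (session_id, f"%{query}%"),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")  # the real app would open the file in backend/
init_db(conn)
add_transcription(conn, "old meeting notes", session_id="previous-run")
add_transcription(conn, "new meeting about the budget")
print(search_session(conn, "meeting"))  # only the current session's row is returned
```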
This project is licensed under the MIT License. Feel free to use, modify, and distribute!
