A modern web application that transcribes audio from YouTube videos and local files into text using AI. Built with TypeScript (Next.js) for the frontend and Python (FastAPI) for the backend.
Features:
- 🎯 YouTube video transcription
- 📁 Local audio/video file upload
- 🤖 AI-powered summary generation (3 key bullet points)
- 📋 Copy to clipboard functionality
- 💾 Save transcriptions as text files
- 🌍 Multi-language support (10+ languages with auto-detection)
- 🎨 Modern dark theme UI
- 🔄 Real-time transcription status updates
- 🎵 Supports multiple audio/video formats
- 🔒 Local transcription with Whisper (only the optional summary feature sends text to the OpenAI API)
Frontend:
- Next.js 14 (React)
- TypeScript
- Tailwind CSS
- Lucide Icons
Backend:
- Python 3.11+
- FastAPI
- Whisper AI (for transcription)
- yt-dlp (for YouTube downloads)
- FFmpeg (for audio processing)
Prerequisites:
- Node.js 18+ and npm
- Python 3.11+
- FFmpeg (the FFmpeg and FFprobe executables are bundled in backend/app for Windows; on macOS or Linux, install it with Homebrew (brew install ffmpeg) or apt (sudo apt install ffmpeg))
- OpenAI API key (optional; required only for the AI summary feature)
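To confirm the prerequisites above before running the setup steps, a small Python script along these lines can help. It is a sketch, not part of the repository: the function names are hypothetical, and because it relies on shutil.which it only finds tools on your PATH, so the FFmpeg copy bundled in backend/app on Windows is not detected.

```python
from __future__ import annotations

import shutil
import subprocess
import sys
from typing import Optional


def parse_node_major(version_text: str) -> Optional[int]:
    """Extract the major version from `node --version` output such as "v18.19.0"."""
    try:
        return int(version_text.strip().lstrip("v").split(".")[0])
    except ValueError:
        return None


def node_major_version() -> Optional[int]:
    """Run `node --version` if Node.js is on PATH; return None if it is not."""
    node = shutil.which("node")
    if node is None:
        return None
    result = subprocess.run([node, "--version"], capture_output=True, text=True, check=False)
    return parse_node_major(result.stdout)


def check_prerequisites() -> dict[str, bool]:
    """Report which of the listed prerequisites appear to be available."""
    node_major = node_major_version()
    return {
        # Checks the interpreter running this script, not necessarily the one in the venv.
        "python>=3.11": sys.version_info >= (3, 11),
        "node>=18": node_major is not None and node_major >= 18,
        # On Windows, shutil.which honors PATHEXT, so npm.cmd is found as "npm".
        "npm": shutil.which("npm") is not None,
        # Only finds FFmpeg on PATH; the copy bundled in backend/app is not checked.
        "ffmpeg": shutil.which("ffmpeg") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{'OK     ' if ok else 'MISSING'} {name}")
```

The OpenAI API key is deliberately left out of this check because it is optional.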
Backend setup:
- Create and activate a virtual environment:
cd backend
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Unix or macOS:
source venv/bin/activate
- Install dependencies:
pip install -r requirements.txt
- (Optional) Configure the OpenAI API key for the summary feature:
# Create a .env file in the backend directory
echo "OPENAI_API_KEY=your_api_key_here" > .env
- Start the backend server:
cd app
python main.py
The backend will be available at http://localhost:8000.
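If the summary feature later reports a missing key, a quick check of the .env file created above may help. The sketch below assumes the backend reads OPENAI_API_KEY from backend/.env as a simple KEY=VALUE line; the helper names are hypothetical, and main.py (or a library such as python-dotenv) may parse the file differently. It also tolerates two Windows quirks of the echo step: Windows PowerShell 5.1 writes redirected output as UTF-16, and cmd.exe's echo keeps the surrounding quotes.

```python
from __future__ import annotations

from pathlib import Path

# Placeholder value from the setup step; the summary feature cannot work with it.
PLACEHOLDER = "your_api_key_here"


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines; blank lines and # comments are skipped."""
    raw = path.read_bytes()
    # Windows PowerShell 5.1 redirects output as UTF-16 with a BOM; accept that as well as UTF-8.
    if raw.startswith((b"\xff\xfe", b"\xfe\xff")):
        text = raw.decode("utf-16")
    else:
        text = raw.decode("utf-8-sig")
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # cmd.exe's echo keeps the quotes, producing a line like "OPENAI_API_KEY=...".
        if len(line) >= 2 and line[0] == line[-1] == '"':
            line = line[1:-1].strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        values[key.strip()] = value.strip().strip("'\"")
    return values


def openai_key_status(env_path: Path) -> str:
    """Return "missing", "placeholder", or "set" for OPENAI_API_KEY."""
    if not env_path.is_file():
        return "missing"
    key = read_env_file(env_path).get("OPENAI_API_KEY", "")
    if not key:
        return "missing"
    return "placeholder" if key == PLACEHOLDER else "set"


if __name__ == "__main__":
    # Run from the backend directory, where the setup step creates .env.
    print(openai_key_status(Path(".env")))
```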
Frontend setup:
- Install dependencies (from the project root, in a second terminal so the backend keeps running):
cd frontend
npm install
- Start the development server:
npm run dev
The frontend will be available at http://localhost:3000.
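With both servers started, a TCP connection check confirms that something is listening on ports 8000 and 3000. This is a hypothetical helper, not part of the project, and it only tests that each port accepts connections, not that the right application is behind it.

```python
import socket


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError), and DNS errors.
        return False


if __name__ == "__main__":
    for name, port in (("backend", 8000), ("frontend", 3000)):
        state = "listening" if is_listening("localhost", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```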
Usage:
- Open http://localhost:3000 in your browser.
- Select your preferred language (or use auto-detect).
- Choose between:
  - Local File: Upload an audio/video file from your device
  - Online URL: Paste a YouTube URL
- Click "Try FREE transcription".
- Wait for the transcription to complete.
- View your results:
  - Full transcription text
  - AI-generated summary (3 key bullet points; requires an OpenAI API key)
- Actions available:
  - Copy to Clipboard: Copy the full transcription
  - Save as Text: Download the transcription as a .txt file
  - Clear: Start a new transcription
The application supports 10+ languages with auto-detection:
- English
- Portuguese (Português)
- Spanish (Español)
- French (Français)
- German (Deutsch)
- Italian (Italiano)
- Japanese (日本語)
- Korean (한국어)
- Chinese (中文)
- Russian (Русский)
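The language choice ultimately has to reach Whisper, which identifies languages by ISO 639-1 codes such as en or pt. The mapping below is a sketch under that assumption; resolve_language and the accepted auto-detect spellings are hypothetical, and main.py may pass the selection differently.

```python
from __future__ import annotations

from typing import Optional

# ISO 639-1 codes for the languages listed above.
LANGUAGE_CODES = {
    "english": "en",
    "portuguese": "pt",
    "spanish": "es",
    "french": "fr",
    "german": "de",
    "italian": "it",
    "japanese": "ja",
    "korean": "ko",
    "chinese": "zh",
    "russian": "ru",
}

# Hypothetical spellings the UI might send for "detect the language for me".
AUTO_DETECT = {"", "auto", "auto-detect"}


def resolve_language(choice: str) -> Optional[str]:
    """Map a language name or code to a Whisper code; None means let Whisper auto-detect."""
    key = choice.strip().lower()
    if key in AUTO_DETECT:
        return None
    if key in LANGUAGE_CODES:
        return LANGUAGE_CODES[key]
    if key in LANGUAGE_CODES.values():
        return key
    raise ValueError(f"Unsupported language: {choice!r}")
```

With the openai-whisper package, the result could be passed as model.transcribe(path, language=resolve_language(choice)); a language of None makes Whisper detect the spoken language itself.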
Project structure:
speech-flow-ai/
├── backend/
│ ├── app/
│ │ ├── main.py # FastAPI application with all API endpoints
│ │ ├── ffmpeg.exe # FFmpeg executable (Windows)
│ │ ├── ffprobe.exe # FFprobe executable (Windows)
│ │ └── temp_audio/ # Temporary audio storage (auto-created)
│ ├── requirements.txt # Python dependencies
│ ├── .env # Environment variables (OpenAI API key)
│ └── venv/ # Python virtual environment
├── frontend/
│ ├── src/
│ │ ├── app/
│ │ │ ├── page.tsx # Main page with transcription UI
│ │ │ ├── layout.tsx # Root layout
│ │ │ └── globals.css # Global styles (dark theme)
│ │ └── ...
│ ├── package.json
│ └── ...
└── README.md
API endpoints:
- POST /api/transcribe
  - Accepts: YouTube URL, optional language parameter
  - Returns: Transcription text and ID
  - Used for: Transcribing YouTube videos
- POST /api/transcribe-file
  - Accepts: Audio/video file upload, optional language parameter
  - Returns: Transcription text and ID
  - Used for: Transcribing local files
  - Supports: Multiple audio/video formats (mp3, wav, m4a, mp4, etc.)
- POST /api/summarize
  - Accepts: Transcription text, optional language parameter
  - Returns: 3 key bullet points summarizing the content
  - Requires: An OpenAI API key (see the optional .env step in the backend setup)
  - Uses: OpenAI's GPT-3.5-turbo model to generate the summary
- GET /api/transcription/{id}
  - Returns: The transcription status, plus the transcription text once it is available
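The request bodies for these endpoints are not documented above, so the following client is a sketch built only on the Python standard library. The JSON field names (url, text, language), the status values "completed" and "failed", and all function names are assumptions to check against backend/app/main.py. POST /api/transcribe-file presumably expects a multipart form upload and is not covered here.

```python
from __future__ import annotations

import json
import time
import urllib.request
from typing import Callable, Optional
from urllib.parse import urlparse

BASE_URL = "http://localhost:8000"  # backend address from the setup steps
YOUTUBE_HOSTS = {"youtube.com", "www.youtube.com", "m.youtube.com", "youtu.be"}


def _json_post(path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to the backend."""
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe_request(url: str, language: Optional[str] = None) -> urllib.request.Request:
    """POST /api/transcribe. The field names "url" and "language" are hypothetical."""
    if urlparse(url).hostname not in YOUTUBE_HOSTS:
        raise ValueError(f"Not a YouTube URL: {url!r}")
    payload = {"url": url}
    if language is not None:
        payload["language"] = language
    return _json_post("/api/transcribe", payload)


def summarize_request(text: str, language: Optional[str] = None) -> urllib.request.Request:
    """POST /api/summarize. The field names "text" and "language" are hypothetical."""
    if not text.strip():
        raise ValueError("Nothing to summarize")
    payload = {"text": text}
    if language is not None:
        payload["language"] = language
    return _json_post("/api/summarize", payload)


def fetch_json(url: str) -> dict:
    """GET a URL and decode its JSON body."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def wait_for_transcription(
    transcription_id: str,
    fetch: Callable[[str], dict] = fetch_json,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll GET /api/transcription/{id} until it reports a final status or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch(f"{BASE_URL}/api/transcription/{transcription_id}")
        # "completed" and "failed" are assumed status values; check main.py for the real ones.
        if result.get("status") in {"completed", "failed"}:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Transcription {transcription_id} is still {result.get('status')!r}")
        time.sleep(interval)
```

A built request is sent with urllib.request.urlopen(transcribe_request(url)). Because wait_for_transcription takes the fetch function as a parameter, it can be exercised without a running backend.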
Notes:
- FFmpeg handles audio processing; the Windows executables are bundled in backend/app, while macOS and Linux users need FFmpeg installed separately
- Transcription runs locally using Whisper AI
- Temporary audio files are automatically cleaned up after processing
- If no language is selected, Whisper detects the spoken language automatically
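The notes say temporary audio files are removed after processing, and the project tree shows a temp_audio/ folder. How main.py implements this is not shown; one common pattern, sketched here with hypothetical names (process_with_cleanup, base_dir), gives each job its own temporary directory and deletes it in a finally clause so cleanup also happens when transcription fails.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def process_with_cleanup(
    data: bytes,
    suffix: str,
    process: Callable[[Path], T],
    base_dir: Optional[Path] = None,
) -> T:
    """Write `data` to a fresh per-job directory, run `process` on the file, then delete it.

    `base_dir` could point at a folder such as temp_audio/; by default the system temp dir is used.
    """
    if base_dir is not None:
        base_dir.mkdir(parents=True, exist_ok=True)  # create the folder on first use
    workdir = Path(tempfile.mkdtemp(prefix="job_", dir=base_dir))
    try:
        audio_path = workdir / f"input{suffix}"
        audio_path.write_bytes(data)
        return process(audio_path)
    finally:
        # Runs after success and after an exception, so failed jobs leave nothing behind.
        shutil.rmtree(workdir, ignore_errors=True)
```

A per-job directory (rather than one shared file name) keeps concurrent uploads from overwriting each other, and ignore_errors=True keeps a cleanup failure from masking the job's real result or exception.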
MIT License