SpeechFlowAI - Your Ultimate AI Transcription Service

A modern web application that transcribes audio from YouTube videos and local files into text using AI. Built with TypeScript (Next.js) for the frontend and Python (FastAPI) for the backend.

Features

🎯 YouTube video transcription
📁 Local audio/video file upload
🤖 AI-powered summary generation (3 key bullet points)
📋 Copy to clipboard functionality
💾 Save transcriptions as text files
🌍 Multi-language support (10+ languages with auto-detection)
🎨 Modern dark theme UI
🔄 Real-time transcription status updates
🎵 Supports multiple audio/video formats
🔒 Private processing (transcription runs locally with Whisper AI)

Tech Stack

Frontend

Next.js 14 (React)
TypeScript
Tailwind CSS
Lucide Icons

Backend

Python 3.11+
FastAPI
Whisper AI (for transcription)
yt-dlp (for YouTube downloads)
FFmpeg (for audio processing)

Prerequisites

Node.js 18+ and npm
Python 3.11+
FFmpeg (included in the backend/app directory for Windows; install via brew/apt on macOS/Linux)
OpenAI API key (optional; required only for the AI summary generation feature)

Setup

Backend Setup

Create and activate a virtual environment:

cd backend
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Unix or macOS:
source venv/bin/activate

Install dependencies:

pip install -r requirements.txt

(Optional) Configure the OpenAI API key for the summary feature:

# Create a .env file in the backend directory
echo "OPENAI_API_KEY=your_api_key_here" > .env

Start the backend server:

cd app
python main.py

The backend will be available at http://localhost:8000
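
Before starting the server, it can help to confirm that the optional key from the .env step was actually written. The following is a minimal sketch using only the standard library; the helper names `read_env_file` and `summary_enabled` are hypothetical and are not part of the project, and how the backend itself loads .env is not shown in this README.

```python
from pathlib import Path


def read_env_file(path):
    """Parse simple KEY=VALUE lines from a .env file into a dict."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def summary_enabled(env):
    """Return True when a non-placeholder OpenAI key is present."""
    key = env.get("OPENAI_API_KEY", "")
    return bool(key) and key != "your_api_key_here"


if __name__ == "__main__":
    # Assumes the script is run from the backend directory.
    print("Summary feature enabled:", summary_enabled(read_env_file(".env")))
```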

Frontend Setup

Install dependencies:

cd frontend
npm install

Start the development server:

npm run dev

The frontend will be available at http://localhost:3000

Usage

Open http://localhost:3000 in your browser
Select your preferred language (or use auto-detect)
Choose between:
- Local File: Upload an audio/video file from your device
- Online URL: Paste a YouTube URL
Click "Try FREE transcription"
Wait for the transcription to complete
View your results:
- Full transcription text
- AI-generated summary (3 key bullet points)
Actions available:
- Copy to Clipboard: Copy the full transcription
- Save as Text: Download the transcription as a .txt file
- Clear: Start a new transcription

Supported Languages

The application supports 10+ languages with auto-detection:

English
Portuguese (Português)
Spanish (Español)
French (Français)
German (Deutsch)
Italian (Italiano)
Japanese (日本語)
Korean (한국어)
Chinese (中文)
Russian (Русский)

Project Structure

speech-flow-ai/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI application with all API endpoints
│   │   ├── ffmpeg.exe        # FFmpeg executable (Windows)
│   │   ├── ffprobe.exe       # FFprobe executable (Windows)
│   │   └── temp_audio/       # Temporary audio storage (auto-created)
│   ├── requirements.txt      # Python dependencies
│   ├── .env                  # Environment variables (OpenAI API key)
│   └── venv/                 # Python virtual environment
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx      # Main page with transcription UI
│   │   │   ├── layout.tsx    # Root layout
│   │   │   └── globals.css   # Global styles (dark theme)
│   │   └── ...
│   ├── package.json
│   └── ...
└── README.md

API Endpoints

Transcription Endpoints

POST /api/transcribe
- Accepts: YouTube URL, optional language parameter
- Returns: Transcription text and ID
- Used for: Transcribing YouTube videos

POST /api/transcribe-file
- Accepts: Audio/video file upload, optional language parameter
- Returns: Transcription text and ID
- Used for: Transcribing local files
- Supports: Multiple audio/video formats (mp3, wav, m4a, mp4, etc.)

Summary Endpoint

POST /api/summarize
- Accepts: Transcription text, optional language parameter
- Returns: 3 key bullet points summarizing the content
- Requires: OpenAI API key configured
- Uses: GPT-3.5-turbo for intelligent summarization

Status Endpoint

GET /api/transcription/{id}
- Get transcription status and result
- Returns: Status and transcription text if available
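
A client could use the status endpoint by polling until the transcription is ready. The following is a minimal sketch of that loop. The status values ("completed", "failed") and the shape of the response dict are assumptions, and `fetch_status` is a caller-supplied function (hypothetical) that performs GET /api/transcription/{id} and returns the decoded JSON.

```python
import time


def poll_transcription(transcription_id, fetch_status, interval=2.0,
                       max_attempts=30, sleep=time.sleep):
    """Poll until the transcription finishes, fails, or attempts run out.

    Assumption: the response is a dict with a "status" key whose value is
    "completed" or "failed" once processing has ended.
    """
    for attempt in range(max_attempts):
        result = fetch_status(transcription_id)
        status = result.get("status")
        if status == "completed":
            return result
        if status == "failed":
            raise RuntimeError(f"Transcription {transcription_id} failed")
        # Wait before the next attempt, but not after the last one.
        if attempt < max_attempts - 1:
            sleep(interval)
    raise TimeoutError(
        f"Transcription {transcription_id} did not finish in time"
    )
```

Passing `fetch_status` and `sleep` in as parameters keeps the loop independent of any particular HTTP library and makes it easy to exercise without a running server.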

Notes

The application uses FFmpeg for audio processing. Windows executables are included in the backend/app directory; on macOS/Linux, install FFmpeg via brew or apt
Transcriptions are processed locally using Whisper AI
Temporary audio files are automatically cleaned up after processing
The application supports multiple languages (auto-detected by Whisper)
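
The automatic cleanup of temporary audio files mentioned above could follow a pattern like the one below. This is a sketch of one common approach (a context manager with try/finally), not the code in main.py; `temp_audio_file` is a hypothetical helper, and the directory passed to it stands in for the temp_audio folder from the project structure.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_audio_file(directory, suffix=".mp3"):
    """Yield a temporary file path and delete the file afterwards.

    The file is removed even if processing raises an exception.
    """
    os.makedirs(directory, exist_ok=True)
    fd, path = tempfile.mkstemp(suffix=suffix, dir=directory)
    os.close(fd)  # Only the path is needed; other tools write to it.
    try:
        yield path
    finally:
        if os.path.exists(path):
            os.remove(path)
```

A caller would wrap the download and transcription steps in `with temp_audio_file("temp_audio") as path:` so the audio file never outlives the request.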

License

MIT License
