Skip to content

alexandrerays/speech-flow-ai

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

15 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

SpeechFlowAI - Your Ultimate AI Transcription Service

A modern web application that transcribes audio from YouTube videos and local files into text using AI. Built with TypeScript (Next.js) for the frontend and Python (FastAPI) for the backend.

Features

  • 🎯 YouTube video transcription
  • 📁 Local audio/video file upload
  • 🤖 AI-powered summary generation (3 key bullet points)
  • 📋 Copy to clipboard functionality
  • 💾 Save transcriptions as text files
  • 🌍 Multi-language support (10+ languages with auto-detection)
  • 🎨 Modern dark theme UI
  • 🔄 Real-time transcription status updates
  • 🎵 Supports multiple audio/video formats
  • 🔒 Secure and private processing

Tech Stack

Frontend

  • Next.js 14 (React)
  • TypeScript
  • Tailwind CSS
  • Lucide Icons

Backend

  • Python 3.11+
  • FastAPI
  • Whisper AI (for transcription)
  • yt-dlp (for YouTube downloads)
  • FFmpeg (for audio processing)

Prerequisites

  • Node.js 18+ and npm
  • Python 3.11+
  • FFmpeg (included in the backend/app directory for Windows; install via brew/apt on Mac/Linux)
  • OpenAI API key (optional, required only for AI summary generation feature)

Setup

Backend Setup

  1. Create and activate a virtual environment:
cd backend
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Unix or MacOS:
source venv/bin/activate
  1. Install dependencies:
pip install -r requirements.txt
  1. (Optional) Configure OpenAI API key for summary feature:
# Create a .env file in the backend directory
echo "OPENAI_API_KEY=your_api_key_here" > .env
  1. Start the backend server:
cd app
python main.py

The backend will be available at http://localhost:8000

Frontend Setup

  1. Install dependencies:
cd frontend
npm install
  1. Start the development server:
npm run dev

The frontend will be available at http://localhost:3000

Usage

  1. Open http://localhost:3000 in your browser
  2. Select your preferred language (or use auto-detect)
  3. Choose between:
    • Local File: Upload an audio/video file from your device
    • Online URL: Paste a YouTube URL
  4. Click "Try FREE transcription"
  5. Wait for the transcription to complete
  6. View your results:
    • Full transcription text
    • AI-generated summary (3 key bullet points)
  7. Actions available:
    • Copy to Clipboard: Copy the full transcription
    • Save as Text: Download the transcription as a .txt file
    • Clear: Start a new transcription

Supported Languages

The application supports 10+ languages with auto-detection:

  • English
  • Portuguese (Português)
  • Spanish (Español)
  • French (Français)
  • German (Deutsch)
  • Italian (Italiano)
  • Japanese (日本語)
  • Korean (한국어)
  • Chinese (中文)
  • Russian (Русский)

Project Structure

speech-flow-ai/
├── backend/
│   ├── app/
│   │   ├── main.py           # FastAPI application with all API endpoints
│   │   ├── ffmpeg.exe        # FFmpeg executable (Windows)
│   │   ├── ffprobe.exe       # FFprobe executable (Windows)
│   │   └── temp_audio/       # Temporary audio storage (auto-created)
│   ├── requirements.txt      # Python dependencies
│   ├── .env                  # Environment variables (OpenAI API key)
│   └── venv/                 # Python virtual environment
├── frontend/
│   ├── src/
│   │   ├── app/
│   │   │   ├── page.tsx      # Main page with transcription UI
│   │   │   ├── layout.tsx    # Root layout
│   │   │   └── globals.css   # Global styles (dark theme)
│   │   └── ...
│   ├── package.json
│   └── ...
└── README.md

API Endpoints

Transcription Endpoints

  • POST /api/transcribe

    • Accepts: YouTube URL, optional language parameter
    • Returns: Transcription text and ID
    • Used for: Transcribing YouTube videos
  • POST /api/transcribe-file

    • Accepts: Audio/video file upload, optional language parameter
    • Returns: Transcription text and ID
    • Used for: Transcribing local files
    • Supports: Multiple audio/video formats (mp3, wav, m4a, mp4, etc.)

Summary Endpoint

  • POST /api/summarize
    • Accepts: Transcription text, optional language parameter
    • Returns: 3 key bullet points summarizing the content
    • Requires: OpenAI API key configured
    • Uses: GPT-3.5-turbo for intelligent summarization

Status Endpoint

  • GET /api/transcription/{id}
    • Get transcription status and result
    • Returns: Status and transcription text if available

Notes

  • The application uses FFmpeg for audio processing, which is included in the backend/app directory
  • Transcriptions are processed locally using Whisper AI
  • Temporary audio files are automatically cleaned up after processing
  • The application supports multiple languages (auto-detected by Whisper)

License

MIT License

About

A modern web application that transcribes audio from YouTube videos and local files into text using AI. Built with TypeScript (Next.js) for the frontend and Python (FastAPI) for the backend.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors