🎙️ Auto Cast

Transform your text content into engaging AI-powered podcasts with natural voice synthesis

Auto Cast is an AI-driven platform that automates the entire podcast creation workflow, from content summarization and script generation to high-quality audio production. It is built with modern web technologies and powered by OpenAI's language models.

✨ Features

🤖 AI-Powered Content Generation

  • Smart Summarization: Automatically distill long-form content into concise, podcast-ready summaries
  • Dynamic Script Generation: Create engaging podcast scripts with customizable styles and formats
  • Iterative Refinement: Continuously improve scripts with AI-powered editing and enhancement
  • Multi-format Support: Handle various input types including articles, blog posts, and documents

🌍 Multi-Language Support

  • English: Full support with native language models
  • Persian (Farsi): Comprehensive Persian language support for Middle Eastern audiences
  • Extensible: Architecture ready for additional language support

🎀 Advanced Text-to-Speech

  • Multiple Voice Options: Choose from 6 professional AI voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer)
  • Customizable Audio: Adjust speech speed, pitch, and tone to match your brand
  • High-Quality Output: Generate broadcast-ready audio with natural intonation

🎨 Modern User Interface

  • Responsive Design: Seamless experience across desktop, tablet, and mobile devices
  • Dark/Light Mode: Toggle between themes with system preference detection
  • Step-by-Step Workflow: Intuitive 4-step process from content input to audio generation
  • Real-time Preview: Live script editing with rich text formatting
  • Progress Tracking: Visual progress indicators during generation

⚙️ Flexible Configuration

  • Multiple AI Providers: Support for OpenAI, AvalAI, OpenRouter, AWS Bedrock, and Azure OpenAI
  • Custom Endpoints: Configure your own AI service endpoints
  • Advanced Settings: Fine-tune temperature, max tokens, and system prompts
  • Podcast Formats: Single or dual-host configurations with various styles
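
One way to model the provider list above is a small lookup table keyed by provider ID, with a custom endpoint taking precedence over any default. This is a sketch only: the `ProviderId` names, the `ProviderConfig` shape, and the `resolveBaseUrl` helper are assumptions for illustration, not Auto Cast's actual code. Providers whose endpoint depends on the account or region are left without a default here.

```typescript
// Hypothetical provider registry; names and shapes are illustrative only.
type ProviderId = "openai" | "avalai" | "openrouter" | "bedrock" | "azure-openai";

interface ProviderConfig {
  label: string;
  // null means the user must supply an endpoint in the settings.
  defaultBaseUrl: string | null;
}

const PROVIDERS: Record<ProviderId, ProviderConfig> = {
  openai: { label: "OpenAI", defaultBaseUrl: "https://api.openai.com/v1" },
  openrouter: { label: "OpenRouter", defaultBaseUrl: "https://openrouter.ai/api/v1" },
  avalai: { label: "AvalAI", defaultBaseUrl: null },
  bedrock: { label: "AWS Bedrock", defaultBaseUrl: null },
  "azure-openai": { label: "Azure OpenAI", defaultBaseUrl: null },
};

// A custom endpoint, when given, wins over the provider default.
function resolveBaseUrl(provider: ProviderId, customEndpoint?: string): string {
  const trimmed = customEndpoint?.trim();
  if (trimmed) return trimmed.replace(/\/+$/, ""); // drop trailing slashes
  const fallback = PROVIDERS[provider].defaultBaseUrl;
  if (fallback === null) {
    throw new Error(`${PROVIDERS[provider].label} requires a custom endpoint`);
  }
  return fallback;
}
```

Keeping the defaults in one table means a new provider is a single added entry rather than a change scattered across the UI and API routes.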

🏗️ Architecture Overview

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend UI   │────│   API Routes     │────│   AI Services   │
│   (Next.js)     │    │   (Next.js API)  │    │   (OpenAI)      │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                       │                       │
         ├─ Rich Text Editor     ├─ Content Generation   ├─ GPT Models
         ├─ Voice Preview        ├─ Text-to-Speech       ├─ TTS Engine
         ├─ Settings Modal       ├─ Endpoint Testing     └─ Voice Models
         └─ Theme Management     └─ Audio Processing

🛠️ Tech Stack

Frontend

AI & Backend

Development Tools

🚀 Quick Start

Prerequisites

  • Node.js 18.0 or later
  • npm or pnpm (recommended)
  • OpenAI API key (create one in your OpenAI account dashboard)

Installation

  1. Clone the repository

    git clone https://github.com/moaminsharifi/auto-cast.git
    cd auto-cast
  2. Install dependencies

    # Using npm
    npm install
    
    # Using pnpm (recommended)
    pnpm install
  3. Start the development server

    # Using npm
    npm run dev
    
    # Using pnpm
    pnpm dev
  4. Open your browser and navigate to http://localhost:3000

First-Time Setup

  1. Configure API Key: Click the "Settings" button in the top-right corner
  2. Add OpenAI API Key: Enter your OpenAI API key and optionally save it locally
  3. Test Connection: Use the "Test Connection" feature to verify your setup
  4. Start Creating: Begin with the 4-step podcast generation workflow
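
A connection test like the one in step 3 can be done by requesting the provider's model list: OpenAI-compatible APIs expose `GET /models` under the base URL and expect a bearer token. The sketch below only builds the request and interprets the status code; `buildConnectionTest` and `describeStatus` are hypothetical helper names, and how Auto Cast actually tests the connection is not shown in this README.

```typescript
// Describes the HTTP request used to verify an API key.
interface ConnectionRequest {
  url: string;
  method: "GET";
  headers: Record<string, string>;
}

// Hypothetical helper: builds a "list models" request for an OpenAI-compatible API.
function buildConnectionTest(baseUrl: string, apiKey: string): ConnectionRequest {
  const key = apiKey.trim();
  if (!key) throw new Error("API key is empty");
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/models`, // avoid a double slash
    method: "GET",
    headers: { Authorization: `Bearer ${key}` },
  };
}

// Hypothetical helper: maps an HTTP status code to a user-facing message.
function describeStatus(status: number): string {
  if (status >= 200 && status < 300) return "Connection successful";
  if (status === 401) return "Invalid or missing API key";
  if (status === 429) return "Rate limited or out of quota";
  return `Request failed with status ${status}`;
}
```

In a browser or on Node.js 18+, the request object can be passed to `fetch(request.url, request)` and the resulting `response.status` handed to `describeStatus`.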

📖 Usage Guide

Step 1: Content Input

  • Paste Text: Directly input your content using the rich text editor
  • Upload Files: Support for .txt and .md files up to 10MB
  • Language Selection: Choose between English and Persian
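
The upload limits above can be enforced before a file is read. The following is a minimal sketch, assuming 10MB means 10 × 1024 × 1024 bytes; `validateUpload` and `UploadCandidate` are illustrative names, not part of the project.

```typescript
// Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const ALLOWED_EXTENSIONS = [".txt", ".md"];

// The subset of the browser File interface that the check needs.
interface UploadCandidate {
  name: string;
  size: number;
}

// Returns an error message, or null when the file is acceptable.
function validateUpload(file: UploadCandidate): string | null {
  const dot = file.name.lastIndexOf(".");
  const ext = dot >= 0 ? file.name.slice(dot).toLowerCase() : "";
  if (ALLOWED_EXTENSIONS.indexOf(ext) === -1) {
    return `Unsupported file type "${ext || "none"}"; use .txt or .md`;
  }
  if (file.size > MAX_UPLOAD_BYTES) {
    return "File exceeds the 10MB limit";
  }
  return null;
}
```

Because a browser `File` object already has `name` and `size`, it can be passed to `validateUpload` directly.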

Step 2: Podcast Configuration

  • Duration: Set target length (5-30 minutes)
  • Host Format: Single or dual-host conversations
  • Style: Conversational, educational, storytelling, or interview
  • Extras: Optional intro, outro, and background music
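
These options map naturally onto a typed configuration object. The sketch below also derives a target script length from the duration, assuming a speaking rate of about 150 words per minute. That rate, the `PodcastConfig` field names, and `targetWordCount` are all assumptions for illustration.

```typescript
type HostFormat = "single" | "dual";
type PodcastStyle = "conversational" | "educational" | "storytelling" | "interview";

// Hypothetical shape for the Step 2 settings.
interface PodcastConfig {
  durationMinutes: number; // allowed range: 5-30
  hostFormat: HostFormat;
  style: PodcastStyle;
  includeIntro: boolean;
  includeOutro: boolean;
  backgroundMusic: boolean;
}

const MIN_MINUTES = 5;
const MAX_MINUTES = 30;
// Assumption: an average speaking rate, used only to size the script.
const ASSUMED_WORDS_PER_MINUTE = 150;

// Returns the approximate number of words the script should contain.
function targetWordCount(config: PodcastConfig): number {
  const minutes = config.durationMinutes;
  // Written this way so that NaN is rejected as well.
  if (!(minutes >= MIN_MINUTES && minutes <= MAX_MINUTES)) {
    throw new RangeError(`Duration must be between ${MIN_MINUTES} and ${MAX_MINUTES} minutes`);
  }
  return Math.round(minutes * ASSUMED_WORDS_PER_MINUTE);
}
```

A word target like this could be included in the script-generation prompt so the model aims for the requested length.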

Step 3: Voice Settings

  • Voice Selection: Choose from 6 professional AI voices
  • Audio Tuning: Adjust speech speed (0.5x-2.0x) and pitch (0.8x-1.2x)
  • Preview: Listen to voice samples before generation
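
The ranges above suggest normalizing user input before it reaches the TTS call. This is a minimal sketch: the lowercase voice IDs match the names OpenAI's speech API uses, but the `VoiceSettings` shape, the fallback to `"alloy"`, and `normalizeVoiceSettings` are assumptions. As far as I know, OpenAI's speech endpoint accepts a `speed` parameter but no pitch parameter, so pitch would presumably be applied by the app itself.

```typescript
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

interface VoiceSettings {
  voice: Voice;
  speed: number; // 0.5-2.0
  pitch: number; // 0.8-1.2
}

function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Type guard: narrows an arbitrary string to one of the six voice IDs.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper: coerces raw form values into the documented ranges.
function normalizeVoiceSettings(input: { voice: string; speed: number; pitch: number }): VoiceSettings {
  return {
    voice: isVoice(input.voice) ? input.voice : "alloy", // assumed default voice
    speed: clamp(input.speed, 0.5, 2.0),
    pitch: clamp(input.pitch, 0.8, 1.2),
  };
}
```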

Step 4: Generation & Download

  • Review Settings: Final confirmation of all parameters
  • Real-time Progress: Watch the script as it is generated
  • Audio Production: Automatic conversion to high-quality MP3
  • Download: Get both script and audio files

🔧 Configuration

Environment Variables

Create a .env.local file in the root directory:

# Optional: Set default OpenAI API key (users can override in UI)
OPENAI_API_KEY=your_openai_api_key_here

# Optional: Custom API endpoint
CUSTOM_AI_ENDPOINT=https://your-custom-endpoint.com/v1
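
Since the key from the UI can override the environment default, the lookup order could be sketched as follows. `resolveCredentials`, `EnvLike`, and the fallback to OpenAI's public base URL are assumptions for illustration; only the two variable names come from the file above.

```typescript
// The two variables from .env.local; both are optional.
interface EnvLike {
  OPENAI_API_KEY?: string;
  CUSTOM_AI_ENDPOINT?: string;
}

interface ResolvedCredentials {
  apiKey: string | null;
  baseUrl: string;
}

// Assumption: fall back to OpenAI's public API when no endpoint is configured.
const DEFAULT_BASE_URL = "https://api.openai.com/v1";

// Returns the first value that is non-empty after trimming, or null.
function firstNonEmpty(...values: Array<string | undefined>): string | null {
  for (const value of values) {
    const trimmed = value?.trim();
    if (trimmed) return trimmed;
  }
  return null;
}

// Values entered in the UI take precedence over environment defaults.
function resolveCredentials(env: EnvLike, userApiKey?: string, userEndpoint?: string): ResolvedCredentials {
  return {
    apiKey: firstNonEmpty(userApiKey, env.OPENAI_API_KEY),
    baseUrl: firstNonEmpty(userEndpoint, env.CUSTOM_AI_ENDPOINT) ?? DEFAULT_BASE_URL,
  };
}
```

On the server, `process.env` can be passed as the `env` argument. A `null` `apiKey` signals that the user still needs to enter a key in the settings.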

Advanced Settings

Access advanced configuration through the Settings modal:

  • Model Selection: Choose from GPT-4, GPT-3.5-turbo, and other available models
  • Temperature: Control creativity (0.0-1.0)
  • Max Tokens: Set maximum response length
  • System Prompts: Customize AI behavior with predefined or custom prompts
  • Custom Endpoints: Configure alternative AI service providers
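
To illustrate how these settings might reach the model, the sketch below assembles a request body in the shape of OpenAI's Chat Completions API (`model`, `messages`, `temperature`, `max_tokens`). The `AdvancedSettings` type and `buildChatRequest` are hypothetical, and the README does not confirm that Auto Cast builds its requests this way.

```typescript
// Hypothetical shape for the values edited in the Settings modal.
interface AdvancedSettings {
  model: string;
  temperature: number; // 0.0-1.0 in the UI
  maxTokens: number;
  systemPrompt: string;
}

interface ChatMessage {
  role: "system" | "user";
  content: string;
}

interface ChatRequestBody {
  model: string;
  temperature: number;
  max_tokens: number;
  messages: ChatMessage[];
}

// Builds a Chat Completions style body, keeping values inside the UI's ranges.
function buildChatRequest(settings: AdvancedSettings, userContent: string): ChatRequestBody {
  return {
    model: settings.model,
    temperature: Math.min(1, Math.max(0, settings.temperature)),
    max_tokens: Math.max(1, Math.floor(settings.maxTokens)), // positive whole number
    messages: [
      { role: "system", content: settings.systemPrompt },
      { role: "user", content: userContent },
    ],
  };
}
```

The system prompt goes first so that it frames the user's content. Some newer models may expect a different token-limit field, so treat `max_tokens` as provider-dependent.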

🏗️ Building for Production

# Build the application
npm run build

# Start production server
npm start

For static export (GitHub Pages, etc.):

# Build and export
npm run build

# The 'out' directory contains the static files
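
In recent Next.js versions, `npm run build` only writes an `out` directory when static export is enabled in the Next.js config. The fragment below is a sketch of such a config, not a copy of this project's file: the file name `next.config.ts` assumes a Next.js version that accepts a TypeScript config, and whether Auto Cast already sets these options is not stated here.

```typescript
// next.config.ts (hypothetical; older setups would use next.config.js or .mjs)
const nextConfig = {
  // Emit a fully static site into the `out` directory on `next build`.
  output: "export" as const,
  // The default image optimizer needs a server, so disable it for static hosting.
  images: { unoptimized: true },
};

export default nextConfig;
```

A static export runs no server code at request time, so the API routes under `app/api/` would presumably need to be hosted separately for generation to work.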

🧪 Development

Project Structure

auto-cast/
├── app/                    # Next.js App Router
│   ├── api/               # API routes
│   │   ├── generate-podcast/  # Main generation endpoint
│   │   ├── test-endpoint/     # Connection testing
│   │   └── voice-sample/      # Voice preview
│   ├── globals.css        # Global styles
│   ├── layout.tsx         # Root layout
│   └── page.tsx          # Main application
├── components/            # React components
│   ├── ui/               # shadcn/ui components
│   ├── rich-text-editor.tsx
│   ├── settings-modal.tsx
│   ├── theme-provider.tsx
│   ├── theme-toggle.tsx
│   └── voice-sample-player.tsx
├── hooks/                # Custom React hooks
├── lib/                  # Utility functions
├── public/               # Static assets
└── styles/               # Additional styles

Key Components

  • app/page.tsx: Main application with 4-step workflow
  • components/settings-modal.tsx: Configuration interface
  • components/rich-text-editor.tsx: Content input with formatting
  • app/api/generate-podcast/route.ts: Core generation logic
  • components/voice-sample-player.tsx: Voice preview functionality

Adding New Features

  1. New AI Provider: Extend the API endpoints configuration
  2. Additional Languages: Add language options and prompt templates
  3. Voice Options: Integrate new TTS providers
  4. Export Formats: Add new audio/video output formats
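
As an illustration of item 2, a new language could be added by registering a language option together with its prompt template. Everything in this sketch is hypothetical: the `LanguageOption` shape, the prompt wording, and `registerLanguage` are assumptions, and the project's real extension points may look different.

```typescript
// Hypothetical description of a supported language.
interface LanguageOption {
  code: string;
  label: string;
  direction: "ltr" | "rtl";
  // Builds the summarization prompt for this language.
  buildSummaryPrompt: (content: string) => string;
}

const LANGUAGES: Record<string, LanguageOption> = {
  en: {
    code: "en",
    label: "English",
    direction: "ltr",
    buildSummaryPrompt: (content) => `Summarize the following content for a podcast, in English:\n\n${content}`,
  },
  fa: {
    code: "fa",
    label: "Persian",
    direction: "rtl",
    buildSummaryPrompt: (content) => `Summarize the following content for a podcast, in Persian (Farsi):\n\n${content}`,
  },
};

// Adds a language, refusing to overwrite an existing entry by accident.
function registerLanguage(option: LanguageOption): void {
  if (Object.prototype.hasOwnProperty.call(LANGUAGES, option.code)) {
    throw new Error(`Language "${option.code}" is already registered`);
  }
  LANGUAGES[option.code] = option;
}
```

Storing the text direction next to the prompt template lets the editor switch to right-to-left rendering from the same lookup.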

🔒 Privacy & Security

  • Local Storage: API keys can be stored locally in browser (optional)
  • No Server Storage: No sensitive data is stored on the server
  • HTTPS Only: All AI API communications use secure connections
  • Client-Side Processing: Content processing happens client-side when possible
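
Optional local storage of the API key could work along these lines. This is a sketch under assumptions: the storage key name, `saveApiKey`, and `loadApiKey` are hypothetical, and the `KeyValueStore` interface is just the subset of the browser `Storage` API that the functions use.

```typescript
// Subset of the browser Storage API, so the sketch also runs outside a browser.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical key name; the real app may use a different one.
const STORAGE_KEY = "auto-cast:openai-api-key";

// Persists the key only when the user opted in; otherwise clears any saved copy.
function saveApiKey(store: KeyValueStore, apiKey: string, remember: boolean): void {
  const key = apiKey.trim();
  if (remember && key) {
    store.setItem(STORAGE_KEY, key);
  } else {
    store.removeItem(STORAGE_KEY);
  }
}

// Returns the saved key, or null if none was stored.
function loadApiKey(store: KeyValueStore): string | null {
  return store.getItem(STORAGE_KEY);
}
```

In the browser, `window.localStorage` satisfies `KeyValueStore` and can be passed in directly. Clearing the entry when the user opts out avoids leaving a stale key behind.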

🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for details.

Development Workflow

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Install dependencies (pnpm install)
  4. Make your changes
  5. Test your changes (npm run build)
  6. Commit your changes (git commit -m 'Add amazing feature')
  7. Push to the branch (git push origin feature/amazing-feature)
  8. Open a Pull Request

Code Style

  • TypeScript: Strict type checking enabled
  • ESLint: Configured for Next.js and React
  • Prettier: Automatic code formatting
  • Conventional Commits: Use conventional commit messages

📊 Roadmap

  • Video Podcast Generation: Support for video content with AI avatars
  • Batch Processing: Handle multiple documents simultaneously
  • Advanced Voice Cloning: Custom voice training capabilities
  • Podcast Analytics: Usage statistics and performance metrics
  • Team Collaboration: Multi-user workspaces and sharing
  • API Access: RESTful API for external integrations
  • Mobile App: Native iOS and Android applications

🆘 Support

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • OpenAI for providing advanced language models and TTS capabilities
  • Vercel for the excellent AI SDK and hosting platform
  • shadcn for the beautiful UI component library
  • Radix UI for accessible headless components
  • Next.js Team for the amazing React framework

Built with ❤️ by Amin Sharifi

⭐ Star this project if you find it useful!
