Transform your text content into engaging AI-powered podcasts with natural voice synthesis
Auto Cast is an AI-driven platform that automates the entire podcast creation workflow, from content summarization and script generation to high-quality audio production. It is built with modern web technologies and powered by OpenAI's language models.
- Smart Summarization: Automatically distill long-form content into concise, podcast-ready summaries
- Dynamic Script Generation: Create engaging podcast scripts with customizable styles and formats
- Iterative Refinement: Continuously improve scripts with AI-powered editing and enhancement
- Multi-format Support: Handle various input types including articles, blog posts, and documents
- English: Full support with native language models
- Persian (Farsi): Comprehensive Persian language support for Middle Eastern audiences
- Extensible: Architecture ready for additional language support
- Multiple Voice Options: Choose from 6 professional AI voices (Alloy, Echo, Fable, Onyx, Nova, Shimmer)
- Customizable Audio: Adjust speech speed, pitch, and tone to match your brand
- High-Quality Output: Generate broadcast-ready audio with natural intonation
- Responsive Design: Seamless experience across desktop, tablet, and mobile devices
- Dark/Light Mode: Toggle between themes with system preference detection
- Step-by-Step Workflow: Intuitive 4-step process from content input to audio generation
- Real-time Preview: Live script editing with rich text formatting
- Progress Tracking: Visual progress indicators during generation
- Multiple AI Providers: Support for OpenAI, AvalAI, OpenRouter, AWS Bedrock, and Azure OpenAI
- Custom Endpoints: Configure your own AI service endpoints
- Advanced Settings: Fine-tune temperature, max tokens, and system prompts
- Podcast Formats: Single or dual-host configurations with various styles
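As a rough illustration of the voice options listed above, the sketch below builds a request body for OpenAI's text-to-speech endpoint (`/v1/audio/speech`). The helper name `buildSpeechRequest` and the choice of the `tts-1` model are assumptions made for this example; the project's actual route may assemble the request differently.

```typescript
// The six voices named in this README, in the lowercase form the OpenAI API expects.
const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

interface SpeechRequest {
  model: string;
  voice: Voice;
  input: string;
  speed: number;
  response_format: "mp3";
}

// Type guard: true only for one of the six supported voice names.
function isVoice(value: string): value is Voice {
  return (VOICES as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical helper (not taken from the project source): validates the
// inputs and returns the JSON body for a POST to /v1/audio/speech.
function buildSpeechRequest(text: string, voice: string, speed: number = 1.0): SpeechRequest {
  const normalized = voice.toLowerCase();
  if (!isVoice(normalized)) {
    throw new Error("Unknown voice: " + voice);
  }
  if (text.trim().length === 0) {
    throw new Error("Script text is empty");
  }
  // "tts-1" is an assumed default model for this sketch.
  return { model: "tts-1", voice: normalized, input: text, speed: speed, response_format: "mp3" };
}
```

Calling `buildSpeechRequest("Welcome to the show", "Nova")` returns a body with `voice: "nova"`, while an unsupported voice name throws before any network request would be made.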
┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Frontend UI   │────│    API Routes    │────│   AI Services   │
│    (Next.js)    │    │  (Next.js API)   │    │    (OpenAI)     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         ├─ Rich Text Editor    ├─ Content Generation   ├─ GPT Models
         ├─ Voice Preview       ├─ Text-to-Speech       ├─ TTS Engine
         ├─ Settings Modal      ├─ Endpoint Testing     └─ Voice Models
         └─ Theme Management    └─ Audio Processing
- Next.js 15.2.4 - React framework with App Router
- TypeScript - Type-safe development
- Tailwind CSS - Utility-first CSS framework
- shadcn/ui - Beautiful, accessible components
- Vercel AI SDK - AI integration toolkit
- OpenAI API - Language models and TTS
- Radix UI - Headless UI primitives
- React Hook Form - Performant form handling
- Zod - Runtime type validation
- Lucide React - Beautiful icon library
- Node.js 18.0 or later
- npm or pnpm (recommended)
- OpenAI API key (create one in your OpenAI account dashboard)
- Clone the repository:
    git clone https://github.com/moaminsharifi/auto-cast.git
    cd auto-cast
- Install dependencies:
    # Using npm
    npm install
    # Using pnpm (recommended)
    pnpm install
- Start the development server:
    # Using npm
    npm run dev
    # Using pnpm
    pnpm dev
- Open your browser and navigate to http://localhost:3000
- Configure API Key: Click the "Settings" button in the top-right corner
- Add OpenAI API Key: Enter your OpenAI API key and optionally save it locally
- Test Connection: Use the "Test Connection" feature to verify your setup
- Start Creating: Begin with the 4-step podcast generation workflow
- Paste Text: Directly input your content using the rich text editor
- Upload Files: Support for .txt and .md files up to 10MB
- Language Selection: Choose between English and Persian
- Duration: Set target length (5-30 minutes)
- Host Format: Single or dual-host conversations
- Style: Conversational, educational, storytelling, or interview
- Extras: Optional intro, outro, and background music
- Voice Selection: Choose from 6 professional AI voices
- Audio Tuning: Adjust speech speed (0.5x-2.0x) and pitch (0.8x-1.2x)
- Preview: Listen to voice samples before generation
- Review Settings: Final confirmation of all parameters
- Real-time Progress: Watch script generation in real-time
- Audio Production: Automatic conversion to high-quality MP3
- Download: Get both script and audio files
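The limits quoted in the workflow above (accepted file types and size, duration, speed, and pitch ranges) can be sketched as a small validation layer. This is only an illustration: the function names are hypothetical, and treating 10MB as 10 × 1024 × 1024 bytes is an assumption.

```typescript
// Assumption: "10MB" is interpreted as 10 * 1024 * 1024 bytes.
const MAX_UPLOAD_BYTES = 10 * 1024 * 1024;
const ALLOWED_EXTENSIONS = [".txt", ".md"];

// Restricts a number to the inclusive range [min, max].
function clamp(value: number, min: number, max: number): number {
  return Math.min(max, Math.max(min, value));
}

// Hypothetical check for an uploaded file: extension must be .txt or .md
// (case-insensitive) and the file must be non-empty and within the size limit.
function isAcceptedUpload(fileName: string, sizeBytes: number): boolean {
  const dot = fileName.lastIndexOf(".");
  const ext = dot === -1 ? "" : fileName.slice(dot).toLowerCase();
  const extensionOk = ALLOWED_EXTENSIONS.indexOf(ext) !== -1;
  return extensionOk && sizeBytes > 0 && sizeBytes <= MAX_UPLOAD_BYTES;
}

interface PodcastTuning {
  speed: number;           // playback speed multiplier
  pitch: number;           // pitch multiplier
  durationMinutes: number; // target episode length
}

// Pulls out-of-range values back into the ranges documented in this README.
function normalizeTuning(input: PodcastTuning): PodcastTuning {
  return {
    speed: clamp(input.speed, 0.5, 2.0),
    pitch: clamp(input.pitch, 0.8, 1.2),
    durationMinutes: Math.round(clamp(input.durationMinutes, 5, 30)),
  };
}
```

Clamping rather than rejecting is a design choice for this sketch: it keeps the UI forgiving when a slider or stored preference drifts out of range, while file uploads are rejected outright because there is no sensible way to repair them.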
Create a .env.local file in the root directory:
# Optional: Set default OpenAI API key (users can override in UI)
OPENAI_API_KEY=your_openai_api_key_here
# Optional: Custom API endpoint
CUSTOM_AI_ENDPOINT=https://your-custom-endpoint.com/v1
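One way the two variables above could be combined with a key entered in the UI is sketched below. The function `resolveApiConfig` is hypothetical, and the rule that a UI-supplied key takes precedence over `OPENAI_API_KEY` is inferred from the "users can override in UI" comment rather than from the project source. The environment is passed in as a plain object so the sketch does not depend on Node's `process`.

```typescript
// Environment variables as a plain lookup table (e.g. a copy of process.env).
type Env = { [name: string]: string | undefined };

interface ApiConfig {
  apiKey: string;
  baseUrl: string;
}

// OpenAI's public API base URL, used when no custom endpoint is set.
const DEFAULT_BASE_URL = "https://api.openai.com/v1";

// Hypothetical resolver: a key typed into the Settings modal wins over the
// OPENAI_API_KEY default, and CUSTOM_AI_ENDPOINT replaces the OpenAI base URL.
function resolveApiConfig(env: Env, uiApiKey?: string): ApiConfig {
  const apiKey = (uiApiKey && uiApiKey.trim()) || env["OPENAI_API_KEY"] || "";
  if (!apiKey) {
    throw new Error("No OpenAI API key configured");
  }
  // Strip trailing slashes so paths can be appended consistently.
  const baseUrl = (env["CUSTOM_AI_ENDPOINT"] || DEFAULT_BASE_URL).replace(/\/+$/, "");
  return { apiKey: apiKey, baseUrl: baseUrl };
}
```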
Access advanced configuration through the Settings modal:
- Model Selection: Choose from GPT-4, GPT-3.5-turbo, and other available models
- Temperature: Control creativity (0.0-1.0)
- Max Tokens: Set maximum response length
- System Prompts: Customize AI behavior with predefined or custom prompts
- Custom Endpoints: Configure alternative AI service providers
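The advanced settings listed above could be modelled roughly as follows. The interface, the default values, and the `normalizeSettings` helper are all assumptions for illustration; only the 0.0-1.0 temperature range comes from this README.

```typescript
interface GenerationSettings {
  model: string;
  temperature: number;
  maxTokens: number;
  systemPrompt: string;
}

// Assumed defaults for this sketch; the application may ship different ones.
const DEFAULT_SETTINGS: GenerationSettings = {
  model: "gpt-4",
  temperature: 0.7,
  maxTokens: 2048,
  systemPrompt: "You are a podcast script writer.",
};

// Merges user overrides onto the defaults and repairs out-of-range values:
// temperature is clamped to 0.0-1.0 and maxTokens becomes a positive integer.
function normalizeSettings(overrides: Partial<GenerationSettings>): GenerationSettings {
  const merged: GenerationSettings = { ...DEFAULT_SETTINGS, ...overrides };
  const temperature = isFinite(merged.temperature)
    ? Math.min(1, Math.max(0, merged.temperature))
    : DEFAULT_SETTINGS.temperature;
  const maxTokens = isFinite(merged.maxTokens)
    ? Math.max(1, Math.floor(merged.maxTokens))
    : DEFAULT_SETTINGS.maxTokens;
  return {
    model: merged.model,
    temperature: temperature,
    maxTokens: maxTokens,
    systemPrompt: merged.systemPrompt,
  };
}
```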
# Build the application
npm run build
# Start production server
npm start
For static export (GitHub Pages, etc.):
# Build and export
npm run build
# The 'out' directory contains the static files
auto-cast/
├── app/                          # Next.js App Router
│   ├── api/                      # API routes
│   │   ├── generate-podcast/     # Main generation endpoint
│   │   ├── test-endpoint/        # Connection testing
│   │   └── voice-sample/         # Voice preview
│   ├── globals.css               # Global styles
│   ├── layout.tsx                # Root layout
│   └── page.tsx                  # Main application
├── components/                   # React components
│   ├── ui/                       # shadcn/ui components
│   ├── rich-text-editor.tsx
│   ├── settings-modal.tsx
│   ├── theme-provider.tsx
│   ├── theme-toggle.tsx
│   └── voice-sample-player.tsx
├── hooks/                        # Custom React hooks
├── lib/                          # Utility functions
├── public/                       # Static assets
└── styles/                       # Additional styles
- app/page.tsx: Main application with 4-step workflow
- components/settings-modal.tsx: Configuration interface
- components/rich-text-editor.tsx: Content input with formatting
- app/api/generate-podcast/route.ts: Core generation logic
- components/voice-sample-player.tsx: Voice preview functionality
- New AI Provider: Extend the API endpoints configuration
- Additional Languages: Add language options and prompt templates
- Voice Options: Integrate new TTS providers
- Export Formats: Add new audio/video output formats
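To make the "New AI Provider" extension point more concrete, here is a minimal sketch of a provider registry. It does not reflect the project's actual configuration format: `ProviderConfig`, `registerProvider`, and `chatCompletionsUrl` are hypothetical names, and it assumes each provider exposes an OpenAI-compatible `/chat/completions` path under its base URL.

```typescript
interface ProviderConfig {
  id: string;
  label: string;
  baseUrl: string;
}

// Two built-in entries with their public base URLs; others are added at runtime.
const providers: { [id: string]: ProviderConfig } = Object.create(null);
providers["openai"] = { id: "openai", label: "OpenAI", baseUrl: "https://api.openai.com/v1" };
providers["openrouter"] = { id: "openrouter", label: "OpenRouter", baseUrl: "https://openrouter.ai/api/v1" };

// Hypothetical registration step: rejects non-HTTPS endpoints and duplicate ids.
function registerProvider(config: ProviderConfig): void {
  if (!/^https:\/\//.test(config.baseUrl)) {
    throw new Error("Provider endpoints must use HTTPS");
  }
  if (providers[config.id] !== undefined) {
    throw new Error("Provider already registered: " + config.id);
  }
  providers[config.id] = config;
}

// Builds the chat completions URL for a registered provider, assuming an
// OpenAI-compatible path layout.
function chatCompletionsUrl(providerId: string): string {
  const provider = providers[providerId];
  if (provider === undefined) {
    throw new Error("Unknown provider: " + providerId);
  }
  return provider.baseUrl.replace(/\/+$/, "") + "/chat/completions";
}
```

Providers that are not OpenAI-compatible (for example, services with their own request signing) would need more than a base URL, so a real implementation would likely carry an adapter function per provider as well.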
- Local Storage: API keys can be stored locally in browser (optional)
- No Server Storage: No sensitive data is stored on the server
- HTTPS Only: All AI API communications use secure connections
- Client-Side Processing: Content processing happens client-side when possible
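The optional local storage of API keys described above might look something like the sketch below. The storage key name and the helper functions are hypothetical. The store is passed in as a parameter so the same code works with the browser's `window.localStorage` or with an in-memory stand-in.

```typescript
// Minimal subset of the Web Storage API; window.localStorage satisfies it.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
  removeItem(key: string): void;
}

// Hypothetical storage key name, chosen for this example only.
const API_KEY_STORAGE_KEY = "auto-cast:openai-api-key";

// Saves the key only when the user opted in; otherwise clears any saved copy.
function persistApiKey(store: KeyValueStore, apiKey: string, remember: boolean): void {
  if (remember) {
    store.setItem(API_KEY_STORAGE_KEY, apiKey);
  } else {
    store.removeItem(API_KEY_STORAGE_KEY);
  }
}

// Returns the saved key, or null when nothing was stored.
function loadApiKey(store: KeyValueStore): string | null {
  return store.getItem(API_KEY_STORAGE_KEY);
}

// In-memory store for tests and non-browser environments.
function createMemoryStore(): KeyValueStore {
  const data: { [key: string]: string } = Object.create(null);
  return {
    getItem: function (key: string): string | null {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: function (key: string, value: string): void {
      data[key] = value;
    },
    removeItem: function (key: string): void {
      delete data[key];
    },
  };
}
```

In the browser, `persistApiKey(window.localStorage, key, true)` would keep the key on the user's device only; nothing in this sketch sends it to a server.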
We welcome contributions! Please see our Contributing Guidelines for details.
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Install dependencies (pnpm install)
- Make your changes
- Test your changes (npm run build)
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
- TypeScript: Strict type checking enabled
- ESLint: Configured for Next.js and React
- Prettier: Automatic code formatting
- Conventional Commits: Use conventional commit messages
- Video Podcast Generation: Support for video content with AI avatars
- Batch Processing: Handle multiple documents simultaneously
- Advanced Voice Cloning: Custom voice training capabilities
- Podcast Analytics: Usage statistics and performance metrics
- Team Collaboration: Multi-user workspaces and sharing
- API Access: RESTful API for external integrations
- Mobile App: Native iOS and Android applications
- Documentation: Check the Wiki for detailed guides
- Issues: Report bugs or request features via GitHub Issues
- Discussions: Join the community in GitHub Discussions
This project is licensed under the MIT License - see the LICENSE file for details.
- OpenAI for providing advanced language models and TTS capabilities
- Vercel for the excellent AI SDK and hosting platform
- shadcn for the beautiful UI component library
- Radix UI for accessible headless components
- Next.js Team for the amazing React framework
Built with ❤️ by Amin Sharifi
⭐ Star this project if you find it useful!