A web-based platform for generating videos of speaking avatars

isislab-unisa/tts-avatar-video-gen


DUBME - Cloud-Native 3D Avatar Multimedia Authoring Platform

License: MIT with TTS Exception • Next.js • Go Fiber • Python • MongoDB • MinIO

A Full-Stack Platform for Creating Video Content Using 3D Avatars with TTS and Lip-Sync

Features • Architecture • Quick Start • Documentation • Contributing • License


🎯 Overview

DUBME is an open-source, cloud-native platform designed for creating professional multimedia content with 3D avatars. Users can generate personalized video presentations by combining:

  • Text Input → Converted to natural-sounding speech
  • 3D Avatar Model → Animated with generated audio
  • Video Output → Professional MP4 video file

You can also manage projects and folders in one environment.

Perfect for:

  • E-learning content creation
  • Marketing videos
  • Training materials
  • Accessibility applications
  • Custom avatar presentations

✨ Features

🔐 Authentication & Authorization

  • Email/Password registration and login
  • OAuth 2.0 integration (Google, GitHub)
  • JWT-based token authentication
  • Email verification and password recovery

🎨 Content Management

  • Project Management: Create, organize, and manage projects
  • Directory Structure: Hierarchical organization of content
  • Batch Operations: Rename, move, and delete items

🗣️ Video Generation

  • Text-to-Speech (TTS): Convert text to natural audio
  • 3D Avatar Animation: Lip-sync animation with avatar model
  • Video Output: Direct MP4 download and preview

🌐 Internationalization (i18n)

  • Supports 4 languages: English (EN), Spanish (ES), French (FR), Italian (IT)
  • Language switching on the fly
  • Localized UI and error messages

💾 Storage Management

  • MinIO S3-Compatible Storage: Scalable file storage
  • MongoDB: Persistent data storage

🔒 Security

  • JWT token-based API authentication
  • CORS protection
  • Rate limiting
  • Secure OAuth flows
  • Environment variable-based configuration

🌙 UI/UX

  • Dark mode and light mode support
  • Responsive design (mobile, tablet, desktop)
  • Accessible components (WCAG 2.1)
  • Intuitive dashboard
  • Real-time notifications

🏗️ Architecture

DUBME uses a three-tier containerized architecture:

┌─────────────────────┐
│   Frontend Layer    │  Next.js + TypeScript
│  (UI Components,    │  Better Auth, next-intl
│  OAuth, JWT Auth)   │  shadcn/ui + Aceternity
└──────────┬──────────┘
           │ REST API + JWT
┌──────────▼──────────┐
│   Backend Layer     │  Go + Fiber Framework
│  (API Server,       │  Request Handlers
│  JWT Validation)    │  Email Service
│                     │
└──────────┬──────────┘
           │ HTTP Requests
           ├──────────────────┬──────────────────┐
           │                  │                  │
    ┌──────▼─────┐   ┌────────▼──────┐   ┌───────▼───────┐
    │   MongoDB  │   │     MinIO     │   │ Flask Server  │
    │ (Database) │   │ (S3 Storage)  │   │ (Python 7001) │
    └────────────┘   └───────────────┘   │ - TTS Engine  │
                                         │ - Video Gen   │
                                         └───────────────┘

For detailed architecture documentation, see ARCHITECTURE.md.


🚀 Quick Start

Prerequisites

  • Docker & Docker Compose (v20+)
  • Python 3.11+ (for Flask server)
  • Node.js 18+ (for local frontend development)
  • Git with Git LFS

1️⃣ Clone the Repository

git clone https://github.com/Antonio-Caiazzo/DUBME.git
cd DUBME

2️⃣ Setup Git LFS (for large files)

Large media files and Unity binaries are tracked with Git LFS:

# Install Git LFS (if not already installed)
brew install git-lfs  # macOS
# or visit https://git-lfs.github.com for other platforms

# Initialize Git LFS in the repository
git lfs install
git lfs pull

3️⃣ Configure Environment Variables

Copy the example environment file and update with your credentials:

cp .env.example .env

See the Environment Variables section below.

4️⃣ Start Docker Services (MongoDB, MinIO, Backend, Frontend)

docker-compose up -d

5️⃣ Start Python Flask Server

In a separate terminal:

cd generator
python3 -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -r tts/requirements.txt
python server.py

The Flask server will start on http://localhost:7001.
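Before opening the UI, it can help to confirm that each service is accepting connections. The sketch below only tests whether a TCP port is open; it does not call any DUBME endpoint. The port numbers are the defaults that appear in this README (3000 frontend, 4000 backend, 7001 generator, 9000 MinIO, 27017 MongoDB), and the helper names `port_open` and `check_services` are illustrative, not part of the project.

```python
import socket
from typing import Dict, Optional

# Default ports taken from the examples in this README; adjust if yours differ.
DEFAULT_SERVICES = {
    "frontend": 3000,
    "backend": 4000,
    "generator": 7001,
    "minio": 9000,
    "mongodb": 27017,
}


def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(
    host: str = "localhost", services: Optional[Dict[str, int]] = None
) -> Dict[str, bool]:
    """Map each service name to whether its port accepts connections."""
    services = DEFAULT_SERVICES if services is None else services
    return {name: port_open(host, port) for name, port in services.items()}


if __name__ == "__main__":
    for name, up in check_services().items():
        print(f"{name:10s} {'up' if up else 'DOWN'}")
```

An open port only shows that something is listening; it does not prove the service behind it is healthy.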

6️⃣ Access the Application

With the default values shown in this README, the services are reachable at:

  • Frontend: http://localhost:3000
  • Backend API: http://localhost:4000
  • Flask generator: http://localhost:7001
  • MinIO: http://localhost:9000

7️⃣ Development Mode (Optional)

For development without authentication:

# Already enabled in .env.example with DEV_MODE=true
# This bypasses login and uses a test user automatically

📋 Environment Variables

Frontend Configuration

# Better Auth Setup
BETTER_AUTH_SECRET=your_better_auth_secret_here
  # Purpose: Secret key for Better Auth encryption
  # Generate: openssl rand -base64 32

BETTER_AUTH_URL=http://localhost:3000
  # Purpose: URL where Better Auth callbacks are handled
  # Format: http://domain.com (no trailing slash)

NEXT_PUBLIC_BETTER_AUTH_URL=http://localhost:3000
  # Purpose: Public URL for frontend auth client (can be exposed to browser)
  # Must be the same as BETTER_AUTH_URL

EMAIL_VERIFICATION_CALLBACK_URL=http://localhost:3000/email-verified
  # Purpose: URL user is redirected to after email verification
  # Format: Full URL to verification success page

MONGODB_URI=mongodb://localhost:27017/dubme
  # Purpose: MongoDB connection string (used by Better Auth for user storage)
  # Format: mongodb://host:port/database or connection string with auth

# OAuth Providers (Optional)
GOOGLE_CLIENT_ID=your_google_client_id
  # Get from: https://console.cloud.google.com

GOOGLE_CLIENT_SECRET=your_google_client_secret
  # Get from: https://console.cloud.google.com

GITHUB_CLIENT_ID=your_github_client_id
  # Get from: https://github.com/settings/developers

GITHUB_CLIENT_SECRET=your_github_client_secret
  # Get from: https://github.com/settings/developers

# Email Configuration (Optional - for email verification)
GMAIL_USER=your_email@gmail.com
  # Purpose: Gmail account for sending verification emails
  # Note: Use app-specific password, not main password

GMAIL_PASS=your_app_password
  # Generate: https://support.google.com/accounts/answer/185833

# Backend API URLs
NEXT_PUBLIC_BACKEND_API_URL=http://localhost:4000
  # Purpose: Backend API URL exposed to browser
  # Format: http://domain:port (no trailing slash)

BACKEND_API_URL=http://localhost:4000
  # Purpose: Backend API URL for server-side requests (Next.js middleware)
  # Can differ from public URL in production

API_JWT_SECRET=your_jwt_secret_here
  # Purpose: Secret for JWT token validation
  # Must match the backend API_JWT_SECRET
  # Generate: openssl rand -base64 32
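If `openssl` is not available, a secret of the same shape can be produced with the Python standard library. This is a minimal sketch: like `openssl rand -base64 32`, it base64-encodes 32 random bytes. The function name `generate_secret` is illustrative.

```python
import base64
import secrets


def generate_secret(num_bytes: int = 32) -> str:
    """Return num_bytes of cryptographically secure randomness, base64-encoded."""
    return base64.b64encode(secrets.token_bytes(num_bytes)).decode("ascii")


if __name__ == "__main__":
    # Generate one value per secret; do not reuse the same value for both.
    print(f"BETTER_AUTH_SECRET={generate_secret()}")
    print(f"API_JWT_SECRET={generate_secret()}")
```

Paste the printed lines into `.env`, and use the same `API_JWT_SECRET` value for the frontend and the backend.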

Backend Configuration (Go + Fiber)

# Server Port
PORT=4000
  # Purpose: Internal port where Go API server listens
  # In Docker: internal port (exposed via docker-compose)

# MongoDB Configuration
MONGO_URI=mongodb://localhost:27017
  # Purpose: MongoDB connection endpoint
  # In Docker: Use mongodb://mongo:27017 (service name)

MONGO_DB=dubme
  # Purpose: Database name for projects, directories, metadata
  # Convention: lowercase, no spaces

# CORS Configuration
CORS_ORIGINS=http://localhost:3000,http://frontend:3000
  # Purpose: Allowed origins for cross-origin requests
  # In Docker: Use http://frontend:3000 (service name)
  # Comma-separated list of allowed URLs

# JWT Secret
API_JWT_SECRET=your_jwt_secret_here
  # Purpose: Secret for JWT token validation
  # Must match the frontend (Next.js) API_JWT_SECRET
  # Generate: openssl rand -base64 32

# Test Video Asset (Temporary)
GENERATOR_TEST_MP4=assets/test.mp4
  # Purpose: Fallback video if Flask server is unavailable
  # Used for: Testing, development, error scenarios
  # File location: relative to backend/ directory

# Flask/Python Generator Service
GENERATOR_URL=http://generator:7001
  # Purpose: URL the backend uses to reach the Flask server
  # Flask in Docker: http://generator:7001 (service name)
  # Flask on the host, backend in Docker: http://host.docker.internal:7001
  # Flask and backend on the same host/network: http://localhost:7001
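To illustrate the comma-separated format of `CORS_ORIGINS`, here is a small sketch of how such a value could be split into an allow-list and checked against a request's `Origin` header. It is an illustration of the format only; how the Go backend actually parses the variable is not shown in this README, and the function names are hypothetical.

```python
from typing import List


def parse_origins(raw: str) -> List[str]:
    """Split a comma-separated CORS_ORIGINS value into a clean list of origins."""
    return [item.strip().rstrip("/") for item in raw.split(",") if item.strip()]


def is_allowed(origin: str, allowed: List[str]) -> bool:
    """Return True if the request origin exactly matches an allowed origin."""
    return origin.rstrip("/") in allowed


if __name__ == "__main__":
    allowed = parse_origins("http://localhost:3000,http://frontend:3000")
    print(allowed)
    print(is_allowed("http://localhost:3000", allowed))
    print(is_allowed("http://evil.example", allowed))
```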

MinIO Configuration

MINIO_ENDPOINT=localhost:9000
  # Purpose: MinIO server address
  # Format: host:port (no http://)
  # In Docker: Use minio:9000 (service name)

MINIO_ROOT_USER=minioadmin
  # Purpose: MinIO admin username
  # Default: minioadmin

MINIO_ROOT_PASSWORD=minioadmin
  # Purpose: MinIO admin password
  # Default: minioadmin
  # Change in production!

MINIO_BUCKET=dubme
  # Purpose: S3 bucket name for storing generated videos
  # Convention: lowercase, no special chars

MINIO_USE_SSL=false
  # Purpose: Whether to use HTTPS for MinIO
  # Default: false for local development
  # Set to: true in production

MINIO_PUBLIC_URL=http://localhost:9000
  # Purpose: Public URL for accessing files
  # Used for: Direct file links, CDN configuration
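Several of the format notes in the comments above (no trailing slash, no `http://` in `MINIO_ENDPOINT`, matching auth URLs, non-default MinIO password in production) are easy to get wrong. The following sketch reads a `.env`-style text and reports violations of those notes. It is a hypothetical helper, not a tool shipped with DUBME, and its parser handles only simple `KEY=value` lines.

```python
from typing import Dict, List


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and comment lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def lint_env(env: Dict[str, str]) -> List[str]:
    """Return human-readable problems based on the format notes in this README."""
    problems: List[str] = []
    for key in ("BETTER_AUTH_URL", "NEXT_PUBLIC_BACKEND_API_URL"):
        if env.get(key, "").endswith("/"):
            problems.append(f"{key} should not end with a trailing slash")
    if "://" in env.get("MINIO_ENDPOINT", ""):
        problems.append("MINIO_ENDPOINT should be host:port without a scheme")
    if env.get("BETTER_AUTH_URL") != env.get("NEXT_PUBLIC_BETTER_AUTH_URL"):
        problems.append("NEXT_PUBLIC_BETTER_AUTH_URL must equal BETTER_AUTH_URL")
    if env.get("MINIO_ROOT_PASSWORD") == "minioadmin":
        problems.append("MINIO_ROOT_PASSWORD is the default; change it in production")
    return problems


if __name__ == "__main__":
    with open(".env", encoding="utf-8") as fh:
        for problem in lint_env(parse_env(fh.read())):
            print("WARNING:", problem)
```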

Development Mode

DEV_MODE=true
  # Purpose: Enable development mode (bypasses authentication)
  # Values: true or false
  # When true:
  #   - No login required
  #   - All API endpoints accessible
  #   - Auto-login with dev user
  # When false:
  #   - Normal authentication required
  #   - JWT validation enforced

DEV_USER_ID=dev-user-local
  # Purpose: User ID for development mode
  # Only used when DEV_MODE=true
  # Can be any identifier

Difference: Development Mode vs Production Mode

| Aspect | DEV_MODE=true | DEV_MODE=false |
|---|---|---|
| Authentication | Bypassed | Required |
| Login Page | Skipped | Visible |
| User Context | Auto dev-user | From JWT token |
| Email Verification | Skipped | Required |
| API Access | Unrestricted | Token-based |

Use DEV_MODE=true only for local development. Always set DEV_MODE=false in production.
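The difference between the two modes can be sketched as a single decision: in dev mode the user ID comes from `DEV_USER_ID`, otherwise it comes from a validated JWT. The code below is an illustration using only the Python standard library, not the backend's Go implementation. It assumes HS256 signing with `API_JWT_SECRET` and a `sub` claim carrying the user ID; neither detail is confirmed by this README. Expiry and other claim checks are omitted for brevity.

```python
import base64
import hashlib
import hmac
import json
from typing import Dict, Optional


def _b64url_encode(raw: bytes) -> str:
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode("ascii")


def _b64url_decode(segment: str) -> bytes:
    return base64.urlsafe_b64decode(segment + "=" * (-len(segment) % 4))


def sign_hs256(payload: dict, secret: str) -> str:
    """Create an HS256 JWT (used here only to produce a token for the demo)."""
    parts = [
        _b64url_encode(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in ({"alg": "HS256", "typ": "JWT"}, payload)
    ]
    signing_input = ".".join(parts)
    sig = hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url_encode(sig)}"


def verify_hs256(token: str, secret: str) -> dict:
    """Check the signature and return the claims; raise ValueError on failure."""
    header_b64, payload_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(
        secret.encode(), f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    return json.loads(_b64url_decode(payload_b64))


def resolve_user_id(env: Dict[str, str], token: Optional[str]) -> str:
    """Return the dev user in dev mode, otherwise the user from a valid JWT."""
    if env.get("DEV_MODE", "false").lower() == "true":
        return env.get("DEV_USER_ID", "dev-user-local")
    if not token:
        raise PermissionError("missing token")
    return verify_hs256(token, env["API_JWT_SECRET"])["sub"]  # "sub" is an assumption
```

For example, `resolve_user_id({"DEV_MODE": "true"}, None)` returns the dev user without looking at any token, which is why dev mode must stay off in production.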


📦 Installation Guide for Full Local Development (No Docker)

# Backend
cd backend
go build
./backend

# Frontend (new terminal)
cd frontend
npm install
npm run dev

# Generator (new terminal)
cd generator
python3 -m venv .venv
source .venv/bin/activate
pip install -r tts/requirements.txt
python server.py

🎮 Usage Guide

Creating Your First Video

  1. Log In / Sign Up

    • Use email or OAuth providers (Google, GitHub)
    • In dev mode, you are logged in automatically
  2. Create a Project

    • Click "New Project"
    • Enter project name and description
    • Save
  3. Add Video Content

    • Within the project, click "Generate Video"
    • Enter text to convert to speech
    • Choose avatar
    • Click "Generate"
    • Wait for video generation
  4. Download or Preview

    • Preview video in player
    • Download as MP4
    • Save to storage

🔧 Configuration & Customization

Changing Languages

Supported languages: English, Spanish, French, Italian

Frontend language files: /frontend/messages/

messages/
├── en.json  (English)
├── es.json  (Spanish)
├── fr.json  (French)
└── it.json  (Italian)

To add a new language:

  1. Create messages/[lang].json
  2. Update frontend/i18n/request.ts
  3. Add language option in language switcher component
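When adding a language file, it is easy to miss keys that exist in `en.json`. The sketch below compares two message files and lists the keys missing from the new one. It assumes the files are plain (possibly nested) JSON objects of strings, which is the usual layout for next-intl but is not confirmed by this README; the function names are illustrative.

```python
import json
from pathlib import Path
from typing import List, Set


def flatten_keys(messages: dict, prefix: str = "") -> Set[str]:
    """Return dotted key paths for every leaf value in a nested messages dict."""
    keys: Set[str] = set()
    for key, value in messages.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            keys |= flatten_keys(value, path)
        else:
            keys.add(path)
    return keys


def missing_keys(reference: dict, candidate: dict) -> List[str]:
    """List keys present in the reference messages but absent from the candidate."""
    return sorted(flatten_keys(reference) - flatten_keys(candidate))


def check_locale(messages_dir: str, lang: str, reference_lang: str = "en") -> List[str]:
    """Load two message files from messages_dir and report missing keys."""
    base = Path(messages_dir)
    reference = json.loads((base / f"{reference_lang}.json").read_text(encoding="utf-8"))
    candidate = json.loads((base / f"{lang}.json").read_text(encoding="utf-8"))
    return missing_keys(reference, candidate)


if __name__ == "__main__":
    # "de" is a placeholder for whichever language file you just created.
    for key in check_locale("frontend/messages", "de"):
        print("missing:", key)
```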

Video Generator Avatar Model

The 3D avatar model is configured in:

  • macOS: /generator/TestMac.app/ (Unity binary)
  • Windows: /generator/stv-win/VideoGenerator.exe

Scaling Video Generation

For production workloads, the Flask server can be:

  • Replicated: Multiple Flask instances with load balancer
  • Dedicated Server: Run Flask on separate machine/container
  • Kubernetes: Deploy Flask as separate deployment

Update GENERATOR_URL to point to load balancer or service endpoint.


📚 API Documentation

Authentication Endpoints

POST /api/auth/register

{
  "email": "user@example.com",
  "password": "secure_password",
  "name": "User Name"
}

POST /api/auth/login

{
  "email": "user@example.com",
  "password": "secure_password"
}

POST /api/auth/logout

  • Requires: JWT token in Authorization header

Project Endpoints

GET /api/projects

  • Fetch all projects for authenticated user
  • Headers: Authorization: Bearer <JWT_TOKEN>

POST /api/projects

{
  "name": "My First Video",
  "description": "Project description"
}

GET /api/projects/:id

  • Fetch specific project with metadata

PUT /api/projects/:id

{
  "name": "Updated Name",
  "description": "Updated description"
}

DELETE /api/projects/:id

  • Delete a project
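As a rough client-side illustration of the project endpoints above, the sketch below builds authenticated requests with Python's standard `urllib`. The base URL is the default backend address from this README. The helper names are illustrative, and the assumption that responses are JSON is not confirmed here, since the response bodies are not documented.

```python
import json
import urllib.request
from typing import Optional


def build_request(
    base_url: str, method: str, path: str, token: str, payload: Optional[dict] = None
) -> urllib.request.Request:
    """Build a request carrying the Bearer token and an optional JSON body."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(base_url.rstrip("/") + path, data=data, method=method)
    req.add_header("Authorization", f"Bearer {token}")
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


def send(req: urllib.request.Request):
    """Send the request and decode a JSON response (assumed format), if any."""
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        return json.loads(body) if body else None


if __name__ == "__main__":
    base = "http://localhost:4000"
    token = "<JWT_TOKEN>"  # placeholder; obtain a real token by logging in
    created = send(build_request(base, "POST", "/api/projects", token,
                                 {"name": "My First Video",
                                  "description": "Project description"}))
    print(created)
    print(send(build_request(base, "GET", "/api/projects", token)))
```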

Video Generation Endpoints

POST /api/generate

{
  "text": "Hello, I am an AI avatar",
  "avatar": "male",
  "title": "My First Video",
  "bgColor": "#ffffff"
}

Response:

  • Returns: MP4 video stream
  • Header: X-Generator-Output contains generated video path

POST /api/generate/cleanup

{
  "path": "generated/my_video.mp4"
}
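Putting the two generation endpoints together, a client could stream the MP4 to disk, read the `X-Generator-Output` header, and then ask the server to clean up that path. The sketch below assumes the header value is what `/api/generate/cleanup` expects as `path`, and that a Bearer token is accepted on these endpoints; neither is stated explicitly in this README. The `opener` parameter is a hypothetical hook that allows the network call to be replaced in tests.

```python
import json
import shutil
import urllib.request
from typing import Callable, Optional


def _post_json(url: str, payload: dict, token: Optional[str]) -> urllib.request.Request:
    """Build a JSON POST request, adding a Bearer token when one is given."""
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode("utf-8"), method="POST"
    )
    req.add_header("Content-Type", "application/json")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    return req


def generate_video(
    base_url: str,
    payload: dict,
    out_path: str,
    token: Optional[str] = None,
    opener: Callable = urllib.request.urlopen,
) -> Optional[str]:
    """POST /api/generate, save the MP4 stream, then request server-side cleanup."""
    base = base_url.rstrip("/")
    with opener(_post_json(f"{base}/api/generate", payload, token)) as resp:
        server_path = resp.headers.get("X-Generator-Output")
        with open(out_path, "wb") as fh:
            shutil.copyfileobj(resp, fh)  # stream to disk without loading it all
    if server_path:
        cleanup = _post_json(f"{base}/api/generate/cleanup", {"path": server_path}, token)
        with opener(cleanup):
            pass
    return server_path


if __name__ == "__main__":
    path = generate_video(
        "http://localhost:4000",
        {"text": "Hello, I am an AI avatar", "avatar": "male",
         "title": "My First Video", "bgColor": "#ffffff"},
        "my_first_video.mp4",
    )
    print("server-side path:", path)
```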

🧪 Testing

Unit Tests

To Do

Integration Tests

To Do

System Tests

To Do

Manual Testing

  1. Start all services
  2. Navigate to http://localhost:3000
  3. Test workflow:
    • Sign up → Email verification → Create project → Generate video → Download

📖 Documentation


🤝 Contributing

This is an open-source project welcoming contributions!

See CONTRIBUTING.md for:

  • Code of Conduct
  • How to contribute
  • Development setup
  • Pull request process

⚖️ License & Important Notice

📋 Core License: MIT (Fully Open Source)

All source code and project resources are released under the MIT License, making DUBME fully open source and free for both personal and commercial use.

✅ You CAN:

  • Use for personal, educational, and commercial projects
  • Use for research and development
  • Modify and fork the code
  • Include in commercial products
  • Use for for-profit business operations (⚠️ ATTENTION: not in this first release because of the TTS model exception; read below)
  • Share and distribute (with proper attribution)

❌ You CANNOT:

  • Remove or modify license notices
  • Claim original authorship
  • Hold the authors liable

For full license details, see the LICENSE file.


⚠️ Important Exception: Coqui XTTS-v2 TTS Model

There is one exception to the MIT license:

The Coqui XTTS-v2 Text-to-Speech (TTS) model included in this project is distributed under the Coqui Public Model License 1.0.0, which restricts usage to non-commercial purposes only.

This means:

  • ✅ Allowed: Personal projects, education, non-commercial research, non-profits
  • ❌ NOT Allowed: Commercial use, revenue-generating services, for-profit operations

Anyone using the TTS model or its outputs must comply with the Coqui Public Model License.


🚀 Future: Fully Open Source TTS

We are actively working to replace the Coqui XTTS-v2 model with an alternative open-source TTS solution that has no commercial restrictions. This effort aims to make DUBME completely open source under the MIT license, without exceptions.

Expected timeline: Upcoming releases will feature a fully unrestricted open-source TTS engine, eliminating this limitation.


📊 Project Status

  • ✅ Core functionality: Stable
  • ✅ Authentication: Implemented
  • ✅ Video generation: Implemented
  • 🚧 New avatars: In development
  • 🚧 Linux support: Planned

📧 Support & Contact

  • Issues: GitHub Issues (bug reports, feature requests)
  • Discussions: GitHub Discussions (Q&A, ideas)
  • Email: See repository profile

Made with ❤️ by the DUBME Community

Star us on GitHub • Report an Issue • Contributing
