VT.ai Logo

VT.ai

Minimal multimodal AI chat app with dynamic conversation routing


🚀 Features

Multi-Provider AI Orchestration

Supported model providers (see the unified-call sketch after this list):

  • DeepSeek
  • OpenAI
  • Anthropic
  • Google Gemini
  • Groq
  • Mistral
  • Local models via Ollama (Llama3, Phi-3, Mistral, etc.)
  • Cohere
  • OpenRouter
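
All of these providers are typically reached through one unified completion interface instead of per-provider SDKs. The sketch below illustrates the pattern with LiteLLM's completion(); treating this as VT.ai's exact internal mechanism is an assumption, and the model names are examples only.

# Hedged sketch: one call signature across providers (assumes LiteLLM).
from litellm import completion

def ask(model: str, prompt: str) -> str:
    # LiteLLM maps model strings like "gpt-4o" or "ollama/llama3" to the
    # right backend, using API keys from the environment (.env).
    response = completion(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

print(ask("gpt-4o", "Hello!"))                 # OpenAI
print(ask("ollama/llama3", "Hello locally!"))  # Local model via Ollama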

Core Capabilities:

  • Dynamic conversation routing with SemanticRouter (see the routing sketch after this list)
  • Multi-modal interactions (Text/Image/Audio)
  • Assistant framework with code interpretation
  • Real-time response streaming
  • Cross-provider model switching
  • Local model support with Ollama integration
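
The routing sketch referenced above: each incoming message is embedded and matched against example utterances for each route, and the best match decides how the message is handled. Below is a minimal sketch using the semantic-router library; the route names and utterances are hypothetical, not the contents of VT.ai's layers.json.

# Hedged sketch of semantic routing (assumes the semantic-router package
# and an OPENAI_API_KEY for the encoder).
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

# Hypothetical routes; VT.ai ships its trained routes in layers.json.
chat = Route(name="casual-chat", utterances=["hi there", "how are you?"])
image = Route(name="image-generation", utterances=["draw a cat", "generate a logo"])

layer = RouteLayer(encoder=OpenAIEncoder(), routes=[chat, image])
print(layer("please paint a sunset").name)  # -> "image-generation"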

📦 Installation

Prerequisites

  • Python 3.11+ (specified in .python-version)
  • uv (for managing virtual environments and dependencies)
  • Ollama (for local models, optional)

Install uv (if not already installed), then clone the repository

pip install uv
git clone https://github.com/vinhnx/VT.ai.git
cd VT.ai

Create a virtual environment and install dependencies with uv

uv venv
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate     # Windows
uv pip install -r requirements.txt
cp .env.example .env

Note: uv provides a fast and efficient way to manage Python environments and dependencies. Activate the virtual environment before running the app.

🔧 Configuration

Populate the .env file with your API keys. Depending on the models and features you want to use, set the corresponding keys. For example:

  • OPENAI_API_KEY: Required for OpenAI models, assistant mode, TTS, and image generation.
  • GEMINI_API_KEY: Required for Google Gemini models and vision capabilities.
  • ANTHROPIC_API_KEY: Required for Anthropic models.
  • GROQ_API_KEY: Required for Groq models.
  • COHERE_API_KEY: Required for Cohere models.
  • OPENROUTER_API_KEY: Required for OpenRouter models.
  • MISTRAL_API_KEY: Required for Mistral models.
  • HUGGINGFACE_API_KEY: Required for Hugging Face models (if applicable).

Refer to .env.example for the full list. Example:

OPENAI_API_KEY=sk-your-key
GEMINI_API_KEY=your-gemini-key
COHERE_API_KEY=your-cohere-key
ANTHROPIC_API_KEY=your-claude-key
HUGGINGFACE_API_KEY=your-huggingface-key
GROQ_API_KEY=your-groq-key
OPENROUTER_API_KEY=your-openrouter-key
MISTRAL_API_KEY=your-mistral-key

# For local models via Ollama
OLLAMA_HOST=http://localhost:11434

Note: If OPENAI_API_KEY or GEMINI_API_KEY is missing, the app will prompt you to enter them at startup.
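
The pattern behind that startup prompt is simple; a minimal sketch, assuming python-dotenv (the require_key helper is hypothetical, not VT.ai's actual code):

# Hedged sketch: load .env, then prompt for any missing key.
import os
from getpass import getpass

from dotenv import load_dotenv  # assumes python-dotenv is installed

load_dotenv()  # reads .env into the process environment

def require_key(name: str) -> str:
    value = os.getenv(name)
    if not value:
        value = getpass(f"Enter {name}: ")  # mirrors the startup prompt above
        os.environ[name] = value
    return value

require_key("OPENAI_API_KEY")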

🖥️ Usage

Start Application

Activate the virtual environment

source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate     # Windows

(Optional) Train the semantic router for customization; requires OPENAI_API_KEY

python src/router/trainer.py

Launch interface

chainlit run src/app.py -w

Note: Training the semantic router is optional; the bundled layers.json provides default routing. Train your own routes when you want better coverage of your specific use cases.
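
What the trainer produces is the layers.json file the app loads for routing. A minimal sketch of that save/load round trip with semantic-router (not VT.ai's exact trainer code; the example route is hypothetical):

# Hedged sketch: persist and reload a trained route layer.
from semantic_router import Route
from semantic_router.encoders import OpenAIEncoder
from semantic_router.layer import RouteLayer

routes = [Route(name="casual-chat", utterances=["hello", "what's up?"])]
layer = RouteLayer(encoder=OpenAIEncoder(), routes=routes)  # embeds utterances

layer.to_json("layers.json")                    # what the trainer writes
restored = RouteLayer.from_json("layers.json")  # what the app loads at startup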

Key Commands

Shortcut   Action
Ctrl+/     Switch model provider
Ctrl+,     Open settings
Ctrl+L     Clear conversation history

🧩 Chat Profiles

Standard Chat Mode

  • Multi-LLM conversations
  • Dynamic model switching
  • Image generation & analysis (example below)
  • Audio transcription
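
For instance, image generation in this mode is backed by DALL-E 3 (see Supported Models below). A minimal sketch of such a call with the official OpenAI SDK, not necessarily VT.ai's exact code:

# Hedged sketch: generating an image with the OpenAI SDK (DALL-E 3).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
result = client.images.generate(
    model="dall-e-3",
    prompt="a minimalist logo for a chat app",
    size="1024x1024",
)
print(result.data[0].url)  # hosted URL of the generated image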

Assistant Mode (Beta)

Example assistant capabilities (a minimal illustration; MinoAssistant ships under src/assistants/, though the exact import path below is assumed):

# from assistants.mino import MinoAssistant  # import path assumed
async def solve_math_problem(problem: str) -> str:
    assistant = MinoAssistant()
    return await assistant.solve(problem)

  • Code interpreter for complex calculations
  • File attachments (PDF/CSV/Images)
  • Persistent conversation threads
  • Custom tool integrations
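
To call the async helper above from a plain script (a hypothetical invocation):

import asyncio

result = asyncio.run(solve_math_problem("What is 17% of 2,450?"))
print(result)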

🏗️ Project Structure

VT.ai/
├── src/
│   ├── assistants/       # Custom AI assistant implementations
│   ├── router/           # Semantic routing configuration
│   ├── utils/            # Helper functions & configs
│   └── app.py            # Main application entrypoint
├── public/               # Static assets
├── requirements.txt      # Python dependencies
└── .env.example          # Environment template

🌐 Supported Models

Category    Models
Chat        GPT-4o, Claude 3.5, Gemini 1.5, Llama3-70B, Mixtral 8x7B
Vision      GPT-4o, Gemini 1.5 Pro, Llama3.2 Vision
Image Gen   DALL-E 3
TTS         OpenAI TTS-1, TTS-1-HD
Local       Llama3, Phi-3, Mistral, Deepseek R1 series

🤝 Contributing

Development Setup

# Activate the virtual environment
source .venv/bin/activate  # Linux/Mac
# .venv\Scripts\activate     # Windows

# Install development tools (pytest for testing, black for formatting)
uv pip install pytest black

# Run tests (once tests are added)
pytest tests/

# Format code
black .

Contribution Guidelines

  1. Fork the repository
  2. Create a feature branch (git checkout -b feature/amazing-feature)
  3. Add type hints for new functions
  4. Update documentation
  5. Open a Pull Request

Note: Development dependencies (e.g., pytest, black) are not currently specified in a separate file. Install them manually with uv pip install as needed.

📄 License

MIT License - See LICENSE for full text.

🌟 Acknowledgements