Vocabuilder - AI-Powered Vocabulary Builder

A web application for building vocabulary with AI-powered word analysis, learning progress tracking, and story generation. Built with LangChain, OpenAI and Google Gemini models, FastAPI, and React.

🚀 Features

Core Vocabulary Management

Word Management: Add, organize, and track vocabulary words with learning progress
List Organization: Create custom lists to categorize words by topic, difficulty, or learning goals
Learning Progress: Mark words as learned/unlearned with visual progress tracking
User Authentication: Secure JWT-based registration and login system
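
To make the token-based login more concrete, here is a minimal sketch of how an HS256 JWT can be signed and verified using only the Python standard library. The function names, the `sub` claim, and the choice of HS256 are illustrative assumptions; this README does not say which JWT library or algorithm the backend uses, and a real deployment would normally rely on a maintained JWT library.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    """HMAC-SHA256 over the 'header.payload' string."""
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def create_token(payload: dict, secret: str) -> str:
    """Build header.payload.signature for the given claims (hypothetical helper)."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode())
        for part in (header, payload)
    )
    return f"{signing_input}.{_b64url(_sign(signing_input, secret))}"


def verify_token(token: str, secret: str) -> dict:
    """Return the claims if signature and expiry check out, else raise ValueError."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    # compare_digest avoids leaking timing information about the signature
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload.get("exp", float("inf")) < time.time():
        raise ValueError("token expired")
    return payload
```

In this sketch the secret would come from the `JWT_SECRET` environment variable, and a protected route would call `verify_token` on the bearer token before serving the request.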

AI-Powered Learning Features

Multi-Model AI Analysis: Choose between OpenAI GPT-4o-mini and Google Gemini 2.0 Flash for word information
Intelligent Word Validation: AI-powered validation to ensure only real English words are added
GRE-Focused Learning: Prioritizes advanced, GRE-relevant meanings and contexts
Contextual Story Generation: Create engaging stories using your vocabulary words in proper context
Semantic Similarity: Find similar words using vector embeddings and semantic search

Advanced AI Capabilities

LangChain Integration: Modern prompt engineering with structured output parsing
Vector Database: ChromaDB-powered semantic search for finding related words
Embedding Generation: OpenAI embeddings for advanced word similarity analysis
Prompt Engineering: Specialized prompts for vocabulary learning and story generation

🧠 LLM Concepts & AI Architecture

Language Model Integration

OpenAI GPT-4o-mini: High-quality text generation with JSON output parsing
Google Gemini 2.0 Flash: Fast, efficient AI model for vocabulary analysis
Model Selection: Dynamic switching between AI models based on user preference
Temperature Control: Temperature set to 0.7 to balance consistent output with creative variation
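
One way the model switching described above could be organized is a small registry keyed by the user's choice. This is a sketch under stated assumptions: `ModelConfig`, `MODEL_REGISTRY`, and `select_model` are hypothetical names, and falling back to OpenAI for unknown input is an illustrative choice, not documented behavior. The model identifiers and the 0.7 temperature come from this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Settings needed to construct a chat model (hypothetical structure)."""
    provider: str
    model_name: str
    temperature: float = 0.7  # value stated in this README


# Hypothetical registry keyed by the user's preference string.
MODEL_REGISTRY = {
    "openai": ModelConfig(provider="openai", model_name="gpt-4o-mini"),
    "gemini": ModelConfig(provider="google", model_name="gemini-2.0-flash"),
}


def select_model(preference: str, default: str = "openai") -> ModelConfig:
    """Return the config for the requested model, falling back to the default."""
    key = preference.strip().lower()
    return MODEL_REGISTRY.get(key, MODEL_REGISTRY[default])
```

The returned config could then be passed to whichever LangChain chat model class matches `provider`.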

Prompt Engineering & Chain Architecture

Structured Prompts: ChatPromptTemplate-based prompts for consistent AI responses
GRE-Focused Prompts: Specialized prompts that prioritize advanced vocabulary meanings
Context-Aware Story Generation: Prompts that ensure vocabulary words are used in proper GRE context
LangChain Chains: Efficient prompt → LLM → parser pipelines for structured outputs
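
The prompt → LLM → parser idea can be illustrated without any external dependency. The sketch below is not LangChain itself: `build_chain` and `fake_llm` are hypothetical stand-ins, and the canned JSON reply (including the `is_valid` key) is made up for illustration. It only shows the three stages being composed into one callable.

```python
import json
from typing import Callable


def build_chain(template: str, llm: Callable[[str], str]) -> Callable[[dict], dict]:
    """Compose prompt formatting, a model call, and JSON parsing into one callable."""

    def run(variables: dict) -> dict:
        prompt = template.format(**variables)  # 1. fill the prompt template
        raw = llm(prompt)                      # 2. call the model
        return json.loads(raw)                 # 3. parse the structured output

    return run


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model: echoes the prompt inside a canned JSON reply."""
    return json.dumps({"prompt_seen": prompt, "is_valid": True})


chain = build_chain("Is '{word}' a real English word?", fake_llm)
result = chain({"word": "laconic"})
```

Because parsing is the last stage, a malformed model reply surfaces as a `json.JSONDecodeError` at one well-defined point rather than deep inside application code.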

Vector Search & Semantic Understanding

Word Embeddings: OpenAI text-embedding-3-small for semantic word representation
ChromaDB Integration: Vector database for similarity search and word relationships
Semantic Similarity: Find related words based on meaning, not just spelling
Context-Aware Search: Search within specific word lists for targeted learning
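
As a rough illustration of how embedding-based similarity and list-scoped search fit together, the following sketch ranks words by cosine similarity in plain Python. The three-dimensional vectors are made up for readability (real text-embedding-3-small vectors are much longer), and `most_similar` and the `list_id` filter are hypothetical helpers; in the application itself, ChromaDB performs this search.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def most_similar(query_vec, entries, top_n=2, list_id=None):
    """Rank entries by similarity to the query, optionally restricted to one list."""
    candidates = [e for e in entries if list_id is None or e["list_id"] == list_id]
    ranked = sorted(
        candidates,
        key=lambda e: cosine_similarity(query_vec, e["embedding"]),
        reverse=True,
    )
    return [e["word"] for e in ranked[:top_n]]


# Made-up toy embeddings, for illustration only.
ENTRIES = [
    {"word": "terse", "list_id": 1, "embedding": [0.9, 0.1, 0.0]},
    {"word": "verbose", "list_id": 1, "embedding": [-0.8, 0.2, 0.1]},
    {"word": "succinct", "list_id": 2, "embedding": [0.8, 0.2, 0.1]},
]
query = [1.0, 0.0, 0.0]  # pretend embedding of "laconic"
```

Here `most_similar(query, ENTRIES)` ranks by meaning across all lists, while passing `list_id=1` limits the search to a single list.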

🛠️ Tech Stack

Frontend

React 18 - Modern UI framework with hooks and concurrent features
TypeScript - Full type safety and enhanced developer experience
Vite - Lightning-fast build tool and development server
Tailwind CSS - Utility-first CSS framework for responsive design
ShadCN UI - Beautiful, accessible component library
TanStack Query - Efficient data fetching, caching, and synchronization
React Router - Client-side routing with protected routes
Lucide React - Beautiful, consistent icon library

Backend

FastAPI - Modern, fast Python web framework with automatic API documentation
SQLModel - SQL database integration with Pydantic models
PostgreSQL - Robust, scalable relational database
Alembic - Database migration management
JWT Authentication - Secure token-based authentication system
CORS Middleware - Cross-origin resource sharing support

AI & Machine Learning

LangChain Core - Framework for building LLM applications
OpenAI API - GPT-4o-mini for text generation and text-embedding-3-small for embeddings
Google Generative AI - Gemini 2.0 Flash integration
ChromaDB - Vector database for semantic search
NumPy - Numerical computing for vector operations

Development & Deployment

UV - Fast Python package manager and environment management
Docker - Containerization for PostgreSQL and development
Render - Cloud deployment platform
Git - Version control with GitHub integration

📦 Quick Start

Prerequisites

Python 3.9+ - Backend runtime
Node.js 18+ - Frontend runtime
PostgreSQL 12+ - Database server
Docker - Optional, for easy database setup

1. Database Setup

Option A: Docker (Recommended)

# Create PostgreSQL container
docker run --name vocabuilder-postgres \
  -e POSTGRES_PASSWORD=postgres \
  -e POSTGRES_DB=vocabuilder \
  -p 15432:5432 -d postgres:15

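# Optional: Add Adminer for database management
# (--network host is Linux-oriented; on Docker Desktop you may need
#  to publish the port with -p 8080:8080 instead)
docker run --name adminer \
  --network host -d adminer
# Access at http://localhost:8080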

Option B: Local PostgreSQL

# Install PostgreSQL (Ubuntu/Debian)
sudo apt install postgresql postgresql-contrib

# Create database
sudo -u postgres createdb vocabuilder

# Or on macOS with Homebrew
brew install postgresql
brew services start postgresql
createdb vocabuilder

2. Backend Setup

# Navigate to backend directory (from the repository root)
cd Backend

# Install UV (Python package manager)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Install dependencies
uv sync

# Create environment file
cp .env.example .env
# Edit .env with your configuration

# Apply database migrations
./scripts/migrations/db.sh migrate apply

# Start development server
uv run uvicorn main:app --reload

3. Frontend Setup

# Navigate to frontend directory (from the repository root)
cd Frontend

# Install dependencies
npm install

# Create environment file
cp .env.example .env
# Edit .env with your configuration

# Start development server
npm run dev

4. Access Your Application

Frontend: http://localhost:5173
Backend API: http://localhost:8000
API Documentation: http://localhost:8000/docs
Database UI: http://localhost:8080 (if using Adminer)

🔧 Environment Variables

Backend (.env)

# Database
DATABASE_URL=postgresql://postgres:postgres@localhost:15432/vocabuilder

# Authentication
JWT_SECRET=your-super-secret-jwt-key-here

# AI APIs
OPENAI_API_KEY=your-openai-api-key
GEMINI_API_KEY=your-google-gemini-api-key

# Server
PORT=8000
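
The backend variables above could be read and checked at startup along these lines. This is a sketch, not the project's actual configuration code: `load_settings` is a hypothetical helper, and treating `DATABASE_URL` and `JWT_SECRET` as required while leaving the API keys optional is an assumption.

```python
import os
from urllib.parse import urlsplit

# Assumption: these two must be set for the server to start.
REQUIRED = ("DATABASE_URL", "JWT_SECRET")


def load_settings(env=None):
    """Read backend settings from a mapping (defaults to the process environment)."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
    db = urlsplit(env["DATABASE_URL"])
    return {
        "db_host": db.hostname,
        "db_port": db.port,
        "db_name": db.path.lstrip("/"),
        "jwt_secret": env["JWT_SECRET"],
        "openai_api_key": env.get("OPENAI_API_KEY"),
        "gemini_api_key": env.get("GEMINI_API_KEY"),
        "port": int(env.get("PORT", "8000")),
    }
```

Failing fast on missing values gives a clearer error than a connection failure later in startup. Note that the sample `DATABASE_URL` uses port 15432, which matches the Docker setup; a local PostgreSQL install typically listens on 5432.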

Frontend (.env)

# API Configuration
VITE_API_URL=http://localhost:8000

# Google OAuth (optional)
VITE_GOOGLE_CLIENT_ID=your-google-oauth-client-id

🗄️ Database Schema

Core Tables & Relationships

The Vocabuilder database uses a normalized structure with four main tables and their relationships:

┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│      User       │    │      List       │    │      Word       │
│                 │    │                 │    │                 │
│ id (PK)         │◄───┤ user_id (FK)    │    │ id (PK)         │
│ username        │    │ id (PK)         │◄───┤ list_id (FK)    │
│ email           │    │ name            │    │ user_id (FK)    │
│ hashed_password │    │ description     │    │ dictionary_id   │
│ google_id       │    │ created_at      │    │ learned         │
│ is_active       │    │ updated_at      │    │                 │
│ created_at      │    └─────────────────┘    └─────────────────┘
│ updated_at      │                                    │
└─────────────────┘                                    │
         │                                              │
         │                                              │
         └──────────────────────────────────────────────┘
                                                       │
                                                       ▼
                                              ┌─────────────────┐
                                              │   Dictionary    │
                                              │                 │
                                              │ id (PK)         │
                                              │ word            │
                                              │ synonyms        │
                                              │ antonyms        │
                                              │ meanings        │
                                              │ examples        │
                                              │ embeddings      │
                                              │ created_at      │
                                              │ updated_at      │
                                              └─────────────────┘

User: Authentication and user management
List: Custom word categories for users
Word: Links users to dictionary entries with learning status
Dictionary: Shared word information and AI-generated content
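
The split between the per-user Word table and the shared Dictionary table can be sketched with plain dataclasses. The project itself uses SQLModel; the classes below are simplified stand-ins with only a few of the columns, the field types are assumptions, and `progress` is a hypothetical helper showing how learning progress could be derived from the `learned` flag.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DictionaryEntry:
    """Shared word data, stored once no matter how many users add the word."""
    id: int
    word: str
    meanings: List[str]


@dataclass
class Word:
    """A user's link to a dictionary entry, with per-user learning status."""
    id: int
    user_id: int
    dictionary_id: int
    list_id: Optional[int] = None
    learned: bool = False


def progress(words, user_id, list_id=None):
    """Fraction of a user's words marked learned, optionally within one list."""
    own = [
        w for w in words
        if w.user_id == user_id and (list_id is None or w.list_id == list_id)
    ]
    return sum(w.learned for w in own) / len(own) if own else 0.0


entry = DictionaryEntry(id=1, word="laconic", meanings=["using very few words"])
words = [
    Word(id=1, user_id=1, dictionary_id=entry.id, list_id=10, learned=True),
    Word(id=2, user_id=1, dictionary_id=2, list_id=10),
    Word(id=3, user_id=2, dictionary_id=entry.id),  # second user, same entry
]
```

Both users reference dictionary entry 1, so the AI-generated content for "laconic" is stored once while each user keeps an independent `learned` status.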

📱 Application Pages & Features

1. Home Dashboard (`/`)

Quick Add Word: Instant word addition with AI validation
Learning Progress: Visual overview of learned vs. unlearned words
Recent Activity: Track your vocabulary building journey
Quick Actions: Access to word generator and story creator

2. Word Generator (`/generator`)

AI-Powered Analysis: Generate comprehensive word information
Model Selection: Choose between OpenAI GPT and Google Gemini
GRE-Focused Results: Prioritizes advanced, test-relevant meanings
Real-time Generation: Live AI responses with loading states

3. Story Generator (`/story`)

Interactive Word Selection: Choose multiple words from your vocabulary
Context-Aware Stories: AI generates stories using words in proper GRE context
Learning-Focused Content: Simple language with sophisticated vocabulary usage
Word Meaning Explanations: Detailed breakdown of how each word was used

4. Word Lists (`/lists`)

Custom Organization: Create themed lists for focused learning
Progress Tracking: Monitor learning progress within each list
Bulk Operations: Manage multiple words efficiently
Similar Word Discovery: Find related words using semantic search

5. Word Management (`/`)

Comprehensive View: All your vocabulary words in one place
Learning Status: Mark words as learned/unlearned
List Assignment: Organize words into custom categories
Search & Filter: Find words quickly with advanced filtering

🔍 AI Features Deep Dive

Word Information Generation

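# LangChain-based prompt engineering
from langchain_core.output_parsers import JsonOutputParser
from langchain_core.prompts import ChatPromptTemplate

WORD_INFO_PROMPT = ChatPromptTemplate.from_messages([
    ("system", "You are a specialized GRE vocabulary assistant..."),
    ("user", "VALIDATION: First, determine if '{word}' is a real English word...")
])

# Structured output parsing: prompt → LLM → JSON parser
# (llm is the chat model chosen by the user; run inside an async function)
chain = WORD_INFO_PROMPT | llm | JsonOutputParser()
result = await chain.ainvoke({"word": word})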

Story Generation with Context

# Context-aware story creation
from langchain_core.prompts import ChatPromptTemplate

STORY_PROMPT = ChatPromptTemplate.from_messages([
    ("system", "You are a creative storyteller who helps people learn GRE vocabulary..."),
    ("user", "Create an engaging story using these vocabulary words: {words}...")
])

# No parser needed for creative text; the chat model returns a message
# object whose .content attribute holds the story (run inside an async function)
chain = STORY_PROMPT | llm
result = await chain.ainvoke({"words": words})

Semantic Similarity Search

# Vector-based word similarity
class VectorService:
    def find_similar_words(self, query_string: str, top_n: int = 5):
        query_embedding = self.get_embedding(query_string)
        results = self.collection.query(
            query_embeddings=[query_embedding],
            n_results=top_n,
            include=["metadatas", "distances"]
        )
        return self.format_results(results)
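
The `format_results` helper is not shown in this README. A possible shape for it, assuming each stored item carries its word under a `word` metadata key (an assumption), is sketched below. ChromaDB returns one inner list per query embedding, which is why the code indexes `[0]` for the single query.

```python
def format_results(results: dict) -> list:
    """Flatten a single-query Chroma-style result into a list of dicts."""
    metadatas = results["metadatas"][0]  # one inner list per query embedding
    distances = results["distances"][0]
    return [
        {"word": meta.get("word"), "distance": dist}
        for meta, dist in zip(metadatas, distances)
    ]


# Hand-written sample in the nested layout that collection.query returns.
sample = {
    "ids": [["1", "2"]],
    "metadatas": [[{"word": "terse"}, {"word": "succinct"}]],
    "distances": [[0.12, 0.19]],
}
formatted = format_results(sample)
```

Smaller distances mean closer matches, so the first element of `formatted` is the most similar word.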

🚀 Development Workflow

Backend Development

# Database migrations
./scripts/migrations/db.sh migrate generate "add new feature"
./scripts/migrations/db.sh migrate apply

# Code quality
uv run black .
uv run isort .
uv run flake8 .

# Testing
uv run pytest

# Development server
uv run uvicorn main:app --reload

Frontend Development

# Development server
npm run dev

# Build for production
npm run build

# Preview production build
npm run preview

# Code quality
npm run lint
npm run format

Database Management

# Open database shell
./scripts/migrations/db.sh shell

# Check migration status
./scripts/migrations/db.sh migrate status

# Reset database (development only)
./scripts/migrations/db.sh reset

🔄 Database Migrations & Schema Changes

Migration Workflow

When you make changes to database models, you need to generate and apply migrations:

cd Backend

# 1. Generate migration after changing models
./scripts/migrations/db.sh migrate generate "describe your changes"

# 2. Apply migration locally
./scripts/migrations/db.sh migrate apply

# 3. Test your changes
uv run uvicorn main:app --reload
