"We are not another AI study tool — we are a 5-model ML decision engine that tells students exactly what to study when time is limited."
Prepzo is a Deadline-Aware Exam Preparation Engine powered by 5 machine learning models and a large language model. It works out how much time a student has, uses ML to prioritize the topics that matter most, auto-discovers topics from uploaded PDFs, generates a scientifically scheduled study plan, and produces high-probability exam questions, all personalized to the student's exact context.
Features & USP
Deadline Mode Detection:
≤ 3 days → 🔥 Survival Mode (top 30% topics only)
4–7 days → ⚡ Balanced Mode (top 60% topics)
> 7 days → 📚 Full Mode (all topics with deep coverage)
Priority System:
🔴 Must Do
🟡 Should Do
⚪ Optional
5-Model ML Engine (the core USP):
ML Model 1 — TF-IDF + Cosine Similarity → Data-driven topic importance scoring from PDFs
ML Model 2 — Naive Bayes Classifier → Predicts question type (MCQ / Coding / Theory) per topic
ML Model 3 — K-Means Clustering → Auto-discovers topic groups from uploaded syllabus
ML Model 4 — SM-2 Spaced Repetition → Generates day-by-day study schedule based on memory science
ML Model 5 — Cosine Similarity Matrix → Detects repeating question patterns from past papers
AI Question Generation:
Groq API (LLaMA-3.3 70B): Generates high-probability exam questions with detailed solutions, constrained by the ML predictions
AI Chatbot — Ask doubts, get explanations, request more questions
Analytics Dashboard — Platform usage tracking and insights
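The deadline thresholds and top-N% cut-offs listed under Deadline Mode Detection can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code in deadline_service.py: the function names select_mode and pareto_filter are hypothetical, topics are assumed to arrive with a numeric importance score, and rounding up with ceil (so at least one topic always survives) is an assumed design choice.

```python
import math
from datetime import date

# Fraction of ranked topics kept per mode, taken from the README thresholds.
# (Hypothetical sketch; not the actual deadline_service.py implementation.)
MODE_FRACTIONS = {
    "survival": 0.30,  # <= 3 days left: top 30% of topics
    "balanced": 0.60,  # 4 to 7 days left: top 60% of topics
    "full": 1.00,      # > 7 days left: all topics
}


def select_mode(exam_date: date, today: date) -> str:
    """Map the number of days remaining to a study mode."""
    days_left = (exam_date - today).days
    if days_left <= 3:
        return "survival"
    if days_left <= 7:
        return "balanced"
    return "full"


def pareto_filter(topic_scores: dict[str, float], mode: str) -> list[str]:
    """Rank topics by score and keep the top fraction allowed by the mode."""
    ranked = sorted(topic_scores, key=topic_scores.get, reverse=True)
    # Assumption: round up so a tiny syllabus never ends up with zero topics.
    keep = max(1, math.ceil(len(ranked) * MODE_FRACTIONS[mode]))
    return ranked[:keep]
```

With five scored topics, Survival Mode keeps the top two (ceil of 1.5) and Balanced Mode keeps the top three.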
Tech Stack
| Layer | Technology |
|---|---|
| Backend | FastAPI (Python) |
| AI / LLM | Groq API: LLaMA-3.3 70B (llama-3.3-70b-versatile) |
ML Model 5: Cosine Similarity Pattern Analyzer
Purpose: Detects repeating question patterns from uploaded past papers
How it works:
Extracts individual questions from PDF text using regex patterns
Builds TF-IDF matrix for all extracted questions
Computes pairwise cosine similarity matrix
Groups questions whose pairwise similarity exceeds the 0.3 threshold
Classifies each pattern into categories (Implementation, Comparison, Explanation, etc.)
Output: Repeating patterns with frequency counts, exam probability, category breakdown, and topic correlations
Integration: Shown on InputPage after PDF upload so students see examiner patterns before generating a plan
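The extraction, TF-IDF, similarity, and grouping steps above can be sketched without any dependencies. The project itself uses scikit-learn; this version re-implements TF-IDF with the same smoothed IDF form that scikit-learn's TfidfVectorizer uses by default, ln((1 + n) / (1 + df)) + 1, so it runs on the standard library alone. The question-numbering regex, the simplified lowercase-word tokenizer, and the greedy grouping strategy (each ungrouped question seeds a group and pulls in later questions above the threshold) are assumptions, not the behavior of pattern_analyzer_service.py.

```python
import math
import re
from collections import Counter

SIMILARITY_THRESHOLD = 0.3  # threshold stated in the README


def extract_questions(text: str) -> list[str]:
    """Split PDF text on line-leading numbering such as 'Q1.' or '2)' (assumed format)."""
    parts = re.split(r"(?:^|\n)\s*(?:Q\s*)?\d+[.)]\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tfidf_vectors(docs: list[str]) -> list[dict[str, float]]:
    """Build sparse TF-IDF vectors (raw term counts x smoothed IDF)."""
    tokenized = [re.findall(r"[a-z]+", d.lower()) for d in docs]
    n = len(docs)
    doc_freq = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1) for t, c in tf.items()}
        )
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_patterns(
    questions: list[str], threshold: float = SIMILARITY_THRESHOLD
) -> list[list[str]]:
    """Greedily group questions whose similarity to a seed exceeds the threshold."""
    vecs = tfidf_vectors(questions)
    n = len(questions)
    # Full pairwise cosine similarity matrix.
    sim = [[cosine(vecs[i], vecs[j]) for j in range(n)] for i in range(n)]
    assigned: set[int] = set()
    groups = []
    for i in range(n):
        if i in assigned:
            continue
        members = [i] + [
            j for j in range(i + 1, n) if j not in assigned and sim[i][j] > threshold
        ]
        assigned.update(members)
        groups.append([questions[k] for k in members])
    return groups
```

A frequency count per pattern then follows directly as len(group) for each returned group.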
Priority Scoring Formula
score = (ml_topic_weight × 0.6) + (ai_probability × 0.4)
score >= 0.7 → 🔴 Must Do
score >= 0.4 → 🟡 Should Do
score < 0.4 → ⚪ Optional
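The scoring rule translates directly into code. This sketch assumes both inputs are already normalized to the 0 to 1 range, which the thresholds imply but the README does not state; the function names priority_score and priority_label are hypothetical.

```python
def priority_score(ml_topic_weight: float, ai_probability: float) -> float:
    """Weighted blend: 60% ML topic weight, 40% LLM-estimated exam probability."""
    return ml_topic_weight * 0.6 + ai_probability * 0.4


def priority_label(score: float) -> str:
    """Bucket a score into the README's three priority levels."""
    if score >= 0.7:
        return "Must Do"
    if score >= 0.4:
        return "Should Do"
    return "Optional"
```

For example, a topic with ml_topic_weight = 0.9 and ai_probability = 0.5 scores 0.54 + 0.20 = 0.74 and lands in Must Do.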
Project Structure
prepzo/
├── README.md
├── backend/
│ ├── main.py # FastAPI app entry point + CORS
│ ├── requirements.txt # Python dependencies
│ ├── .env.example # Environment variables template
│ └── app/
│ ├── models/
│ │ └── schemas.py # Pydantic models + Enums (all 5 ML outputs)
│ ├── routes/
│ │ ├── plan.py # POST /generate-plan (orchestrates all ML models)
│ │ ├── upload.py # POST /upload-pdf (clustering + pattern analysis)
│ │ ├── chat.py # POST /chat (context-aware AI chatbot)
│ │ ├── auth.py # POST /send-otp, /verify-otp
│ │ └── analytics.py # GET /analytics, POST /analytics/track
│ ├── services/
│ │ ├── ml_service.py # ML Model 1 (TF-IDF) + ML Model 2 (Naive Bayes)
│ │ ├── clustering_service.py # ML Model 3 (K-Means topic clustering)
│ │ ├── spaced_repetition_service.py # ML Model 4 (SM-2 scheduler)
│ │ ├── pattern_analyzer_service.py # ML Model 5 (cosine similarity patterns)
│ │ ├── groq_service.py # LLaMA-3 question generation + chatbot
│ │ └── deadline_service.py # Mode selection + Pareto + priority scoring
│ └── utils/
│ ├── pdf_parser.py # PDF text extraction + heuristic topic detection
│ └── store.py # In-memory analytics tracking
└── frontend/
├── vite.config.js # Vite config with /api proxy
└── src/
├── App.jsx # Router + page transitions
├── index.css # Design system + animations
├── services/
│ └── api.js # Axios API client
├── pages/
│ ├── InputPage.jsx # Student input form + ML panels
│ ├── ResultPage.jsx # Plan display + schedule + questions
│ ├── ChatPage.jsx # AI chatbot interface
│ └── AnalyticsPage.jsx # Usage analytics dashboard
├── components/
│ ├── Navbar.jsx # Navigation bar
│ ├── ModeBanner.jsx # Survival/Balanced/Full banner
│ ├── FilterBar.jsx # Question type/difficulty/priority filters
│ ├── QuestionCard.jsx # Individual question with solution toggle
│ ├── TopicChip.jsx # Removable topic tag
│ ├── TopicInsightsPanel.jsx # ML Model 2: Naive Bayes predictions
│ ├── ClusteringPanel.jsx # ML Model 3: K-Means topic groups
│ ├── StudySchedulePanel.jsx # ML Model 4: SM-2 daily schedule
│ └── PatternAnalysisPanel.jsx # ML Model 5: Repeating question patterns
└── three/
└── FloatingOrbs.jsx # Background 3D animation
How to Run
Backend
# 1. Navigate to backend
cd backend
# 2. Create virtual environment
python -m venv venv
source venv/bin/activate      # macOS/Linux
# venv\Scripts\activate       # Windows
# 3. Install dependencies
pip install -r requirements.txt
# 4. Set up environment variables
cp .env.example .env
# Edit .env and add your GROQ_API_KEY
# 5. Start the server
uvicorn main:app --reload
# Server runs at http://127.0.0.1:8000
# Swagger docs at http://127.0.0.1:8000/docs
Frontend
# 1. Navigate to frontend
cd frontend
# 2. Install dependencies
npm install
# 3. Start development server
npm run dev
# Frontend runs at http://localhost:5173
System Flow
Student Input (Subject, Date, Topics)
│
├── Upload PDF ──► pdfplumber extracts text
│ │
│ ├── ML Model 3: K-Means ──► Auto-discovered topic clusters
│ └── ML Model 5: Cosine Similarity ──► Repeating question patterns
│
▼
Deadline Engine ──► Mode: Survival / Balanced / Full
│
▼
ML Layer (scikit-learn)
├── ML Model 1: TF-IDF ──► Topic Importance Weights
└── ML Model 2: Naive Bayes ──► Question Type Predictions (MCQ/Coding/Theory)
│
▼
Groq LLM (LLaMA-3 70B)
├── Receives ML weights + type predictions as strict constraints
└── Generates questions + solutions obeying ML decisions
│
▼
Priority Scoring ──► Must / Should / Optional
│
▼
ML Model 4: SM-2 ──► Day-by-day spaced repetition schedule
│
▼
Structured JSON Response ──► Frontend Rendering
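The SM-2 stage of the flow can be illustrated with the commonly cited SM-2 update rules: review intervals of 1 day, then 6 days, then the previous interval multiplied by an easiness factor, with easiness adjusted after each review and floored at 1.3. How Prepzo obtains recall-quality grades and maps reviews onto calendar days is not described here, so build_schedule assumes a fixed hypothetical grade of 4 for every review and places each topic's first study session on day 0. CardState, sm2_update, and build_schedule are illustrative names, not the API of spaced_repetition_service.py.

```python
from dataclasses import dataclass


@dataclass
class CardState:
    repetitions: int = 0   # consecutive successful reviews
    interval: int = 0      # days until the next review
    easiness: float = 2.5  # SM-2 easiness factor (EF)


def sm2_update(state: CardState, quality: int) -> CardState:
    """Apply one SM-2 review with a recall grade from 0 (blackout) to 5 (perfect)."""
    if not 0 <= quality <= 5:
        raise ValueError("quality must be between 0 and 5")
    if quality >= 3:
        if state.repetitions == 0:
            interval = 1
        elif state.repetitions == 1:
            interval = 6
        else:
            interval = round(state.interval * state.easiness)
        repetitions = state.repetitions + 1
    else:
        # Failed recall: restart the repetition sequence.
        repetitions, interval = 0, 1
    easiness = state.easiness + (0.1 - (5 - quality) * (0.08 + (5 - quality) * 0.02))
    return CardState(repetitions, interval, max(1.3, easiness))


def build_schedule(
    topics: list[str], days_left: int, quality: int = 4
) -> dict[int, list[str]]:
    """Map day index -> topics to review, stopping at the exam day.

    Assumption: every review is graded with the same fixed `quality`.
    """
    schedule: dict[int, list[str]] = {}
    for topic in topics:
        day, state = 0, CardState()
        while day < days_left:
            schedule.setdefault(day, []).append(topic)
            state = sm2_update(state, quality)
            day += state.interval
    return dict(sorted(schedule.items()))
```

With 10 days left and a grade of 4 (which leaves the easiness factor at 2.5), each topic is reviewed on days 0, 1, and 7; the next interval of 15 days falls after the exam.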
Team Execution Plan & Roles
1. Yash: Backend Lead + ML Engine + AI Integration