kanish5/adaptive-ai-study-tutor

🧠 Adaptive AI Study Tutor

An intelligent tutoring system powered by Reinforcement Learning (UCB1 Bandit) + Claude AI (LLM) that adapts to each student's weaknesses in real time.


📐 Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                    Streamlit UI (app.py)                │
│   Home → Quiz → Results → Dashboard                    │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
┌───────────────┐ ┌──────────────┐ ┌─────────────────┐
│  UCB1 Agent   │ │  LLM Module  │ │ Session Manager │
│ rl_agent.py   │ │question_gen  │ │ session_mgr.py  │
│               │ │    .py       │ │                 │
│ - Selects     │ │ - Generates  │ │ - SQLite DB     │
│   topic+diff  │ │   MCQ via    │ │ - Tracks answers│
│ - Updates Q   │ │   Claude API │ │ - Badges/stats  │
│   values      │ │ - Hints      │ │ - Persistence   │
│ - Saves state │ │ - Summaries  │ │                 │
└───────────────┘ └──────────────┘ └─────────────────┘
        │
        ▼
┌───────────────┐
│  Topics Data  │
│  topics.py    │
│ 6 AI topics × │
│ 3 difficulties│
│ = 18 arms     │
└───────────────┘

🤖 Reinforcement Learning: UCB1 Bandit

Problem Framing

This is a Multi-Armed Bandit problem:

  • Arms = (topic, difficulty) pairs → 6 topics × 3 difficulties = 18 arms
  • Reward = function of correctness, difficulty, and response speed
  • Goal = maximize long-term student learning (minimize weak spots)

UCB1 Algorithm

UCB_score(arm) = weakness(arm) + C × √(ln(N) / n_i)

where:
  weakness(arm) = 1 - Q(arm)     ← focus on where student struggles
  C = 1.5                         ← exploration constant
  N = total pulls so far
  n_i = pulls for this arm

Why UCB1 over ε-greedy?

  • UCB1 provides a theoretical upper bound on regret: O(√(K·N·ln N))
  • No manual ε tuning needed
  • Naturally reduces exploration as estimates converge
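The selection rule above can be sketched as follows. This is an illustration only; the function and variable names (`ucb_score`, `select_arm`, the `arms` dict shape) are assumptions, not necessarily what `rl_agent.py` uses:

```python
import math

def ucb_score(q_value, n_pulls, total_pulls, c=1.5):
    """UCB1 score = weakness + exploration bonus (the formula above)."""
    if n_pulls == 0:
        return float("inf")  # unpulled arms are always explored first
    weakness = 1.0 - q_value  # prioritize arms where the student struggles
    exploration = c * math.sqrt(math.log(total_pulls) / n_pulls)
    return weakness + exploration

def select_arm(arms):
    """arms: dict mapping (topic, difficulty) -> {'q': float, 'n': int}."""
    total = max(1, sum(a["n"] for a in arms.values()))
    return max(arms, key=lambda k: ucb_score(arms[k]["q"], arms[k]["n"], total))
```

Because unpulled arms score infinity, every (topic, difficulty) pair is tried at least once before the weakness term starts to dominate.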

Reward Function

raw_reward = (base_reward + speed_bonus) × difficulty_multiplier

base_reward   = 1.0 (correct) or 0.0 (wrong)
speed_bonus   = 0.3 × max(0, 1 - response_time / 30s)   # only if correct
difficulty    = 1.0 (easy), 1.5 (medium), 2.0 (hard)

reward = min(raw_reward / 2.6, 1.0)   # normalize to [0, 1]
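In code, the reward computation above might read (a minimal sketch; `compute_reward` is an assumed name):

```python
def compute_reward(correct, response_time, difficulty):
    """Map one answer to a normalized reward in [0, 1]."""
    multiplier = {"easy": 1.0, "medium": 1.5, "hard": 2.0}[difficulty]
    base = 1.0 if correct else 0.0
    # Speed bonus only applies to correct answers, fading to 0 at 30 s.
    speed_bonus = 0.3 * max(0.0, 1.0 - response_time / 30.0) if correct else 0.0
    raw = (base + speed_bonus) * multiplier
    return min(raw / 2.6, 1.0)  # 2.6 = (1.0 + 0.3) * 2.0, the max raw reward
```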

Q-Value Update (Incremental Mean)

Q(arm) += (reward - Q(arm)) / n_pulls
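This is the standard incremental-mean update, which needs no stored reward history. A minimal sketch (assumed names):

```python
def update_q(q, n_pulls, reward):
    """Return the updated running-mean Q and pull count after one reward."""
    n_pulls += 1
    q += (reward - q) / n_pulls  # incremental mean update
    return q, n_pulls
```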

🗂️ Project Structure

adaptive_tutor/
├── app.py                        # Streamlit UI — main entry point
├── requirements.txt
├── README.md
│
├── agents/
│   ├── __init__.py
│   └── rl_agent.py               # UCB1 Bandit agent
│
├── llm/
│   ├── __init__.py
│   └── question_generator.py     # Claude API integration
│
├── core/
│   ├── __init__.py
│   └── session_manager.py        # SQLite session tracking
│
└── data/
    ├── __init__.py
    ├── topics.py                  # Topic & difficulty definitions
    ├── agent_state.json           # RL agent persistent state (auto-created)
    └── tutor.db                   # SQLite database (auto-created)

🚀 Setup & Run

1. Clone & Install

git clone <your-repo>
cd adaptive_tutor
pip install -r requirements.txt

2. Set Anthropic API Key

export ANTHROPIC_API_KEY="your-api-key-here"

Get your key from: https://console.anthropic.com/

3. Run the App

streamlit run app.py

🎯 Features

| Feature | Description |
|---|---|
| Adaptive Topic Selection | UCB1 bandit learns and targets weak areas |
| LLM Question Generation | Claude generates unique MCQs every session |
| Hints on Demand | Non-spoiler hints generated by Claude |
| Session Tracking | SQLite stores all answers, sessions, streaks |
| AI Study Coach | Personalized end-of-session feedback |
| Topic Mastery Map | Visual heatmap of (topic × difficulty) performance |
| Badges & Streaks | Gamification to keep students engaged |
| Focus Mode | Lock the tutor to a specific topic |
| Dashboard | Full analytics — all-time stats, RL arm exploration |

📊 Topics Covered

| Topic | Subtopics |
|---|---|
| 🔍 Search Algorithms | BFS, DFS, A*, Dijkstra, heuristics |
| 🤖 Reinforcement Learning | Q-learning, MDPs, rewards, policies |
| 📊 Machine Learning | Supervised, unsupervised, evaluation |
| 🧠 Neural Networks | Backprop, CNNs, RNNs, activation functions |
| 💬 NLP | Tokenization, transformers, embeddings, LLMs |
| 🎲 Probability & Bayesian AI | Bayes theorem, HMMs, inference |

Each topic comes in Easy / Medium / Hard difficulty, giving 6 × 3 = 18 total arms for the RL agent.


👥 Team Contribution Guide

| Member | Owns |
|---|---|
| Member 1 | llm/question_generator.py — prompt engineering, hint/summary generation |
| Member 2 | agents/rl_agent.py — UCB1 algorithm, reward function, arm selection |
| Member 3 | app.py + core/session_manager.py — UI, SQLite, badges, metrics |

🔬 Extending the Project

Add a new topic

Edit data/topics.py and add an entry to TOPICS. The RL agent automatically picks it up.
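For illustration, assuming `TOPICS` is a mapping from topic name to a list of subtopics (the real schema in `data/topics.py` may differ — check the file before copying this), a new entry could look like:

```python
# Hypothetical shape of data/topics.py — verify the actual field names there.
TOPICS = {
    "Search Algorithms": ["BFS", "DFS", "A*", "Dijkstra", "heuristics"],
    # ... existing topics ...
    "Computer Vision": ["convolutions", "object detection", "segmentation"],
}
```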

Swap to Q-Learning (tabular)

Replace UCBAgent with a full Q-learning agent that treats the quiz as a Markov Decision Process where the state includes performance history.

Custom subject

Change TOPICS to any subject (Math, Biology, History) — the LLM question generator adapts automatically.


📝 License

MIT — free to use and modify for academic projects.
