An intelligent tutoring system powered by Reinforcement Learning (UCB1 Bandit) + Claude AI (LLM) that adapts to each student's weaknesses in real time.
```
┌─────────────────────────────────────────────────────────┐
│                  Streamlit UI (app.py)                  │
│            Home → Quiz → Results → Dashboard            │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
┌───────────────┐ ┌──────────────┐ ┌─────────────────┐
│  UCB1 Agent   │ │  LLM Module  │ │ Session Manager │
│  rl_agent.py  │ │ question_gen │ │ session_mgr.py  │
│               │ │    .py       │ │                 │
│ - Selects     │ │ - Generates  │ │ - SQLite DB     │
│   topic+diff  │ │   MCQ via    │ │ - Tracks answers│
│ - Updates Q   │ │   Claude API │ │ - Badges/stats  │
│   values      │ │ - Hints      │ │ - Persistence   │
│ - Saves state │ │ - Summaries  │ │                 │
└───────────────┘ └──────────────┘ └─────────────────┘
        │
        ▼
┌───────────────┐
│  Topics Data  │
│   topics.py   │
│ 6 AI topics × │
│ 3 difficulties│
│   = 18 arms   │
└───────────────┘
```
This is a Multi-Armed Bandit problem:
- Arms = (topic, difficulty) pairs → 6 topics × 3 difficulties = 18 arms
- Reward = function of correctness, difficulty, and response speed
- Goal = maximize long-term student learning (minimize weak spots)
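Enumerating the arms is just a cross product. A minimal sketch, assuming a plain list of topic names (the real definitions live in `data/topics.py`):

```python
from itertools import product

# Topic names taken from the topics table in this README.
TOPICS = [
    "Search Algorithms",
    "Reinforcement Learning",
    "Machine Learning",
    "Neural Networks",
    "NLP",
    "Probability & Bayesian AI",
]
DIFFICULTIES = ["easy", "medium", "hard"]

# Each (topic, difficulty) pair is one bandit arm: 6 × 3 = 18 arms.
ARMS = list(product(TOPICS, DIFFICULTIES))
```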
```
UCB_score(arm) = weakness(arm) + C × √(ln(N) / n_i)

where:
  weakness(arm) = 1 - Q(arm)   ← focus on where the student struggles
  C             = 1.5          ← exploration constant
  N             = total pulls so far
  n_i           = pulls for this arm
```
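The selection rule above can be sketched in a few lines of Python. This is an illustration of the scoring formula, not the actual `rl_agent.py` API (the `stats` dict shape is assumed here):

```python
import math

def select_arm(stats, c=1.5):
    """Pick the arm with the highest weakness-based UCB score.

    stats maps arm -> {"q": estimated mastery in [0, 1], "n": pull count}.
    """
    total_pulls = sum(s["n"] for s in stats.values())
    best_arm, best_score = None, float("-inf")
    for arm, s in stats.items():
        if s["n"] == 0:
            return arm  # pull every arm at least once before scoring
        weakness = 1.0 - s["q"]                                  # 1 - Q(arm)
        bonus = c * math.sqrt(math.log(total_pulls) / s["n"])    # exploration term
        score = weakness + bonus
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm
```

With two equally-explored arms, the one with the lower Q-value (the student's weaker area) wins the tie, which is exactly the "target weakness" behavior described above.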
Why UCB1 over ε-greedy?
- UCB1 provides a theoretical upper bound on regret: O(√(K·N·ln N))
- No manual ε tuning needed
- Naturally reduces exploration as estimates converge
```
raw_reward = (base_reward + speed_bonus) × difficulty_multiplier

base_reward           = 1.0 (correct) or 0.0 (wrong)
speed_bonus           = 0.3 × max(0, 1 - response_time / 30s)   # only if correct
difficulty_multiplier = 1.0 (easy), 1.5 (medium), 2.0 (hard)

reward = min(raw_reward / 2.6, 1.0)   # normalize to [0, 1]; 2.6 = max raw reward
```

After each answer, the chosen arm's Q-value is updated with an incremental mean:

```
Q(arm) += (reward - Q(arm)) / n_pulls
```

```
adaptive_tutor/
├── app.py                      # Streamlit UI — main entry point
├── requirements.txt
├── README.md
│
├── agents/
│   ├── __init__.py
│   └── rl_agent.py             # UCB1 Bandit agent
│
├── llm/
│   ├── __init__.py
│   └── question_generator.py   # Claude API integration
│
├── core/
│   ├── __init__.py
│   └── session_manager.py      # SQLite session tracking
│
└── data/
    ├── __init__.py
    ├── topics.py               # Topic & difficulty definitions
    ├── agent_state.json        # RL agent persistent state (auto-created)
    └── tutor.db                # SQLite database (auto-created)
```
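The reward shaping and incremental Q-update described earlier can be sketched as follows. Function and parameter names here are illustrative, not the actual `agents/rl_agent.py` interface:

```python
def compute_reward(correct, response_time, difficulty, max_time=30.0):
    """Shaped reward in [0, 1] from correctness, speed, and difficulty."""
    base = 1.0 if correct else 0.0
    # Speed bonus only applies to correct answers and decays over 30 seconds.
    speed_bonus = 0.3 * max(0.0, 1.0 - response_time / max_time) if correct else 0.0
    multiplier = {"easy": 1.0, "medium": 1.5, "hard": 2.0}[difficulty]
    raw = (base + speed_bonus) * multiplier
    # 2.6 is the maximum raw reward: (1.0 + 0.3) * 2.0 for a fast, correct, hard answer.
    return min(raw / 2.6, 1.0)

def update_q(q, reward, n_pulls):
    """Incremental mean: Q(arm) += (reward - Q(arm)) / n_pulls."""
    return q + (reward - q) / n_pulls
```

An instant correct answer on a hard question yields the maximum normalized reward of 1.0; a wrong answer yields 0.0 regardless of difficulty.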
```
git clone <your-repo>
cd adaptive_tutor
pip install -r requirements.txt
export ANTHROPIC_API_KEY="your-api-key-here"
```

Get your key from: https://console.anthropic.com/

```
streamlit run app.py
```

| Feature | Description |
|---|---|
| Adaptive Topic Selection | UCB1 bandit learns and targets weak areas |
| LLM Question Generation | Claude generates unique MCQs every session |
| Hints on Demand | Non-spoiler hints generated by Claude |
| Session Tracking | SQLite stores all answers, sessions, streaks |
| AI Study Coach | Personalized end-of-session feedback |
| Topic Mastery Map | Visual heatmap of (topic × difficulty) performance |
| Badges & Streaks | Gamification to keep students engaged |
| Focus Mode | Lock the tutor to a specific topic |
| Dashboard | Full analytics — all-time stats, RL arm exploration |
| Topic | Subtopics |
|---|---|
| 🔍 Search Algorithms | BFS, DFS, A*, Dijkstra, heuristics |
| 🤖 Reinforcement Learning | Q-learning, MDPs, rewards, policies |
| 📊 Machine Learning | Supervised, unsupervised, evaluation |
| 🧠 Neural Networks | Backprop, CNNs, RNNs, activation functions |
| 💬 NLP | Tokenization, transformers, embeddings, LLMs |
| 🎲 Probability & Bayesian AI | Bayes theorem, HMMs, inference |
Each topic has Easy / Medium / Hard difficulty = 18 total arms for the RL agent.
| Member | Owns |
|---|---|
| Member 1 | llm/question_generator.py — prompt engineering, hint/summary generation |
| Member 2 | agents/rl_agent.py — UCB1 algorithm, reward function, arm selection |
| Member 3 | app.py + core/session_manager.py — UI, SQLite, badges, metrics |
Edit data/topics.py and add an entry to TOPICS. The RL agent automatically picks it up.
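For example, a new entry might look like the sketch below. The dict layout shown is a hypothetical schema for illustration — check `data/topics.py` for the real structure:

```python
# Hypothetical schema — see data/topics.py for the actual TOPICS definition.
TOPICS = {
    "Search Algorithms": {
        "emoji": "🔍",
        "subtopics": ["BFS", "DFS", "A*", "Dijkstra", "heuristics"],
    },
    # New entry: the RL agent picks it up and creates easy/medium/hard arms for it.
    "Computer Vision": {
        "emoji": "👁️",
        "subtopics": ["convolutions", "object detection", "segmentation"],
    },
}
```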
Replace UCBAgent with a full Q-learning agent that treats the quiz as a Markov Decision Process where the state includes performance history.
Change TOPICS to any subject (Math, Biology, History) — the LLM question generator adapts automatically.
MIT — free to use and modify for academic projects.