An intelligent tutoring system powered by Reinforcement Learning (UCB1 Bandit) + Claude AI (LLM) that adapts to each student's weaknesses in real time.
```
┌─────────────────────────────────────────────────────────┐
│                  Streamlit UI (app.py)                  │
│            Home → Quiz → Results → Dashboard            │
└───────────────────────┬─────────────────────────────────┘
                        │
        ┌───────────────┼────────────────┐
        ▼               ▼                ▼
┌───────────────┐ ┌──────────────┐ ┌─────────────────┐
│  UCB1 Agent   │ │  LLM Module  │ │ Session Manager │
│  rl_agent.py  │ │ question_gen │ │ session_mgr.py  │
│               │ │    .py       │ │                 │
│ - Selects     │ │ - Generates  │ │ - SQLite DB     │
│   topic+diff  │ │   MCQ via    │ │ - Tracks answers│
│ - Updates Q   │ │   Claude API │ │ - Badges/stats  │
│   values      │ │ - Hints      │ │ - Persistence   │
│ - Saves state │ │ - Summaries  │ │                 │
└───────────────┘ └──────────────┘ └─────────────────┘
        │
        ▼
┌───────────────┐
│  Topics Data  │
│   topics.py   │
│ 6 AI topics × │
│ 3 difficulties│
│   = 18 arms   │
└───────────────┘
```
This is a Multi-Armed Bandit problem:
- Arms = (topic, difficulty) pairs → 6 topics × 3 difficulties = 18 arms
- Reward = function of correctness, difficulty, and response speed
- Goal = maximize long-term student learning (minimize weak spots)
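Enumerating the arms is just a cross product. A minimal sketch, assuming a plain list of topic names (the real definitions live in `data/topics.py`):

```python
from itertools import product

# Topic names taken from the topics table in this README.
TOPICS = [
    "Search Algorithms",
    "Reinforcement Learning",
    "Machine Learning",
    "Neural Networks",
    "NLP",
    "Probability & Bayesian AI",
]
DIFFICULTIES = ["easy", "medium", "hard"]

# Each (topic, difficulty) pair is one bandit arm: 6 × 3 = 18 arms.
ARMS = list(product(TOPICS, DIFFICULTIES))
```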
```
UCB_score(arm) = weakness(arm) + C × √(ln(N) / n_i)

where:
  weakness(arm) = 1 - Q(arm)   ← focus on where the student struggles
  C             = 1.5          ← exploration constant
  N             = total pulls so far
  n_i           = pulls for this arm
```
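The selection rule above can be sketched in a few lines of Python. This is an illustration of the scoring formula, not the actual `rl_agent.py` API (the `stats` dict shape is assumed here):

```python
import math

def select_arm(stats, c=1.5):
    """Pick the arm with the highest weakness-based UCB score.

    stats maps arm -> {"q": estimated mastery in [0, 1], "n": pull count}.
    """
    total_pulls = sum(s["n"] for s in stats.values())
    best_arm, best_score = None, float("-inf")
    for arm, s in stats.items():
        if s["n"] == 0:
            return arm  # pull every arm at least once before scoring
        weakness = 1.0 - s["q"]                                  # 1 - Q(arm)
        bonus = c * math.sqrt(math.log(total_pulls) / s["n"])    # exploration term
        score = weakness + bonus
        if score > best_score:
            best_arm, best_score = arm, score
    return best_arm
```

With two equally-explored arms, the one with the lower Q-value (the student's weaker area) wins the tie, which is exactly the "target weakness" behavior described above.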
Why UCB1 over ε-greedy?
- UCB1 provides a theoretical upper bound on regret: O(√(K·N·ln N))
- No manual ε tuning needed
- Naturally reduces exploration as estimates converge
```
raw_reward = (base_reward + speed_bonus) × difficulty_multiplier

base_reward           = 1.0 (correct) or 0.0 (wrong)
speed_bonus           = 0.3 × max(0, 1 - response_time / 30s)   # only if correct
difficulty_multiplier = 1.0 (easy), 1.5 (medium), 2.0 (hard)

reward = min(raw_reward / 2.6, 1.0)   # normalize to [0, 1]; 2.6 = max raw reward
```

After each answer, the chosen arm's Q-value is updated with an incremental mean:

```
Q(arm) += (reward - Q(arm)) / n_pulls
```

```
adaptive_tutor/
├── app.py                      # Streamlit UI — main entry point
├── requirements.txt
├── README.md
│
├── agents/
│   ├── __init__.py
│   └── rl_agent.py             # UCB1 Bandit agent
│
├── llm/
│   ├── __init__.py
│   └── question_generator.py   # Claude API integration
│
├── core/
│   ├── __init__.py
│   └── session_manager.py      # SQLite session tracking
│
└── data/
    ├── __init__.py
    ├── topics.py               # Topic & difficulty definitions
    ├── agent_state.json        # RL agent persistent state (auto-created)
    └── tutor.db                # SQLite database (auto-created)
```
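The reward shaping and incremental Q-update described earlier can be sketched as follows. Function and parameter names here are illustrative, not the actual `agents/rl_agent.py` interface:

```python
def compute_reward(correct, response_time, difficulty, max_time=30.0):
    """Shaped reward in [0, 1] from correctness, speed, and difficulty."""
    base = 1.0 if correct else 0.0
    # Speed bonus only applies to correct answers and decays over 30 seconds.
    speed_bonus = 0.3 * max(0.0, 1.0 - response_time / max_time) if correct else 0.0
    multiplier = {"easy": 1.0, "medium": 1.5, "hard": 2.0}[difficulty]
    raw = (base + speed_bonus) * multiplier
    # 2.6 is the maximum raw reward: (1.0 + 0.3) * 2.0 for a fast, correct, hard answer.
    return min(raw / 2.6, 1.0)

def update_q(q, reward, n_pulls):
    """Incremental mean: Q(arm) += (reward - Q(arm)) / n_pulls."""
    return q + (reward - q) / n_pulls
```

An instant correct answer on a hard question yields the maximum normalized reward of 1.0; a wrong answer yields 0.0 regardless of difficulty.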
```
git clone <your-repo>
cd adaptive_tutor
pip install -r requirements.txt
export ANTHROPIC_API_KEY="your-api-key-here"
```

Get your key from: https://console.anthropic.com/

```
streamlit run app.py
```

| Feature | Description |
|---|---|
| Adaptive Topic Selection | UCB1 bandit learns and targets weak areas |
| LLM Question Generation | Claude generates unique MCQs every session |
| Hints on Demand | Non-spoiler hints generated by Claude |
| Session Tracking | SQLite stores all answers, sessions, streaks |
| AI Study Coach | Personalized end-of-session feedback |
| Topic Mastery Map | Visual heatmap of (topic × difficulty) performance |
| Badges & Streaks | Gamification to keep students engaged |
| Focus Mode | Lock the tutor to a specific topic |
| Dashboard | Full analytics — all-time stats, RL arm exploration |
| Topic | Subtopics |
|---|---|
| 🔍 Search Algorithms | BFS, DFS, A*, Dijkstra, heuristics |
| 🤖 Reinforcement Learning | Q-learning, MDPs, rewards, policies |
| 📊 Machine Learning | Supervised, unsupervised, evaluation |
| 🧠 Neural Networks | Backprop, CNNs, RNNs, activation functions |
| 💬 NLP | Tokenization, transformers, embeddings, LLMs |
| 🎲 Probability & Bayesian AI | Bayes theorem, HMMs, inference |
Each topic has Easy / Medium / Hard difficulty = 18 total arms for the RL agent.
| Member | Owns |
|---|---|
| Member 1 | llm/question_generator.py — prompt engineering, hint/summary generation |
| Member 2 | agents/rl_agent.py — UCB1 algorithm, reward function, arm selection |
| Member 3 | app.py + core/session_manager.py — UI, SQLite, badges, metrics |
Edit data/topics.py and add an entry to TOPICS. The RL agent automatically picks it up.
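For example, a new entry might look like the sketch below. The dict layout shown is a hypothetical schema for illustration — check `data/topics.py` for the real structure:

```python
# Hypothetical schema — see data/topics.py for the actual TOPICS definition.
TOPICS = {
    "Search Algorithms": {
        "emoji": "🔍",
        "subtopics": ["BFS", "DFS", "A*", "Dijkstra", "heuristics"],
    },
    # New entry: the RL agent picks it up and creates easy/medium/hard arms for it.
    "Computer Vision": {
        "emoji": "👁️",
        "subtopics": ["convolutions", "object detection", "segmentation"],
    },
}
```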
Replace UCBAgent with a full Q-learning agent that treats the quiz as a Markov Decision Process where the state includes performance history.
Change TOPICS to any subject (Math, Biology, History) — the LLM question generator adapts automatically.
MIT — free to use and modify for academic projects.