Automated homework analysis using LLM-as-judge to evaluate student performance and generate personalized practice problems.
This pipeline analyzes student homework conversations and produces:
- Quantitative metrics: total questions attempted, solved, and errors made
- Qualitative metrics: per-topic proficiency (Mastered vs. Needs Practice)
- Practice problems: 4-5 new problems targeting weak topics
Requirements:
- Python 3.7+
- requests library (for Portkey API calls)
student_analysis_pipeline/
├── main.py # Pipeline orchestrator (run this)
├── pipeline.py # Core pipeline logic (Steps 1-4)
├── data_loader.py # Input file parsers
├── utils.py # Portkey API client with retry logic
└── analysis_results.json # Output file (generated)
cd /Users/prashant/Desktop/development/Data_pipeline/student_analysis_pipeline
python3 main.py
Loads data from:
- hw4/hw4_question.md
- hw4/hw4_reference_solution.md
- hw4/student_conversations/ab12167_hw4_chats.json
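The loading step can be sketched as plain file reads. This is an illustrative guess at what data_loader.py does; the actual function names in the repo may differ.

```python
# Hypothetical sketch of data_loader.py; real function names may differ.
import json
from pathlib import Path

def load_questions(path: str) -> str:
    """Read the homework questions markdown file as raw text."""
    return Path(path).read_text(encoding="utf-8")

def load_solutions(path: str) -> str:
    """Read the reference solutions markdown file as raw text."""
    return Path(path).read_text(encoding="utf-8")

def load_chats(path: str) -> list:
    """Parse the student's chat transcript from JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```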
Runs the 4-step pipeline:
- Step 1: Topic Mapping (LLM)
- Step 2: Per-Question Evaluation (LLM)
- Step 3: Aggregation (Python)
- Step 4: Practice Problem Generation (LLM)
Outputs results to:
analysis_results.json
Runtime: ~30-60 seconds (depends on API response time)
- 15 questions × ~2 seconds each for evaluation
- Plus one call each for topic mapping and practice-problem generation
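Step 3 (Aggregation) is the only step with no LLM call. Under an assumed per-question verdict schema (a guess; see pipeline.py for the real one), the rollup into quantitative metrics might look like:

```python
# Illustrative sketch of Step 3 (aggregation) in pure Python.
# The per-question evaluation schema below is an assumption, not the
# actual schema used by pipeline.py.

def aggregate(evaluations):
    """Roll per-question verdicts up into the quantitative metrics.

    Each evaluation is assumed to look like:
    {"attempted": True, "solved": True, "topics": ["Power Rule"]}
    """
    attempted = [e for e in evaluations if e.get("attempted")]
    solved = [e for e in evaluations if e.get("solved")]
    return {
        "total_questions": len(evaluations),
        "total_attempted": len(attempted),
        "total_solved": len(solved),
        # An attempted-but-unsolved question counts as an error.
        "total_errors": len(attempted) - len(solved),
    }
```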
{
"metrics": {
"quantitative": {
"total_questions": 13,
"total_attempted": 12,
"total_solved": 11,
"total_errors": 1
},
"qualitative": {
"mastered_topics": [
{
"topic": "Power Rule",
"evidence": {
"questions_tested": [2, 3, 5],
"performance": "3/3 solved",
"details": ["Q2: Solved", "Q3: Solved", "Q5: Solved"],
"reason": "All questions attempted and solved correctly"
}
}
],
"needs_practice_topics": [...]
}
},
"practice_problems": [...],
"details": {...}
}

Error handling:
- Automatic retries: 3 attempts per LLM call with exponential backoff
- Failure mode: Pipeline crashes with exception if API fails after retries
- No silent failures: All errors are raised explicitly
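The retry behavior described above can be sketched as follows; utils.py's actual implementation may differ in delays and exception types.

```python
# Sketch of retry with exponential backoff (1s, 2s, 4s, ...); the real
# utils.py may use different delays or catch narrower exceptions.
import time

def call_with_retries(fn, max_attempts=3, base_delay=1.0):
    """Call fn(); on failure, back off exponentially and retry.

    After the final attempt the exception is re-raised, so failures
    surface explicitly instead of being swallowed.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # no silent failures
            time.sleep(base_delay * 2 ** attempt)
```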
To analyze a different student or homework:
Edit paths in main.py:
QUESTIONS_PATH = "hw4/hw4_question.md"
SOLUTIONS_PATH = "hw4/hw4_reference_solution.md"
CHAT_PATH = "hw4/student_conversations/ab12167_hw4_chats.json"

The pipeline uses NYU's Portkey gateway:
- Base URL: https://ai-gateway.apps.cloud.rt.nyu.edu/v1
- Model: GPT-4o (@gpt-4o/gpt-4o)
- Credentials: stored in utils.py
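A call through the gateway might be assembled as below, assuming it exposes an OpenAI-compatible /chat/completions endpoint (which the /v1 base URL suggests). The header names and response shape are assumptions; check utils.py for the real request format.

```python
# Hedged sketch of a gateway call, assuming an OpenAI-compatible
# /chat/completions endpoint. Header names and response shape are
# assumptions; verify against utils.py.
import requests

BASE_URL = "https://ai-gateway.apps.cloud.rt.nyu.edu/v1"

def build_chat_request(system_prompt, user_prompt, api_key, model="@gpt-4o/gpt-4o"):
    """Assemble the url, headers, and JSON body for a chat completion call."""
    url = f"{BASE_URL}/chat/completions"
    headers = {"Authorization": f"Bearer {api_key}",
               "Content-Type": "application/json"}
    body = {
        "model": model,
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
    }
    return url, headers, body

def call_llm(system_prompt, user_prompt, api_key):
    """POST the request and return the model's reply text."""
    url, headers, body = build_chat_request(system_prompt, user_prompt, api_key)
    resp = requests.post(url, headers=headers, json=body, timeout=60)
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]
```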
IMPORTANT: This pipeline is currently configured for Calculus I assignments.
To use it for other subjects (Algebra, Statistics, Physics, etc.), you MUST update the following in pipeline.py:
1. Step 1: Topic Mapping (Line 16)
# Current (Calculus-specific):
system_prompt = """You are an expert Calculus tutor. Your task is to identify the calculus concepts each question tests."""
# Change to (Generic):
system_prompt = """You are an expert [SUBJECT] tutor. Your task is to identify the [SUBJECT] concepts each question tests."""
# Examples:
# - "You are an expert Algebra tutor. Your task is to identify the algebra concepts..."
# - "You are an expert Statistics tutor. Your task is to identify the statistics concepts..."
# - "You are an expert Physics tutor. Your task is to identify the physics concepts..."

2. Step 2: Per-Question Evaluation (Line 49)
# Current (Calculus-specific):
system_prompt = """You are an expert Calculus tutor acting as a judge."""
# Change to (Generic):
system_prompt = """You are an expert [SUBJECT] tutor acting as a judge."""

3. Step 4: Practice Problem Generation (Line 238)
# Current (Calculus-specific):
system_prompt = """You are an expert Calculus tutor. Generate practice problems..."""
# Change to (Generic):
system_prompt = """You are an expert [SUBJECT] tutor. Generate practice problems..."""

For Algebra:
- Replace "Calculus tutor" with "Algebra tutor"
- Replace "calculus concepts" with "algebra concepts"
For Statistics:
- Replace "Calculus tutor" with "Statistics tutor"
- Replace "calculus concepts" with "statistics concepts"
For Physics:
- Replace "Calculus tutor" with "Physics tutor"
- Replace "calculus concepts" with "physics concepts"
For Chemistry:
- Replace "Calculus tutor" with "Chemistry tutor"
- Replace "calculus concepts" with "chemistry concepts"
Note: The rest of the pipeline (evaluation criteria, metrics, error types) remains the same across subjects.
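Rather than hand-editing three string literals, the prompts could all be derived from a single SUBJECT constant. This is a suggested refactoring, not how pipeline.py is currently written:

```python
# Suggested refactoring (not the current pipeline.py code): derive all
# three subject-specific system prompts from one constant.
SUBJECT = "Calculus"  # change to "Algebra", "Statistics", "Physics", ...

TOPIC_MAPPING_PROMPT = (
    f"You are an expert {SUBJECT} tutor. Your task is to identify the "
    f"{SUBJECT.lower()} concepts each question tests."
)
EVALUATION_PROMPT = f"You are an expert {SUBJECT} tutor acting as a judge."
PRACTICE_PROMPT = f"You are an expert {SUBJECT} tutor. Generate practice problems..."
```

With this in place, switching subjects is a one-line change instead of three separate edits.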
This project includes several documentation files to help you understand and use the pipeline:
Core Documentation:
- README.md (this file) - Main documentation and usage guide
- metrics_definitions.md - Detailed definitions of all metrics and evaluation criteria
Technical Documentation:
- HOW_LLM_WORKS.md - Explanation of how LLM evaluation works and why we send the full conversation
- LLM_CALLS_ANALYSIS.md - Breakdown of all 17 LLM calls, input data, and efficiency analysis
Input Files:
- hw4_question.md - Homework questions
- hw4_reference_solution.md - Reference solutions
- ab12167_hw4_conversation.md - Student conversation with AI tutor
Code Files:
- main.py - Pipeline orchestrator (run this)
- pipeline.py - Core pipeline logic (Steps 1-4)
- data_loader.py - Input file parsers
- utils.py - Portkey API client with retry logic
- export_conversation.py - Script to convert JSON chats to markdown
Output:
- analysis_results.json - Generated analysis report
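Once analysis_results.json has been generated, its summary fields (field names taken from the sample output earlier in this README) can be read back like this:

```python
# Read the generated report and build a one-line summary. Field names
# follow the sample output shown earlier in this README.
import json

def summarize(path="analysis_results.json"):
    """Return a short summary string from the quantitative metrics."""
    with open(path, encoding="utf-8") as f:
        report = json.load(f)
    q = report["metrics"]["quantitative"]
    return (f"Solved {q['total_solved']}/{q['total_attempted']} attempted "
            f"(of {q['total_questions']} total), {q['total_errors']} error(s)")
```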
Start by reading metrics_definitions.md to understand what the pipeline measures, then read HOW_LLM_WORKS.md to understand the implementation.