This application analyzes educational text passages to identify optimal intervention points by mapping content to specific academic skills and providing targeted discussion questions for follow-up learning. It uses large language models (LLMs) to intelligently detect, rate, and explain skill alignment—helping educators personalize instruction and improve learning outcomes.
See the full output here: `output/combined_data_final.xlsx`
- Uses a comprehensive taxonomy of 69 educational competencies from `input/skills.csv` (see the loading sketch after this list)
- Skills span a wide range of domains:
  - Science (e.g., life sciences, physics, earth science)
  - Social Studies (e.g., history, geography, civics)
  - Language Arts
  - Mathematics
  - Arts & Physical Education
  - Digital Literacy
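For orientation, here is a minimal sketch of loading the taxonomy with pandas. The `domain` and `skill` column names are assumptions for illustration, not necessarily the actual headers in `skills.csv`:

```python
import pandas as pd

# Load the skill taxonomy (column names "domain" and "skill" are assumed here;
# check input/skills.csv for the actual headers).
skills_df = pd.read_csv("input/skills.csv")

print(f"Loaded {len(skills_df)} skills")             # expected: 69
print(skills_df.groupby("domain")["skill"].count())  # skills per domain
```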
- Powered by the Groq LLM API (llama3-70b-8192) through `llm_service.py` (see the call sketch below)
- Uses structured prompt templates to ensure consistency
- Low temperature (0.01) for deterministic, repeatable outputs
Each model response includes:
- Identified skill tag(s)
- Alignment rating (scale of 0–10)
- Pedagogical explanation
- Highlighted text excerpt supporting the alignment
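As a rough sketch of what the underlying call might look like, using the Groq Python client in JSON mode. The prompt text below is a placeholder, not the actual template from `llm_service.py`:

```python
import json
import os

from groq import Groq  # pip install groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

story_text = (
    "Some days, Dad and I go in the car. Dad drives. I ride. "
    "Some days, Dad and I go on the train."
)

# Placeholder prompts for illustration; the real structured templates live in llm_service.py.
response = client.chat.completions.create(
    model="llama3-70b-8192",
    temperature=0.01,  # near-deterministic, repeatable outputs
    response_format={"type": "json_object"},
    messages=[
        {"role": "system", "content": "You analyze stories for skill alignment. Respond only with JSON."},
        {"role": "user", "content": f"Identify aligned skills, rate each 0-10, and quote the excerpt:\n{story_text}"},
    ],
)

# Expected shape: {"skills": [{"skill", "explanation", "story_excerpt", "rating"}, ...]}
result = json.loads(response.choices[0].message.content)
```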
The system pinpoints passages that (see the filtering sketch after this list):
- Strongly align with specific skills (ratings: 9–10)
- Show partial alignment or emerging understanding (ratings: 5–6)
- Offer opportunities for teacher-led discussion or review
- Map multiple skills to the same passage when relevant
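A hedged sketch of how these rating bands could be pulled out of the combined output with pandas. The column names are assumptions; check the workbook for the real schema:

```python
import pandas as pd

# Column names ("rating", "story_excerpt") are assumed for illustration.
alignments = pd.read_excel("output/combined_data_final.xlsx")

strong = alignments[alignments["rating"] >= 9]             # strong alignment (9-10)
emerging = alignments[alignments["rating"].between(5, 6)]  # partial / emerging (5-6)

# Passages that picked up more than one skill.
multi_skill = alignments.groupby("story_excerpt").filter(lambda g: len(g) > 1)
```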
The system generates targeted discussion points that:
- Reinforce key concepts through guided questioning
- Connect skills across different subject areas
- Promote critical thinking with open-ended prompts
- Support differentiated instruction with varying difficulty levels
- Python-based processing pipeline
- Structured prompt engineering with JSON output
- Various prompt techniques (few-shot learning, tool calling, prompt chaining)
- LLM output stored and analyzed using DataFrames
- Embedding-based dataset joins to reduce hallucinations (see the sketch after this list)
- Final output: Excel reports for easy review & collaboration
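Here is how an embedding-based join can work in practice: snap the model's free-form skill label back to the closest entry in the canonical taxonomy instead of trusting the raw string. The embedding model and skill strings below are illustrative, not necessarily what the pipeline uses:

```python
from sentence_transformers import SentenceTransformer, util

# Illustrative inputs: canonical taxonomy names vs. a free-form label returned by the LLM.
taxonomy_skills = ["Knows about transportation", "Understands plant life cycles"]
llm_skill = "Knowledge of different transportation modes"

model = SentenceTransformer("all-MiniLM-L6-v2")
taxonomy_emb = model.encode(taxonomy_skills, convert_to_tensor=True)
llm_emb = model.encode(llm_skill, convert_to_tensor=True)

# Join the LLM output to the closest canonical skill by cosine similarity.
scores = util.cos_sim(llm_emb, taxonomy_emb)[0]
best_match = taxonomy_skills[int(scores.argmax())]
print(best_match, float(scores.max()))
```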
- Sign up at Groq and obtain a free API key.
- Set up your environment variable:

  ```bash
  export GROQ_API_KEY='your-api-key-here'
  ```

- Set up the virtual environment:

  ```bash
  # Create a new virtual environment
  python -m venv skill-venv

  # Activate the virtual environment
  source skill-venv/bin/activate   # On macOS/Linux
  # or
  .\skill-venv\Scripts\activate    # On Windows

  # Install dependencies
  pip install -r requirements.txt
  ```

- Run the skill alignment script:

  ```bash
  python run_01_align_skills_to_stories.py
  ```

  This will process the stories from `input/stories.csv` and generate skill alignments.

- Combine the data:

  ```bash
  python run_02_combine_data.py
  ```

  This will generate the final combined output in `output/combined_data_final.xlsx`.

- (Optional) Generate discussion questions:

  ```bash
  python run_03_generate_discussion_questions.py
  ```

  This will create additional discussion questions in `output/discussion_questions.xlsx`.
The final outputs will be available in the `output/` directory:
- `combined_data_final.xlsx`: Main output with skill alignments
- `discussion_questions.xlsx`: Secondary output with questions for discussing the identified skills
The `llm_service.py` file provides a robust implementation for processing educational content using the Groq LLM API. Here's a detailed breakdown of its functionality:
- Key Prompt Components:
  - Skills Augmented Analysis: Analyzes text passages to identify and rate educational skills
  - Discussion Question Generation: Creates targeted questions based on identified skills
  - Few-Shot Learning: Uses example-based prompting from `examples/few_shot_examples_discussion_questions.json` (see the sketch after this list)
  - Custom Tooling: Supports OpenAI-style function calling for structured outputs
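A sketch of how the few-shot examples might be folded into the prompt. The example file's field names (`story`, `questions`) and the message layout are assumptions; the real template lives in `llm_service.py`:

```python
import json

with open("examples/few_shot_examples_discussion_questions.json") as f:
    examples = json.load(f)

new_story_text = "Some days, Dad and I go in the car. Dad drives. I ride."

# Each curated example becomes a user/assistant turn that demonstrates the expected JSON output.
messages = [{"role": "system", "content": "Generate discussion questions as JSON."}]
for ex in examples:
    messages.append({"role": "user", "content": ex["story"]})
    messages.append({"role": "assistant", "content": json.dumps(ex["questions"])})
messages.append({"role": "user", "content": new_story_text})
```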
- Output Structure & Sample Output:
  - Skills Analysis Output Structure:

    ```json
    {
      "skills": [
        {
          "skill": "skill description",
          "explanation": "why it is aligned",
          "story_excerpt": "where in the story to stop to review this skill",
          "rating": 0-10
        }
      ]
    }
    ```

    Sample Output:

    ```json
    {
      "skills": [
        {
          "skill": "Knows about transportation",
          "explanation": "The story mentions going in a car and on a train, showing an understanding of different modes of transportation.",
          "story_excerpt": "Some days, Dad and I go in the car. Dad drives. I ride. Some days, Dad and I go on the train.",
          "rating": 10
        }
      ]
    }
    ```

  - Discussion Questions Output Structure:

    ```json
    [
      {
        "question": "question text",
        "type": "Recall/Comprehension/Application",
        "instructional_purpose": "purpose of the question"
      }
    ]
    ```

    Sample Output:

    ```json
    {
      "questions": [
        {
          "question": "What are two ways the family travels?",
          "type": "Recall",
          "instructional_purpose": "Assess whether the student can recall the modes of transportation mentioned in the story."
        },
        {
          "question": "Why did the family choose to take the train for their vacation?",
          "type": "Comprehension",
          "instructional_purpose": "Assess whether the student understands the reason behind the family's transportation choice."
        },
        {
          "question": "What other ways can people travel besides cars and trains?",
          "type": "Application",
          "instructional_purpose": "Requires the student to think about other modes of transportation beyond what was mentioned in the story."
        }
      ]
    }
    ```
- Quality Control:
  - Prompt Templates: Implements structured prompt templates
  - Validation: Uses JSON schema validation (see the sketch after this list)
  - Error Handling: Includes comprehensive error handling and retry mechanisms
  - Debugging: Supports debugging through message printing
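One way the validation and retry pieces might fit together; the schema and function names here are illustrative rather than the actual ones in `llm_service.py`:

```python
import json

from jsonschema import ValidationError, validate  # pip install jsonschema

# Assumed schema mirroring the skills output structure shown above.
SKILLS_SCHEMA = {
    "type": "object",
    "required": ["skills"],
    "properties": {
        "skills": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["skill", "explanation", "story_excerpt", "rating"],
                "properties": {"rating": {"type": "number", "minimum": 0, "maximum": 10}},
            },
        }
    },
}

def parse_with_retry(call_llm, max_attempts=3):
    """Re-issue the LLM request until the response parses and validates, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        raw = call_llm()
        try:
            data = json.loads(raw)
            validate(data, SKILLS_SCHEMA)
            return data
        except (json.JSONDecodeError, ValidationError) as err:
            print(f"Attempt {attempt} failed: {err}")  # debugging via message printing
    raise RuntimeError("LLM output never passed validation")
```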
- Sample Usage:

  ```bash
  python llm_service.py
  ```

  To see a sample output for one story, check out the `output/sample_prompt_chain.txt` file, which demonstrates the full processing pipeline from story analysis to question generation.
- Randomly sample and review LLM-generated outputs.
- Human raters evaluate skill alignment, clarity, and pedagogical value.
- Compare human and model ratings to better engineer prompts.
- Identify skill categories or content formats where the model underperforms.
- Use a separate model with a prompt that mimics human evaluation behavior to assess content (a judge sketch follows this list).
- Helps reduce reliance on manual reviews for future outputs.
- Run controlled comparisons of LLM-generated interventions.
- Use engagement or comprehension metrics to assess effectiveness.
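A minimal sketch of what such an LLM-as-a-Judge could look like, reusing the same Groq client. The rubric, prompt, and function name are assumptions rather than existing code:

```python
import json
import os

from groq import Groq

client = Groq(api_key=os.environ["GROQ_API_KEY"])

JUDGE_PROMPT = (
    "You are an experienced educator reviewing an automated skill alignment. "
    "Rate alignment, clarity, and pedagogical value from 1-5 and explain briefly. "
    'Respond as JSON: {"alignment": n, "clarity": n, "pedagogical_value": n, "explanation": "..."}'
)

def judge_alignment(story_excerpt: str, skill: str, explanation: str) -> dict:
    """Ask a separate model to score one generated alignment the way a human rater would."""
    response = client.chat.completions.create(
        model="llama3-70b-8192",
        temperature=0.01,
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": JUDGE_PROMPT},
            {"role": "user", "content": f"Skill: {skill}\nExplanation: {explanation}\nExcerpt: {story_excerpt}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```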
- Improve LLM output validation and error handling
- Implement a scalable LLM-as-a-Judge system for reviews
- Add another prompt for skills assessment
- Add dynamic text highlighting based on skill strength
- Integrate student engagement metrics for optimization
- Visualize and track skill dependencies across stories