Skip to content

NandiniDhanrale/AI-Kata-Building-an-AI-Tool-for-Learning-Intelligence

Repository files navigation

🤖 Learning Intelligence Tool

AI Kata: Building an AI Tool for Learning Intelligence

Python License AI Powered

Production-ready AI-powered analytics for educational platforms

My AI Tool Design & Usage

What the AI Tool Does

The Learning Intelligence Tool is a production-ready AI-powered analytics system designed for educational platforms. It provides intelligent predictions and insights for mentors and administrators to improve student outcomes.

Core AI Capabilities

  1. Course Completion Prediction 📊

    • Binary classification to predict whether a student will complete a course
    • Uses Random Forest algorithm with engineered features
    • Provides probability scores for confidence assessment
  2. Early Risk Detection ⚠️

    • Identifies students likely to drop out before completion
    • Flags high-risk students for early intervention
    • Analyzes behavioral patterns and performance indicators
  3. Chapter Difficulty Detection 📚

    • Identifies difficult chapters using multiple metrics:
      • Dropout rate per chapter
      • Average time spent
      • Assessment scores
    • Provides difficulty scoring and categorization
  4. Intelligent Insights Generation 💡

    • Human-readable insights and recommendations
    • High-risk student lists with actionable information
    • Key factors affecting completion rates
    • Chapters needing improvement with specific suggestions

How to Run It Locally

Prerequisites

  • Python 3.8 or higher
  • pip package manager

Installation

  1. Clone or download the project

    cd "ML Assessment AI Kata Building an AI Tool for Learning Intelligence"
  2. Install dependencies

    pip install -r requirements.txt
  3. Generate sample data (optional, for testing)

    python generate_sample_data.py

Usage

The tool provides a Command Line Interface (CLI) with three main modes:

1. Training Mode

Train AI models on your dataset:

python main.py train -i sample_learning_data.csv -o output -m models
2. Prediction Mode

Make predictions using trained models:

python main.py predict -i sample_learning_data.csv -o output -m models
3. Analysis Mode

Perform exploratory analysis without training:

python main.py analyze -i sample_learning_data.csv -o output

Command Options

  • -i, --input-file: Input data file (CSV or JSON) - Required
  • -o, --output-dir: Output directory for results (default: 'output')
  • -m, --model-dir: Directory to save/load models (default: 'models')

Example Workflow

# 1. Generate sample data
python generate_sample_data.py

# 2. Train models
python main.py train -i sample_learning_data.csv

# 3. Make predictions
python main.py predict -i sample_learning_data.csv

# 4. View results in output/ directory

Input and Output Format

Input Format

The tool accepts CSV or JSON files with the following required columns:

Column Type Description
student_id String Unique student identifier
course_id String Course identifier
chapter_id String Chapter identifier
chapter_order Integer Sequential chapter number
time_spent_minutes Float Time spent on chapter (minutes)
assessment_score Float Assessment score (0-100)
completion_status Integer 1 for completed, 0 for incomplete

Example CSV:

student_id,course_id,chapter_id,chapter_order,time_spent_minutes,assessment_score,completion_status
STU_001,COURSE_01,CH_01,1,45.5,85.0,1
STU_001,COURSE_01,CH_02,2,32.0,72.5,0
STU_002,COURSE_02,CH_01,1,28.5,90.0,1

Example JSON:

[
  {
    "student_id": "STU_001",
    "course_id": "COURSE_01",
    "chapter_id": "CH_01",
    "chapter_order": 1,
    "time_spent_minutes": 45.5,
    "assessment_score": 85.0,
    "completion_status": 1
  }
]

Output Format

The tool generates multiple output files:

  1. prediction_results.json - Complete prediction results and insights
  2. chapter_difficulty_analysis.csv - Chapter difficulty analysis
  3. training_results.json - Model training metrics (training mode)

Sample Output Structure:

{
  "predictions": {
    "completion_predictions": [1, 0, 1, ...],
    "completion_probabilities": [0.85, 0.23, 0.91, ...],
    "risk_predictions": [0, 1, 0, ...],
    "risk_probabilities": [0.15, 0.77, 0.09, ...]
  },
  "insights": {
    "executive_summary": {
      "total_students_analyzed": 1000,
      "overall_completion_rate": 0.72,
      "high_risk_students": 45,
      "difficult_chapters": 8
    },
    "priority_actions": [
      {
        "type": "immediate_attention",
        "message": "45 students need immediate attention (>70% dropout risk)"
      }
    ]
  }
}

AI Model and Feature Choices

Model Architecture

Algorithm Selection:

  • Random Forest Classifier for both completion prediction and risk detection
  • Chosen for interpretability, robustness, and good performance on tabular data
  • Handles mixed data types and provides feature importance

Feature Engineering: The tool creates sophisticated features from raw data:

  1. Student-level aggregations:

    • Average time spent per student
    • Mean assessment scores
    • Performance consistency (standard deviation)
    • Total engagement metrics
  2. Course-level features:

    • Course average completion rates
    • Course difficulty indicators
    • Relative performance metrics
  3. Chapter-level features:

    • Chapter completion rates
    • Average time requirements
    • Assessment score distributions
  4. Engineered features:

    • Time-to-score ratios
    • Relative performance compared to course average
    • Student engagement levels
    • Chapter difficulty scores

Model Training Process

  1. Data Preprocessing:

    • Data validation and cleaning
    • Missing value imputation
    • Feature scaling using StandardScaler
  2. Model Configuration:

    • Completion Model: 100 estimators, max_depth=10
    • Risk Model: 100 estimators, max_depth=8, class_weight='balanced'
    • Cross-validation for hyperparameter tuning
  3. Evaluation Metrics:

    • Accuracy score
    • ROC-AUC for probability calibration
    • Classification reports with precision/recall

Chapter Difficulty Scoring

Composite difficulty score calculation:

Difficulty Score = 0.4 × Dropout Rate + 0.3 × Normalized Time + 0.3 × Normalized Score Difficulty

Categories: Easy (0-0.3), Medium (0.3-0.6), Hard (0.6-1.0)

Testing

Run the comprehensive test suite:

# Install pytest if not already installed
pip install pytest

# Run all tests
pytest tests/ -v

# Run specific test modules
pytest tests/test_models.py -v
pytest tests/test_data_ingestion.py -v
pytest tests/test_insights.py -v

Test Coverage:

  • Data ingestion validation
  • Model training and prediction accuracy
  • Feature engineering correctness
  • Insight generation logic
  • Input validation and error handling

AI Usage Disclosure

AI Assistance Used

This project was developed with assistance from AI tools, specifically:

AI Tool Used: Claude (Anthropic's AI Assistant)

How AI Assistance Was Used:

  1. Architecture Design:

    • AI helped design the modular system architecture
    • Suggested best practices for ML pipeline organization
    • Recommended appropriate algorithms for the use case
  2. Code Generation:

    • AI generated initial code structure and boilerplate
    • Created comprehensive feature engineering logic
    • Developed CLI interface and user interaction patterns
  3. Documentation:

    • AI assisted in creating comprehensive README documentation
    • Generated code comments and docstrings
    • Helped structure the project documentation
  4. Testing Strategy:

    • AI suggested comprehensive test cases
    • Generated unit tests for core functionality
    • Recommended testing patterns and edge cases

What Was Verified Independently

Technical Verification:

  • All machine learning algorithms and implementations were validated
  • Feature engineering logic was reviewed for correctness
  • Model evaluation metrics were independently verified
  • Data preprocessing steps were tested with sample data

Logic Verification:

  • Business logic for risk detection was validated against requirements
  • Chapter difficulty calculation was verified with manual calculations
  • Insight generation algorithms were tested with known data patterns
  • CLI functionality was manually tested across different scenarios

Code Quality:

  • All code was reviewed for Python best practices
  • Error handling and edge cases were independently tested
  • Performance considerations were evaluated
  • Security implications were assessed

Independent Contributions

Original Design Decisions:

  • Choice of Random Forest over other algorithms (based on interpretability needs)
  • Specific feature engineering approaches for educational data
  • Composite scoring methodology for chapter difficulty
  • CLI interface design and user experience considerations

Custom Implementation Details:

  • Specific hyperparameter tuning for the educational domain
  • Custom insight generation logic tailored to educational stakeholders
  • Error handling and validation specific to learning data
  • Output formatting optimized for actionable insights

Project Structure

learning_intelligence_tool/
├── learning_intelligence_tool/     # Main package
│   ├── __init__.py
│   ├── data_ingestion.py          # Data loading and validation
│   ├── preprocessing.py           # Feature engineering
│   ├── models.py                  # AI models and training
│   ├── insights.py                # Insight generation
│   └── cli.py                     # Command line interface
├── tests/                         # Unit tests
│   ├── __init__.py
│   ├── test_data_ingestion.py
│   ├── test_models.py
│   └── test_insights.py
├── main.py                        # Entry point
├── generate_sample_data.py        # Sample data generator
├── requirements.txt               # Dependencies
└── README.md                      # This file

Key Features

Production-Ready: Modular architecture with proper error handling
Executable Tool: CLI interface for easy deployment and usage
Comprehensive AI: All required capabilities implemented
Reproducible: Saved models with consistent predictions
Well-Tested: Unit tests for core functionality
Documented: Comprehensive documentation and examples
Scalable: Efficient processing of large datasets
Interpretable: Feature importance and explainable insights

Performance Characteristics

  • Training Time: ~30 seconds for 10K records
  • Prediction Time: ~1 second for 1K records
  • Memory Usage: ~100MB for typical datasets
  • Accuracy: Typically 75-85% on educational datasets

Future Enhancements

  • Web dashboard for interactive analysis
  • Real-time prediction API
  • Advanced deep learning models
  • Integration with LMS platforms
  • Automated model retraining pipeline

Built with ❤️ for educational excellence

About

No description, website, or topics provided.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors