An intelligent ML-powered system that automatically classifies and prioritizes student questions in AI/ML education contexts. This project addresses the real challenge of managing hundreds of student questions efficiently by categorizing them into technical domains and urgency levels.
In AI education settings, instructors receive numerous questions across various topics (Python basics, ML algorithms, debugging, conceptual understanding). Manually triaging these questions is time-consuming and inconsistent. This system uses NLP and machine learning to:
- Classify questions by topic (Python/Programming, Machine Learning, Deep Learning, Data Processing, Conceptual/Theory)
- Assess urgency level (Critical/Blocking, High Priority, Normal, Low Priority)
- Route to appropriate resources or instructors based on classification (see the inference sketch below)
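A minimal sketch of what inference might look like once both models are trained. The artifact names match the `models/` directory listed in the project structure below; the actual loading code in `predict.py` may differ:

```python
import pickle

# Load the shared TF-IDF vectorizer and both classifiers
# (artifact names follow the models/ directory in this repo).
with open("models/vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)
with open("models/category_classifier.pkl", "rb") as f:
    category_clf = pickle.load(f)
with open("models/urgency_classifier.pkl", "rb") as f:
    urgency_clf = pickle.load(f)

question = "How do I fix this AttributeError in my neural network?"
X = vectorizer.transform([question])  # the same features feed both models

print("Category:", category_clf.predict(X)[0])
print("Urgency: ", urgency_clf.predict(X)[0])
```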
- Synthetic dataset generation based on real educational patterns
- Data includes: question text, category labels, urgency levels
- Train/test split with stratification to maintain class balance (sketched below)
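A sketch of the stratified split. The column names are assumptions, not necessarily what `questions.csv` actually uses; the 200-row test set matches the evaluation reports further down:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Column names here are assumed; adjust to match questions.csv.
df = pd.read_csv("data/questions.csv")

# Stratifying on the label keeps each class's proportion identical
# in train and test; test_size=200 matches the reports below.
X_train, X_test, y_train, y_test = train_test_split(
    df["question"],
    df["category"],
    test_size=200,
    stratify=df["category"],
    random_state=42,
)
```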
- Text preprocessing: lowercasing, tokenization, handling code snippets
- TF-IDF vectorization for text representation
- Tuned parameters: max_features=5000, ngram_range=(1,2)
- Captures both single words and bigrams for better context
- Handles code-specific terminology and technical vocabulary (see the vectorization sketch below)
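A minimal sketch of the vectorization step using the parameters quoted above. The `preprocess` helper is illustrative only, not the actual implementation in `src/preprocessing.py`:

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

def preprocess(text: str) -> str:
    """Lowercase and normalize whitespace while leaving code-like
    tokens (AttributeError, model.fit, np.array) intact."""
    return re.sub(r"\s+", " ", text.lower()).strip()

# Parameters quoted above: 5000 features, unigrams + bigrams.
vectorizer = TfidfVectorizer(
    preprocessor=preprocess,
    max_features=5000,
    ngram_range=(1, 2),
)

questions = [
    "How do I fix this AttributeError in my neural network?",
    "What is the difference between precision and recall?",
]
X = vectorizer.fit_transform(questions)
print(X.shape)  # (2, n_features)
```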
- Primary Model: Logistic Regression with L2 regularization
- Alternative explored: Random Forest for comparison
- Multi-class classification with balanced class weights
- Hyperparameter tuning via grid search (see the training sketch below)
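A sketch of training with grid search, continuing from the vectorization and split sketches above (`X_train_vec` and `y_train` are assumed in scope). The `C` grid and scoring metric are hypothetical, shown only to illustrate the shape of the search:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# L2 is scikit-learn's default penalty; balanced class weights
# upweight minority classes during training.
base_model = LogisticRegression(
    penalty="l2",
    class_weight="balanced",
    max_iter=1000,
)

# Hypothetical search space -- the grid used in train.py may differ.
grid = GridSearchCV(
    base_model,
    param_grid={"C": [0.1, 1.0, 10.0]},
    scoring="f1_macro",
    cv=5,
)
grid.fit(X_train_vec, y_train)
category_clf = grid.best_estimator_
print(grid.best_params_)
```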
- Classification metrics: Precision, Recall, F1-score
- Confusion matrix analysis to identify misclassification patterns
- Cross-validation to ensure generalization
- Performance analysis across different question types (see the evaluation sketch below)
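A sketch of the evaluation step, again continuing from the sketches above (it assumes `category_clf`, the vectorized splits, and labels are in scope). These scikit-learn calls produce reports in the format shown next:

```python
from sklearn.metrics import classification_report, confusion_matrix
from sklearn.model_selection import cross_val_score

# Held-out metrics in the same format as the reports below.
y_pred = category_clf.predict(X_test_vec)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))

# 5-fold cross-validation on the training set to check generalization.
scores = cross_val_score(category_clf, X_train_vec, y_train,
                         cv=5, scoring="f1_macro")
print(f"CV macro-F1: {scores.mean():.3f} +/- {scores.std():.3f}")
```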
```
                    precision    recall  f1-score   support

 Conceptual/Theory       1.00      1.00      1.00        40
   Data Processing       1.00      1.00      1.00        40
     Deep Learning       1.00      1.00      1.00        40
  Machine Learning       1.00      1.00      1.00        40
Python/Programming       1.00      1.00      1.00        40

          accuracy                           1.00       200
         macro avg       1.00      1.00      1.00       200
      weighted avg       1.00      1.00      1.00       200
```
Analysis: Perfect classification on the test set, with 79.8% mean prediction confidence. The model strongly leverages technical keywords to distinguish question categories. Since the dataset is synthetic and the five categories are separated by distinctive vocabulary, the perfect scores here should be read as a ceiling, not as expected performance on real student questions.
```
              precision    recall  f1-score   support

    Critical       0.47      0.50      0.48        28
        High       0.59      0.58      0.59        50
         Low       0.34      0.67      0.45        18
      Normal       0.77      0.63      0.69       104

    accuracy                           0.60       200
   macro avg       0.54      0.60      0.55       200
weighted avg       0.64      0.60      0.62       200
```
Analysis: Lower accuracy (60%) reflects the inherent subjectivity of urgency assessment. Main confusion occurs between Normal and High priority questions, which aligns with real-world ambiguity.
```bash
# Install dependencies
pip install -r requirements.txt

# Train both classifiers
python train.py

# Classify a single question
python predict.py "How do I fix this AttributeError in my neural network?"

# Evaluate on the held-out test set
python evaluate.py
```

```
student-question-classifier/
├── README.md                       # Project documentation
├── requirements.txt                # Python dependencies
├── data/
│   ├── generate_data.py            # Synthetic data generation
│   └── questions.csv               # Generated training data
├── src/
│   ├── preprocessing.py            # Text preprocessing utilities
│   ├── feature_engineering.py      # TF-IDF and feature extraction
│   └── models.py                   # Model definitions and training
├── train.py                        # Main training script
├── evaluate.py                     # Model evaluation script
├── predict.py                      # Inference script
├── models/                         # Saved model artifacts
│   ├── category_classifier.pkl
│   ├── urgency_classifier.pkl
│   └── vectorizer.pkl
└── notebooks/
    └── exploratory_analysis.ipynb  # Data exploration and visualization
```
Problem: Not all question categories appear equally frequently in real educational settings. Solution: Applied class weighting in the model to ensure minority classes receive appropriate attention during training.
Problem: Questions containing code snippets have different linguistic patterns than natural language. Solution: Preserved code structure in preprocessing while still extracting semantic meaning through character n-grams.
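A sketch of one way to combine word-level and character-level features as the paragraph above describes. The specific n-gram ranges here are illustrative assumptions, not the tuned values from `src/feature_engineering.py`:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion

# Word n-grams capture topic vocabulary; char_wb n-grams stay robust
# to code tokens like "model.fit(" that word tokenizers break apart.
features = FeatureUnion([
    ("word", TfidfVectorizer(max_features=5000, ngram_range=(1, 2))),
    ("char", TfidfVectorizer(analyzer="char_wb", ngram_range=(3, 5),
                             max_features=5000)),
])

questions = ["Why does model.fit(X, y) raise a ValueError?"]
X = features.fit_transform(questions)
print(X.shape)
```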
Problem: Some questions span multiple categories (e.g., "How do I implement gradient descent in Python?"). Solution: Built separate models for category and urgency to allow independent classification. Future work could explore multi-label classification.
Problem: Urgency is contextual and subjective compared to topic classification. Solution: Trained on patterns like "not working", "error", "urgent", "deadline" combined with question sentiment analysis.
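For illustration only, a toy version of the keyword-cue idea; in practice the model learns these patterns through TF-IDF weights rather than a hand-built list, and the sentiment component is omitted here:

```python
import numpy as np

# Illustrative cue list based on the patterns named above;
# not the feature set actually used by the trained model.
URGENCY_CUES = ["not working", "error", "urgent", "deadline", "crash", "stuck"]

def urgency_cue_features(question: str) -> np.ndarray:
    """Binary flags marking which urgency cues appear in a question."""
    text = question.lower()
    return np.array([int(cue in text) for cue in URGENCY_CUES])

print(urgency_cue_features(
    "My training loop is not working and the deadline is tomorrow"))
# -> [1 0 0 1 0 0]
```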
- Deep Learning Approach: Implement BERT-based classification for better semantic understanding
- Active Learning: Incorporate instructor feedback to continuously improve classification
- Multi-label Support: Allow questions to belong to multiple categories simultaneously
- Confidence Scores: Add probability outputs to flag uncertain classifications for manual review
- Real-time API: Deploy as a REST API for integration with learning management systems
- Expanded Features: Include student history, previous questions, and course progress context
This project emerged from real challenges in teaching AI/ML courses where:
- 50-100+ students generate 200+ questions per course
- Questions range from basic Python syntax to advanced ML theory
- Response time directly impacts student learning and retention
- Instructors need to prioritize high-impact interventions
The classification system enables:
- Automated routing to teaching assistants based on expertise
- Priority queuing for critical blocking issues
- Self-service recommendations by matching to FAQ/documentation
- Analytics on common confusion points to improve curriculum
This is a learning project, but suggestions and improvements are welcome:
- Fork the repository
- Create a feature branch
- Make your changes with clear commit messages
- Submit a pull request with description
MIT License - feel free to use this for educational purposes.
Christopher Lee
- Product Management Consultant & AI Educator
- Teaching AI/ML Mastery classes in Queens, NY
- GitHub: @pmchrislee
- Email: [email protected]
- Built as part of learning journey in practical ML engineering
- Inspired by real challenges in AI education delivery
- Thanks to the open-source ML community for excellent tools and resources