This project analyzes and classifies student essays using NLP and machine-learning techniques, combined with generative AI for conclusion generation. The workflow covers data preprocessing, feature extraction, normalization, classification with multiple models, and conclusion generation with BART (Bidirectional and Auto-Regressive Transformers).
- Data Exploration & Preprocessing: Label encoding, text cleaning, z-normalization.
- Feature Extraction: TF-IDF representation of essays.
- Classification Models: Logistic Regression, KNN, SVM, Random Forest, XGBoost, and BERT-based sentence classification.
- Model Optimization: Hyperparameter tuning to improve classification performance.
- Generative AI: BART used to generate textual conclusions for essays.
- Evaluation Metrics: ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore (Precision, Recall, F1) to compare generated conclusions with the original essay content.
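To illustrate what the ROUGE overlap metrics measure, here is a minimal pure-Python ROUGE-N F1 sketch; the project itself would typically rely on the `rouge-score` and `bert-score` packages rather than this hand-rolled version.

```python
from collections import Counter

def rouge_n_f1(reference: str, candidate: str, n: int = 1) -> float:
    """Minimal ROUGE-N F1: clipped n-gram overlap between candidate and reference."""
    def ngrams(text, n):
        tokens = text.lower().split()
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

    ref, cand = ngrams(reference, n), ngrams(candidate, n)
    overlap = sum((ref & cand).values())  # n-gram matches, clipped per n-gram
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

# Identical texts score 1.0; disjoint texts score 0.0.
print(rouge_n_f1("the essay argues for reform", "the essay argues for reform"))  # 1.0
```

Setting `n=2` gives ROUGE-2; ROUGE-L (longest common subsequence) and BERTScore (embedding similarity) require more machinery and are best taken from the dedicated libraries.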
- Load and explore student essay dataset.
- Preprocess text data and encode labels.
- Extract features using TF-IDF and apply z-normalization.
- Split data into training (80%) and testing (20%) sets.
- Train and evaluate multiple classification models.
- Fine-tune the best-performing model (XGBoost) for improved results.
- Generate essay conclusions using BART and evaluate with ROUGE & BERTScore.
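The feature-extraction and classification steps above can be sketched end to end with scikit-learn. The toy essays, labels, and the choice of Logistic Regression here are illustrative assumptions, not the project's actual dataset or tuned model.

```python
# Sketch of the classification pipeline on toy data; the real essay dataset,
# label names, and tuned hyperparameters are assumptions for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

essays = [
    "The experiment shows clear causal evidence.",
    "I feel this poem expresses deep sorrow.",
    "Statistical analysis confirms the hypothesis.",
    "The imagery evokes a sense of loss.",
] * 5                                  # repeated so the split has enough samples
labels = [0, 1, 0, 1] * 5              # 0 = analytical, 1 = literary (toy labels)

# TF-IDF features, z-normalized without centering (keeps the matrix sparse)
X = TfidfVectorizer().fit_transform(essays)
X = StandardScaler(with_mean=False).fit_transform(X)

# 80/20 train/test split, as in the workflow above
X_tr, X_te, y_tr, y_te = train_test_split(
    X, labels, test_size=0.2, random_state=42, stratify=labels
)

clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print(f"test accuracy: {accuracy_score(y_te, clf.predict(X_te)):.2f}")
```

Swapping `LogisticRegression` for `KNeighborsClassifier`, `SVC`, `RandomForestClassifier`, or `XGBClassifier` reuses the same feature pipeline, which is what makes the model comparison step straightforward.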
- Comprehensive insights from student essays.
- Comparison of ML models for text classification.
- Generated conclusions evaluated against original text for quality assessment.
- Python, Pandas, NumPy, Scikit-learn
- Transformers (Hugging Face)
- BART for text generation
- Evaluation: ROUGE & BERTScore
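A minimal sketch of the BART generation step via the Hugging Face `pipeline` API. The `facebook/bart-large-cnn` checkpoint is a common summarization choice but an assumption here (the project's exact checkpoint and decoding settings are not specified), and the weights download on first run.

```python
# Conclusion generation with BART via Hugging Face Transformers.
# Assumptions: the checkpoint name and length bounds are illustrative only.
from transformers import pipeline

summarizer = pipeline("summarization", model="facebook/bart-large-cnn")

essay_body = (
    "Renewable energy adoption has accelerated over the past decade. "
    "Solar and wind costs have fallen sharply, and storage technology "
    "continues to improve, making grid-scale deployment increasingly viable."
)

# max_length / min_length bound the generated conclusion in tokens
conclusion = summarizer(essay_body, max_length=40, min_length=10)[0]["summary_text"]
print(conclusion)
```

The generated `conclusion` string is what gets scored against the original essay text with ROUGE and BERTScore in the evaluation step.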