gamzeakkurt/BART-Insights

Text Analysis & Generative AI: BART Insights

Overview

This project analyzes and classifies student essays with NLP and machine learning techniques, and uses generative AI to write essay conclusions. The workflow covers data preprocessing, feature extraction, normalization, classification with multiple models, and text generation with BART (Bidirectional and Auto-Regressive Transformers).

Features

  • Data Exploration & Preprocessing: Label encoding, text cleaning, z-score normalization.
  • Feature Extraction: TF-IDF representation of essays.
  • Classification Models: Logistic Regression, KNN, SVM, Random Forest, XGBoost, and BERT-based sentence classification.
  • Model Optimization: Hyperparameter tuning to improve classification performance.
  • Generative AI: BART used to generate textual conclusions for essays.
  • Evaluation Metrics: ROUGE-1, ROUGE-2, ROUGE-L, and BERTScore (F1, Recall, Precision) to compare generated conclusions with original content.
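
To make the ROUGE metrics listed above concrete, here is a minimal from-scratch sketch of ROUGE-N and ROUGE-L. It is an illustration under simplifying assumptions (lowercasing and whitespace tokenization, no stemming), not the project's evaluation code; the function names and the example sentences are hypothetical. BERTScore is not reproduced because it depends on a pretrained model.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter (multiset) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def f1_score(precision, recall):
    """Harmonic mean of precision and recall, 0.0 when both are zero."""
    total = precision + recall
    return 0.0 if total == 0 else 2 * precision * recall / total


def rouge_n(reference, candidate, n=1):
    """ROUGE-N precision, recall and F1 from clipped n-gram overlap."""
    ref = ngrams(reference.lower().split(), n)
    cand = ngrams(candidate.lower().split(), n)
    overlap = sum((ref & cand).values())  # each n-gram counted at most min(ref, cand) times
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    return precision, recall, f1_score(precision, recall)


def lcs_length(a, b):
    """Length of the longest common subsequence, by dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(reference, candidate):
    """ROUGE-L precision, recall and F1 from the longest common subsequence."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    lcs = lcs_length(ref, cand)
    precision = lcs / max(len(cand), 1)
    recall = lcs / max(len(ref), 1)
    return precision, recall, f1_score(precision, recall)


# Hypothetical original conclusion and generated conclusion.
original = "school uniforms should remain optional for students"
generated = "uniforms should be optional for students"

for name, scores in [
    ("ROUGE-1", rouge_n(original, generated, 1)),
    ("ROUGE-2", rouge_n(original, generated, 2)),
    ("ROUGE-L", rouge_l(original, generated)),
]:
    print(name, [round(s, 3) for s in scores])
```

Precision is measured against the generated text and recall against the original, so a short but accurate conclusion tends to score high on precision and lower on recall. Library implementations of ROUGE usually add their own tokenization and optional stemming, so their scores may differ slightly from this sketch.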

Workflow

  1. Load and explore the student essay dataset.
  2. Preprocess text data and encode labels.
  3. Extract features using TF-IDF and apply z-score normalization.
  4. Split data into training (80%) and testing (20%) sets.
  5. Train and evaluate multiple classification models.
  6. Tune the hyperparameters of the best-performing model (XGBoost) to improve its results.
  7. Generate essay conclusions using BART and evaluate with ROUGE & BERTScore.
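
Steps 3 and 4 above can be sketched from scratch as follows. This is a dependency-free illustration of the underlying arithmetic, not the repository's code: it assumes whitespace tokenization and one common smoothed TF-IDF variant, and the helper names, toy essays, and labels are all hypothetical. In practice Scikit-learn's TfidfVectorizer, StandardScaler, and train_test_split cover the same ground.

```python
import math
import random
from collections import Counter
from statistics import mean, pstdev


def tfidf_matrix(docs):
    """Return (vocabulary, rows) with one TF-IDF row per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    # Smoothed IDF, so terms present in every document keep a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    rows = []
    for toks in tokenized:
        counts = Counter(toks)
        # Term frequency is the count relative to document length.
        rows.append([counts[t] / len(toks) * idf[t] if toks else 0.0 for t in vocab])
    return vocab, rows


def z_normalize(rows):
    """Standardize each feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # A constant column has zero deviation; divide by 1.0 instead.
    sigmas = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sigmas)] for row in rows]


def split_80_20(rows, labels, seed=0):
    """Shuffle indices reproducibly and hold out the last 20% for testing."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(0.8 * len(idx))
    train, test = idx[:cut], idx[cut:]
    return ([rows[i] for i in train], [rows[i] for i in test],
            [labels[i] for i in train], [labels[i] for i in test])


# Hypothetical toy essays with hypothetical encoded labels.
essays = [
    "phones distract students in class",
    "phones help students research topics",
    "uniforms reduce pressure on students",
    "uniforms limit personal expression",
    "homework builds study habits",
]
labels = [0, 1, 1, 0, 1]

vocab, X = tfidf_matrix(essays)
X = z_normalize(X)
X_train, X_test, y_train, y_test = split_80_20(X, labels)
print(len(vocab), len(X_train), len(X_test))
```

The sketch follows the listed order and normalizes before splitting. Computing the mean and standard deviation on the training split alone, then applying them to the test split, would keep test statistics out of the features.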

Outcome

  • Comprehensive insights from student essays.
  • Comparison of ML models for text classification.
  • Generated conclusions evaluated against original text for quality assessment.

Technologies

  • Python, Pandas, NumPy, Scikit-learn
  • Transformers (Hugging Face)
  • BART for text generation
  • Evaluation: ROUGE & BERTScore

About

A project for student essay analysis using NLP, ML, and generative AI. Essays are classified with models like Logistic Regression, KNN, SVM, XGBoost, and BERT, and conclusions are generated using BART and evaluated with ROUGE & BERTScore.
