This project fine-tunes a pre-trained BERT model to classify text articles as either AI-generated or human-written. It is built with PyTorch and the Hugging Face transformers library, framing the task as binary classification.
The primary goal of this project is to classify text data into two categories:
- AI-generated
- Human-written
 
The workflow includes:
- Tokenizing text data with a BERT tokenizer.
- Defining a PyTorch dataset and data loader for texts and labels.
- Building and training a custom BERT-based classifier.
- Evaluating the model with stratified cross-validation.
- Saving the trained model for deployment or further analysis.
 
Key features:

- Pre-trained BERT Model: Fine-tunes `bert-base-uncased` for text classification.
- Custom Dataset Class: Implements a PyTorch-compatible dataset class for efficient data handling.
- Cross-Validation: Uses Stratified K-Fold cross-validation for robust evaluation.
- Evaluation Metrics: Calculates accuracy, F1 score, precision, and recall.
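
The cross-validation and metrics setup can be sketched as below. The actual fine-tuning inside each fold is elided; `preds` here is a placeholder so the sketch stays runnable without training a model.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

texts = np.array([f"article {i}" for i in range(10)])
labels = np.array([0, 1] * 5)  # 0 = human-written, 1 = AI-generated

# shuffle + fixed seed gives reproducible, class-balanced folds
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

for fold, (train_idx, val_idx) in enumerate(skf.split(texts, labels)):
    # Placeholder: a real run would fine-tune the BERT classifier on
    # texts[train_idx] and predict on texts[val_idx].
    preds = labels[val_idx]  # perfect predictions, for illustration only
    print(
        f"fold {fold}: "
        f"acc={accuracy_score(labels[val_idx], preds):.2f} "
        f"f1={f1_score(labels[val_idx], preds):.2f} "
        f"precision={precision_score(labels[val_idx], preds):.2f} "
        f"recall={recall_score(labels[val_idx], preds):.2f}"
    )
```

Stratification keeps the AI/human class ratio the same in every fold, which matters when the two classes are not perfectly balanced.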
 
Requirements:

- Python 3.7+
- Libraries: `torch`, `transformers`, `pandas`, `numpy`, `scikit-learn`
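
The required libraries (assumed to be torch, transformers, pandas, numpy, and scikit-learn) can be installed with pip:

```shell
# scikit-learn is the package that provides the `sklearn` module
pip install torch transformers pandas numpy scikit-learn
```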
 
Future improvements:

- Experiment with stronger pre-trained models such as RoBERTa or DeBERTa.
- Handle class imbalance with techniques such as oversampling or weighted loss functions.
- Extend the model to multiclass classification for other types of text.
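
As a sketch of the weighted-loss idea, class weights can be passed to PyTorch's `CrossEntropyLoss`; the 3:1 imbalance assumed below is illustrative only.

```python
import torch
import torch.nn as nn

# Suppose human-written (class 0) outnumbers AI-generated (class 1) 3:1;
# weighting the minority class makes errors on it cost more.
class_weights = torch.tensor([1.0, 3.0])
criterion = nn.CrossEntropyLoss(weight=class_weights)

logits = torch.tensor([[2.0, 0.5], [0.2, 1.5]])  # batch of 2, 2 classes
targets = torch.tensor([0, 1])
loss = criterion(logits, targets)
print(loss.item())
```

In practice the weights are usually derived from the inverse class frequencies of the training split rather than hard-coded.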