This project fine-tunes a pre-trained transformer model, GPT-2, to classify tweet sentiments into three categories: Negative, Neutral, and Positive. The model was trained and evaluated on a labeled dataset of tweets, with the entire workflow executed in Google Colab using a Tesla T4 GPU to accelerate training and inference. The goal is to create a lightweight, accurate sentiment classifier that can be used to analyze social media content in real time.
gpt2-tweet-sentiment/
│
├── app/ # Local deployment
│ ├── main.py # FastAPI backend for inference
│ └── frontend.py # Gradio UI
│
├── data/ # Tweet dataset
│ ├── raw/ # Raw tweet data
│ └── processed/ # Cleaned data
│
├── figures/ # Visualizations
│ └── gpt2-model-confusion-matrix.png # Model confusion matrix
│
├── models/ # Trained GPT-2 models
│ └── gpt2-final-model/ # Final saved model and tokenizer
│
├── notebooks/ # Notebooks
│ └── gpt2-finetune-tweet-sentiment.ipynb # End-to-end pipeline
│
├── results/ # Model output
│ ├── metrics/ # Evaluation results
│ │ └── gpt2-model-evaluation-metrics.txt
│ └── predictions/ # Inference results
│ └── predictions_output.txt
│
├── config.py # Google Drive & Colab folder setup
├── requirements.txt # Dependencies
└── README.md # Project documentation
- Introduction – Fine-tuned GPT-2 to classify tweets into Negative, Neutral, and Positive sentiment categories.
- Data Cleaning – Removed null values, duplicates, mentions, URLs, and extra whitespace for cleaner inputs.
- Tokenization and Data Collation – Tokenized tweets using the GPT-2 tokenizer, with padding handled dynamically during batching.
- Model Setup and Fine-Tuning – Loaded `GPT2ForSequenceClassification` with 3 output labels. Trained over 5 epochs using Hugging Face's `Trainer`.
- Training Configuration – Trained with batch size 8, learning rate 2e-5, and automatic model checkpointing and evaluation.
- Evaluation Metrics – Used accuracy and weighted F1-score, with a confusion matrix and classification report to analyze performance.
- Inference Pipeline – Created a `TextClassificationPipeline` to predict sentiment from real tweets, along with confidence scores.
- Conclusion – Delivered a sentiment analysis model ready for use in real-time applications like social media monitoring or customer feedback analysis.
- Deployment – Deployed an interactive web demo using Gradio, available both locally and on Hugging Face Spaces.
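
The data cleaning step listed above can be sketched as follows. This is a minimal illustration using only the standard library, not the notebook's actual code: the regular expressions, the function names `clean_tweet` and `clean_dataset`, and the sample rows are all assumptions made for the example.

```python
import re

# Assumed patterns; the notebook may define mentions and URLs differently.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
WHITESPACE_RE = re.compile(r"\s+")


def clean_tweet(text):
    """Strip URLs and mentions, then collapse repeated whitespace."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    return WHITESPACE_RE.sub(" ", text).strip()


def clean_dataset(rows):
    """Drop null entries and duplicates from (text, label) pairs."""
    seen = set()
    cleaned = []
    for text, label in rows:
        if text is None or label is None:
            continue  # null values are removed
        text = clean_tweet(text)
        if not text or (text, label) in seen:
            continue  # empty after cleaning, or a duplicate
        seen.add((text, label))
        cleaned.append((text, label))
    return cleaned


# Hypothetical sample rows: (tweet text, label id)
rows = [
    ("@anna   check this out https://example.com  so good!", 2),
    ("@anna   check this out https://example.com  so good!", 2),
    (None, 1),
    ("Heading to   work", 1),
]
print(clean_dataset(rows))
```

URLs are removed before mentions so that an `@` inside a link is not treated as a mention.
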
This project uses the MTEB Tweet Sentiment Extraction dataset, hosted on Hugging Face Datasets. It contains labeled tweets categorized into three sentiment classes: Negative (0), Neutral (1), and Positive (2).
- Source: MTEB Hugging Face
- Total samples: 31015 tweets
- Training set: 27481 tweets
- Test set: 3534 tweets
- Label distribution (test set): 1001 Negative, 1430 Neutral, and 1103 Positive tweets
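
As a quick sanity check on the figures above, the short sketch below recomputes the test-set class shares and the accuracy a trivial majority-class predictor would reach. The counts come from the list above; the variable names are illustrative only.

```python
# Label ids and names as described for the dataset
ID2LABEL = {0: "Negative", 1: "Neutral", 2: "Positive"}

# Test-set label counts reported above
TEST_COUNTS = {"Negative": 1001, "Neutral": 1430, "Positive": 1103}

total = sum(TEST_COUNTS.values())
shares = {label: count / total for label, count in TEST_COUNTS.items()}
majority_label = max(TEST_COUNTS, key=TEST_COUNTS.get)

for label, count in TEST_COUNTS.items():
    print(f"{label:<9}{count:>6}  {shares[label]:.1%}")

# Always predicting the most frequent class gives this accuracy,
# a useful floor when reading the fine-tuned model's scores.
print(f"Total test tweets: {total}")
print(f"Majority-class baseline ({majority_label}): {shares[majority_label]:.1%}")
```

The counts sum to the 3534 test tweets, and Neutral is the largest class at roughly 40% of the test set.
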
GPT-2 was selected to explore its effectiveness in sequence classification tasks even though it is primarily a generative model. Fine-tuning GPT-2 for sentiment classification provided an opportunity to experiment beyond the typical encoder-only models like BERT and see how a decoder-based architecture handles this type of problem.
This project requires the following dependencies, which can be installed with `pip install -r requirements.txt`:
- Python
- PyTorch
- Transformers (Hugging Face)
- Scikit-learn
- FastAPI
- Gradio
- Requests

To run the project locally:
- Clone this repository: `git clone https://github.com/herrerovir/gpt2-tweet-sentiment`
- Navigate to the project directory: `cd gpt2-tweet-sentiment`
- Install the dependencies: `pip install -r requirements.txt`
- Run the notebook or script to train and test the model: `jupyter notebook`

To run the project in Google Colab:
- Open a new Google Colab notebook.
- Clone the repository inside the notebook: `!git clone https://github.com/herrerovir/gpt2-tweet-sentiment`
- Navigate to the cloned folder and open the notebook `gpt2-finetune-tweet-sentiment.ipynb`.
- Switch the runtime to a Tesla T4 GPU for faster training.
- Follow the notebook to fine-tune GPT-2 and perform inference.

The trained model files are not included in this repository due to their large size. The fine-tuned GPT-2 model is saved to your Google Drive under models/gpt2/gpt2-final-model. It includes model weights and tokenizer files for easy loading.
The fine-tuned model is also publicly hosted on the Hugging Face Model Hub, where it is available for download.
After training for 5 epochs, the GPT-2 model achieved:

| Metric | Score |
|---|---|
| Accuracy | 79.37% |
| F1 Score (weighted) | 79.34% |
| Eval Loss | 0.6867 |

Per-class metrics:

| Class | Precision | Recall | F1-Score |
|---|---|---|---|
| Negative | 0.81 | 0.78 | 0.79 |
| Neutral | 0.76 | 0.76 | 0.76 |
| Positive | 0.82 | 0.85 | 0.84 |

These results show balanced performance across the sentiment classes: per-class F1 ranges from 0.76 (Neutral) to 0.84 (Positive).
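
The overall scores can be cross-checked against the per-class table. The sketch below assumes the per-class metrics were computed on the 3534-tweet test set, with the class counts given in the dataset description; that pairing is an assumption, not something the results state. It uses two identities: weighted F1 is the support-weighted mean of per-class F1, and accuracy is the support-weighted mean of per-class recall.

```python
# Assumed supports: test-set label counts from the dataset description
SUPPORT = {"Negative": 1001, "Neutral": 1430, "Positive": 1103}

# Per-class metrics as reported (rounded to two decimals)
REPORT = {
    "Negative": {"precision": 0.81, "recall": 0.78, "f1": 0.79},
    "Neutral": {"precision": 0.76, "recall": 0.76, "f1": 0.76},
    "Positive": {"precision": 0.82, "recall": 0.85, "f1": 0.84},
}

total = sum(SUPPORT.values())

# Weighted F1: each class F1 weighted by its share of the samples
weighted_f1 = sum(SUPPORT[c] * REPORT[c]["f1"] for c in REPORT) / total

# Accuracy: recall * support is the number of correct predictions per class
accuracy = sum(SUPPORT[c] * REPORT[c]["recall"] for c in REPORT) / total

print(f"Weighted F1 from table: {weighted_f1:.4f}")
print(f"Accuracy from table:    {accuracy:.4f}")
```

Because the table values are rounded, the recomputed figures can only be expected to match the reported 79.34% and 79.37% to within rounding error.
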
Sample predictions with confidence scores:
- Input: "The food was hot and delicious." Prediction: Positive (Confidence: 99.93%)
- Input: "Ugh, my flight got delayed again." Prediction: Negative (Confidence: 99.95%)
- Input: "Heading to the grocery store, then back to work." Prediction: Neutral (Confidence: 99.57%)
- Input: "Lost all my work because of a crash. Fantastic." Prediction: Positive (Confidence: 52.06%) ⚠️ (sarcasm not detected)

These examples highlight both the strengths and the limitations of the model, especially when sarcasm is involved.
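
To make the confidence scores concrete, here is a small sketch of how a three-class classifier's raw outputs (logits) are typically turned into a label and a confidence. It assumes the confidence is the softmax probability of the top class, which is the usual behavior for a single-label classifier. The logits and the `review_threshold` value are hypothetical and chosen only for illustration.

```python
import math

LABELS = ["Negative", "Neutral", "Positive"]


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]


def predict(logits, review_threshold=0.60):
    """Return (label, confidence, needs_review) for one set of logits.

    review_threshold is a hypothetical cut-off for flagging uncertain
    predictions; it is not part of the project.
    """
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best], probs[best] < review_threshold


# Hypothetical logits: one clear-cut case and one ambiguous case
print(predict([-3.0, -2.5, 5.0]))
print(predict([1.0, -1.0, 1.1]))
```

A low top-class probability, like the 52.06% on the sarcastic tweet, means the classes are close to tied, so flagging such predictions for review is one simple way to surface likely errors.
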
The fine-tuned GPT-2 model is effective for sentiment classification on social media text. With 79.37% accuracy, a 79.34% weighted F1 score, and consistent per-class performance, it is a strong baseline for real-world applications. It performs best on clearly positive or negative tweets and could be improved to better detect sarcasm and subtle tone.
You can interact with the tweet sentiment classifier via a web interface using either local deployment or a cloud-hosted app on Hugging Face Spaces.

For local deployment:
- Install dependencies. From the root directory of the repository, run `pip install -r requirements.txt`.
- Start the FastAPI backend. Open a terminal, navigate to the app folder, and run the FastAPI app with `cd app` followed by `uvicorn main:app --reload`.
  - This launches the backend server at http://127.0.0.1:8000
  - The FastAPI backend serves model inference endpoints.
- Start the Gradio frontend. Open a new terminal window in the app directory and run `python frontend.py`.
  - This launches the Gradio UI on http://localhost:7860
  - The frontend calls the FastAPI backend for predictions.
- Use the web app. Open your browser and go to http://localhost:7860. You'll see the interactive web app where you can enter tweets and receive sentiment predictions instantly.

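
Once the backend is running, it can also be called directly without the Gradio UI. The sketch below uses only the standard library and rests on two assumptions that should be checked against `app/main.py`: the route name `/predict` and the JSON body `{"text": ...}` are hypothetical, since the endpoint's actual path and schema are not documented here.

```python
import json
import urllib.request

BACKEND_URL = "http://127.0.0.1:8000"  # backend address from the steps above
PREDICT_PATH = "/predict"  # hypothetical route; check app/main.py


def build_request(text):
    """Build a JSON POST request carrying one tweet (assumed schema)."""
    payload = json.dumps({"text": text}).encode("utf-8")
    return urllib.request.Request(
        BACKEND_URL + PREDICT_PATH,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(text, timeout=10):
    """Send the tweet to the backend and return the decoded JSON reply."""
    with urllib.request.urlopen(build_request(text), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the FastAPI backend running:
# print(predict("The food was hot and delicious."))
```

FastAPI serves interactive API documentation at `/docs` by default, so http://127.0.0.1:8000/docs is a convenient place to confirm the real route and request format.
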
You can also test the model live in your browser via the Hugging Face Space:
👉 Try the Live Demo on Hugging Face Spaces
No installation or GPU is required. Open the link and start analyzing tweet sentiment right away.
Built with Hugging Face Transformers, PyTorch, and Scikit-learn. Trained using free GPU resources via Google Colab.