🛍️ Product Category Classification with DistilBERT

This project implements a full-stack solution for product category classification covering data collection, preprocessing, modeling, evaluation, inference, and deployment. For local deployment, Streamlit is used for the frontend interface while FastAPI handles model inference. Since Hugging Face Spaces do not support running FastAPI alongside Streamlit, the deployment there combines the inference logic within Streamlit itself, supported by a backend script that loads the model and performs predictions. This setup allows for smooth cloud deployment with an interactive user interface, while keeping the local deployment organized and efficient.

The core of the project uses a pre-trained transformer model, DistilBERT, fine-tuned on a labeled dataset of product descriptions to assign each item to one of four categories. The entire workflow, from data preparation and model training to evaluation and inference, was developed and executed in Google Colab, where a free Tesla T4 GPU sped up training and evaluation. The goal is a fast and reliable solution for automating product tagging in online retail systems.

👉 Try the live demo

🗃️ Repository Structure

Product-category-classifier/
│
├── data/                                      # Dataset
│   ├── raw/                                   # Raw data
│   └── processed/                             # Processed data
│
├── figures/                                   # Visualizations
│   ├── correlation-matrix.png        
│   └── product-category-distribution.png  
│
├── models/                                    # Trained models
│
├── notebooks/                                 # Jupyter Notebooks
│   └── product-category-classification.ipynb  # End-to-end project notebook
│
├── results/                                   # Model output
│   ├── metrics/                               # Model metrics
│   │   └── model-evaluation-metrics.txt
│   └── predictions/                           # Model predictions
│       └── predictions_output.txt
│
├── colab_setup.py                             # Colab setup script
├── requirements.txt                           # Required dependencies
└── README.md                                  # Project documentation

📘 Project Overview

- Introduction: Fine-tuned DistilBERT to classify e-commerce product descriptions into Electronics, Household, Books, and Clothing & Accessories.
- Data Loading and Preparation: Cleaned the dataset by removing missing entries, shuffled the data, and mapped labels to numerical IDs.
- Data Splitting and Tokenization: Performed a stratified train-test split and tokenized product descriptions using the DistilBERT tokenizer with padding and truncation.
- Data Collation: Applied dynamic padding during batching for efficient training.
- Model Setup and Fine-Tuning: Loaded pre-trained DistilBERT for sequence classification and fine-tuned it on the product description dataset.
- Training Configuration: Set training parameters including batch size, epochs, weight decay, and evaluation strategy.
- Evaluation Metrics: Used accuracy, precision, recall, and F1 score to monitor performance, achieving around 96.5% across metrics.
- Inference Pipeline: Created a pipeline for real-time product category predictions with confidence scores. A Streamlit-based web interface is also provided for real-time inference, available both locally and via a live Hugging Face Space.
- Conclusion: Delivered an accurate, efficient model ready for automating product categorization in e-commerce applications.
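
To illustrate the Data Collation step, the sketch below pads each batch only to the length of its longest sequence rather than to one fixed global length. It is a dependency-free illustration, not the project's code: in Hugging Face Transformers this behavior is provided by `DataCollatorWithPadding`. The token IDs, the function name `pad_batch`, and the pad ID of 0 are assumptions made for the example.

```python
from typing import Dict, List


def pad_batch(sequences: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad token-ID sequences to the longest length in this batch only."""
    max_len = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = max_len - len(seq)
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks padding the model should ignore.
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Made-up token IDs for two batches whose longest sequences differ.
short_batch = pad_batch([[101, 7592, 102], [101, 102]])
long_batch = pad_batch([[101, 2023, 2003, 1037, 2936, 6251, 102], [101, 102]])
print(len(short_batch["input_ids"][0]), len(long_batch["input_ids"][0]))
```

Short batches stay short, so less compute is spent on padding tokens than when every example is padded to the same maximum length.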

📊 Dataset

This project uses the publicly available E-Commerce Text Classification Dataset, hosted on Zenodo. It contains product descriptions labeled into four high-level e-commerce categories:

- Electronics
- Books
- Clothing & Accessories
- Household

Dataset Info

Title: E-Commerce Text Dataset
Hosted on: Zenodo
Total Records: 50,424 product entries
Columns Used:
- description: natural language description of a product
- category: one of the four target classes
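
The cleaning, label-mapping, and stratified-splitting steps listed in the Project Overview can be sketched on data in this two-column layout. This is a standard-library illustration under stated assumptions, not the notebook's code: the sample rows are made up, and the helper names (`load_rows`, `build_label_map`, `stratified_split`) are hypothetical. Scikit-learn, which is among the dependencies, offers `train_test_split` with a `stratify` argument for the same purpose.

```python
import csv
import io
import random
from collections import defaultdict

# Tiny made-up sample using the two columns described above.
SAMPLE_CSV = """category,description
Electronics,Wireless earbuds with charging case
Electronics,4K monitor with HDMI input
Books,Paperback mystery novel
Books,
Household,Stainless steel frying pan
Household,Cotton bath towel set
Clothing & Accessories,Leather belt with metal buckle
Clothing & Accessories,Wool winter scarf
Books,Hardcover cookbook
"""


def load_rows(text):
    """Read rows and drop those with a missing description or category."""
    reader = csv.DictReader(io.StringIO(text))
    return [r for r in reader if r["description"].strip() and r["category"].strip()]


def build_label_map(rows):
    """Map each category name to an integer ID (sorted for reproducibility)."""
    return {name: i for i, name in enumerate(sorted({r["category"] for r in rows}))}


def stratified_split(rows, test_fraction=0.5, seed=42):
    """Split per category so each class keeps the same share in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for r in rows:
        by_class[r["category"]].append(r)
    train, test = [], []
    for items in by_class.values():
        rng.shuffle(items)
        n_test = max(1, round(len(items) * test_fraction))
        test.extend(items[:n_test])
        train.extend(items[n_test:])
    return train, test


rows = load_rows(SAMPLE_CSV)
label2id = build_label_map(rows)
train_rows, test_rows = stratified_split(rows)
print(label2id)
print(len(train_rows), len(test_rows))
```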

⚙️ Dependencies

Install the required packages from requirements.txt:

pip install -r requirements.txt

The project relies on Python and the following libraries:

- PyTorch
- Hugging Face Transformers
- Hugging Face Datasets
- Scikit-learn
- Streamlit
- FastAPI

▶️ How to Run the Project

Option 1: Run Locally with GPU

Clone this repository:

git clone https://github.com/herrerovir/Product-category-classifier

Navigate to the project directory:
```
cd Product-category-classifier
```
Install the required dependencies:
```
pip install -r requirements.txt
```
Open the Jupyter Notebook to run the project:
```
jupyter notebook
```
Follow the code to load the dataset, preprocess the data, fine-tune the DistilBERT model, and perform inference on new product descriptions.

Option 2: Run on Google Colab (Recommended if no GPU locally)

Open a new notebook in Google Colab.

Clone the repository inside the notebook:

!git clone https://github.com/herrerovir/Product-category-classifier

Navigate to the cloned folder and open the notebook notebooks/product-category-classification.ipynb.
Set runtime type to GPU and select Tesla T4.
Run the notebook cells or scripts to execute the project.

Colab’s Tesla T4 GPU accelerates training and evaluation without any local setup.

📂 Model Files

The trained model files are not included in this repository due to their large size. Since the project runs in Google Colab, the fine-tuned model is saved directly to your Google Drive during training. The colab_setup.py script in the root directory automatically creates all necessary folders to organize and store the model and related outputs once you run the project.

When you run the notebook in Colab, your trained model will be saved to the corresponding folder in your Drive, making it easy to load for inference or further training without needing to download from this repo.
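
As an illustration of what a setup script such as colab_setup.py might do, the sketch below creates the folder layout from the Repository Structure section under a base directory. The function name `create_project_dirs` and the temporary base path are hypothetical, and the actual script may differ. In Colab the base would be a folder inside a mounted Google Drive.

```python
import tempfile
from pathlib import Path

# Folder names taken from the repository structure shown earlier.
PROJECT_DIRS = [
    "data/raw",
    "data/processed",
    "figures",
    "models",
    "notebooks",
    "results/metrics",
    "results/predictions",
]


def create_project_dirs(base: Path) -> list:
    """Create each project folder under `base`; existing folders are left as-is."""
    created = []
    for rel in PROJECT_DIRS:
        path = base / rel
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created


# Demo in a temporary directory; in Colab, `base` would point into Drive.
with tempfile.TemporaryDirectory() as tmp:
    dirs = create_project_dirs(Path(tmp))
    all_exist = all(d.is_dir() for d in dirs)
print(all_exist)
```

Using `exist_ok=True` makes the function safe to call on every run, which suits a notebook that is re-executed from the top.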

Additionally, the fine-tuned model is publicly hosted and available for download at the Hugging Face Model Hub: 👉 See the model in Hugging Face Hub

📊 Model Performance

On the test set, the model scores about 96.5% on every classification metric:

| Metric | Score |
|---|---|
| Accuracy | 96.51% |
| Precision | 96.52% |
| Recall | 96.51% |
| F1 Score | 96.51% |
| Eval Loss | 0.2059 |
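
The README does not state how precision, recall, and F1 were averaged over the four classes. Recall equal to accuracy, as reported above, is what support-weighted averaging always yields, so the sketch below assumes weighted averaging. It is a dependency-free illustration on made-up labels, and the function name `weighted_metrics` is hypothetical. Scikit-learn's `precision_recall_fscore_support` with `average="weighted"` computes the same quantities.

```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return accuracy and support-weighted precision, recall, and F1."""
    total = len(y_true)
    support = Counter(y_true)
    predicted = Counter(y_pred)
    correct = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    precision = recall = f1 = 0.0
    for label, n in support.items():
        p = correct[label] / predicted[label] if predicted[label] else 0.0
        r = correct[label] / n
        f = 2 * p * r / (p + r) if (p + r) else 0.0
        weight = n / total  # each class counts in proportion to its support
        precision += weight * p
        recall += weight * r
        f1 += weight * f

    accuracy = sum(correct.values()) / total
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Made-up label IDs (0-3) standing in for the four categories.
y_true = [0, 0, 1, 1, 2, 2, 3, 3]
y_pred = [0, 0, 1, 2, 2, 2, 3, 0]
scores = weighted_metrics(y_true, y_pred)
print(scores)
```

In this toy example weighted recall and accuracy coincide while precision differs, which mirrors the pattern in the table.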

🚀 Inference Examples

The examples below show the model's predicted category and confidence score for a few product descriptions:

Input: Samsung Galaxy Tab S9 Ultra with 14.6'' AMOLED Display and S Pen. Prediction: Electronics (Confidence: 99.88%)
Input: Atomic Habits by James Clear – Build Good Habits & Break Bad Ones. Prediction: Books (Confidence: 99.99%)
Input: Levi’s Men's 511 Slim Fit Jeans – Stretch Denim, Dark Indigo. Prediction: Clothing & Accessories (Confidence: 99.96%)

In these examples the model assigns the correct category to each description, with confidence above 99.8% in every case.
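
Confidence scores like those above are typically the softmax probability of the predicted class. The README does not spell out how they are computed, so that is an assumption here. The sketch below turns raw classifier logits into a label and a confidence value; the logits and the label order are made up for illustration.

```python
import math

# Hypothetical label order; the project's actual ID mapping may differ.
LABELS = ["Electronics", "Household", "Books", "Clothing & Accessories"]


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_from_logits(logits):
    """Return the most probable label and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


label, confidence = predict_from_logits([7.2, -1.3, -0.8, -2.1])  # made-up logits
print(f"Prediction: {label} (Confidence: {confidence:.2%})")
```

A large gap between the top logit and the rest is what produces confidence values close to 100%.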

📈 Results

The fine-tuned DistilBERT model achieved strong performance with over 96% accuracy, precision, recall, and F1 score on the test set. It reliably categorizes product descriptions across Electronics, Household, Books, and Clothing & Accessories. During inference, the model outputs highly confident predictions, making it well-suited for practical e-commerce applications.

🌐 Deployment Options

You can interact with the product category classifier via a web interface using either local deployment or a cloud-hosted app on Hugging Face Spaces.

Option 1: Run Locally with Streamlit + FastAPI

This setup uses:

- Streamlit for the frontend interface
- FastAPI for handling model inference requests (via REST API)

To run the product category classifier locally with an interactive user interface and a modular backend, follow these steps:

Clone the repository:

git clone https://github.com/herrerovir/Product-category-classifier
cd Product-category-classifier

Install all required dependencies:
```
pip install -r requirements.txt
```
Start the FastAPI backend (in a separate terminal window):
```
uvicorn backend:app --reload
```
This will launch the API server at: http://127.0.0.1:8000

Launch the Streamlit frontend:
```
streamlit run app.py
```
Open the web app:

Once the Streamlit app is running, open your browser and go to:
```
http://localhost:8501
```

You’ll see a user-friendly web interface where you can enter product descriptions. This local deployment keeps the frontend and backend cleanly separated, making it easy to maintain, scale, or containerize for production use.
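
Because the backend is a REST API, it can also be called without the Streamlit frontend. The sketch below builds such a request with the standard library. The `/predict` route, the `text` JSON field, and the helper names are assumptions made for illustration; check the backend code for the actual endpoint and payload.

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8000/predict"  # hypothetical route on the local server


def build_request(description: str, url: str = API_URL) -> urllib.request.Request:
    """Build a JSON POST request carrying one product description."""
    payload = json.dumps({"text": description}).encode("utf-8")  # assumed field name
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(description: str) -> dict:
    """Send the request and decode the JSON response (backend must be running)."""
    with urllib.request.urlopen(build_request(description), timeout=10) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request needs no running server; classify() does.
request = build_request("Stainless steel frying pan, 28 cm")
print(request.get_method(), request.full_url)
```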

Option 2: Try It on Hugging Face Spaces (No Setup Required)

You can also test the model live in your browser via the Hugging Face Space:

👉 Try the Live Demo on Hugging Face Spaces

No installation or GPU required, just open the link and start classifying product descriptions instantly.

🙌 Acknowledgments

Built using Hugging Face Transformers, Datasets, and PyTorch.
