An unsupervised machine learning project for customer segmentation using K-Means and DBSCAN clustering algorithms.
Segment customers into meaningful groups based on their purchasing behavior to:
- Identify high-value customers
- Discover hidden patterns in customer data
- Enable targeted marketing strategies
- Improve customer retention
The project uses the Online Retail dataset, which contains transactional data with the following features:
| Feature | Description |
|---|---|
| InvoiceNo | Unique invoice number |
| StockCode | Product code |
| Description | Product description |
| Quantity | Quantity purchased |
| InvoiceDate | Date of transaction |
| UnitPrice | Price per unit |
| CustomerID | Unique customer identifier |
| Country | Customer's country |
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Data Loading    │───►│ Preprocessing   │───►│ Feature Eng.    │
└─────────────────┘    └─────────────────┘    └─────────────────┘
                                                       │
                                                       ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Analysis        │◄───│ Clustering      │◄───│ Feature Scaling │
└─────────────────┘    └─────────────────┘    └─────────────────┘
- Missing value handling
- Outlier detection and treatment (IQR method)
- Data type conversions
- Removing cancelled transactions
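
A minimal pandas sketch of the preprocessing steps above, assuming the column names from the dataset table. Two details are assumptions rather than the notebook's confirmed logic: cancelled transactions are identified by an `InvoiceNo` starting with "C" (the convention documented for the UCI Online Retail data), and IQR "treatment" caps values at Q1 - 1.5·IQR and Q3 + 1.5·IQR instead of dropping rows. The `preprocess` name is hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Clean raw Online Retail transactions (hypothetical helper, sketch only)."""
    # Missing values: rows without a CustomerID cannot be attributed to a customer.
    df = df.dropna(subset=["CustomerID"]).copy()

    # Data type conversions.
    df["InvoiceNo"] = df["InvoiceNo"].astype(str)
    df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"])
    df["CustomerID"] = df["CustomerID"].astype(int).astype(str)

    # Cancelled transactions: assumed to be invoices whose number starts with "C".
    # Non-positive quantities or prices are dropped as well.
    mask = ~df["InvoiceNo"].str.startswith("C") & (df["Quantity"] > 0) & (df["UnitPrice"] > 0)
    df = df[mask].copy()

    # Outlier treatment (IQR rule, assumed capping): clip to [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
    for col in ["Quantity", "UnitPrice"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    return df.reset_index(drop=True)
```

Capping keeps every customer's transactions in the data while limiting the influence of extreme quantities or prices on the Monetary feature; dropping outlier rows is an equally valid alternative.
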
- Recency: Days since last purchase
- Frequency: Number of transactions
- Monetary: Total spending
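
One way to derive the three RFM features per customer, sketched with pandas. The `TotalPrice` column, the `compute_rfm` name, and the snapshot date (one day after the last invoice in the data) are assumptions for illustration; Frequency is counted here as distinct invoices.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame, snapshot_date=None) -> pd.DataFrame:
    """Aggregate cleaned transactions into one Recency/Frequency/Monetary row per customer."""
    # Line-item revenue (hypothetical column name).
    df = df.assign(TotalPrice=df["Quantity"] * df["UnitPrice"])
    if snapshot_date is None:
        # Assumption: measure recency from the day after the last transaction in the data.
        snapshot_date = df["InvoiceDate"].max() + pd.Timedelta(days=1)

    return df.groupby("CustomerID").agg(
        Recency=("InvoiceDate", lambda d: (snapshot_date - d.max()).days),  # days since last purchase
        Frequency=("InvoiceNo", "nunique"),  # number of distinct invoices
        Monetary=("TotalPrice", "sum"),      # total spending
    )
```
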
- Optimal K selection using the Elbow Method
- Silhouette Score analysis
- Centroid-based partitioning
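
The K-Means model selection above can be sketched with scikit-learn: standardize the features (the Feature Scaling stage of the pipeline), fit K-Means for a range of k, and record both the inertia for the elbow plot and the silhouette score. The `evaluate_k` helper, the k range, `n_init=10`, and `random_state=42` are illustrative choices, not settings taken from the notebook.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def evaluate_k(X: np.ndarray, k_values=range(2, 11), random_state: int = 42) -> dict:
    """Return {k: (inertia, silhouette)} for each candidate k on standardized features."""
    # Scale so Recency (days) and Monetary (currency) contribute comparably to distances.
    X_scaled = StandardScaler().fit_transform(X)
    results = {}
    for k in k_values:
        km = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = km.fit_predict(X_scaled)
        # Inertia (within-cluster sum of squares) feeds the elbow plot;
        # the silhouette score (-1 to 1, higher is better) measures separation.
        results[k] = (km.inertia_, silhouette_score(X_scaled, labels))
    return results
```

Plot the inertia values against k to look for the elbow, then prefer a k near the elbow whose silhouette score is comparatively high.
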
- Density-based spatial clustering
- Automatic outlier detection
- No need to specify the number of clusters in advance
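
A comparable DBSCAN sketch. It takes no cluster count; instead `eps` (neighborhood radius) and `min_samples` control density, and points labelled -1 are the outliers it flags automatically. The defaults below are scikit-learn's own values, used only as placeholders; on real RFM data they would need tuning (for example with a k-distance plot). `run_dbscan` is a hypothetical helper name.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler


def run_dbscan(X: np.ndarray, eps: float = 0.5, min_samples: int = 5):
    """Cluster standardized features with DBSCAN (hypothetical helper)."""
    X_scaled = StandardScaler().fit_transform(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X_scaled)
    # DBSCAN labels noise points -1; every other label is a discovered cluster.
    n_clusters = len(set(labels) - {-1})
    n_noise = int(np.sum(labels == -1))
    return labels, n_clusters, n_noise
```
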
- Cluster distribution plots
- 2D/3D scatter plots
- Customer segment profiles
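
Customer segment profiles can be built by grouping customers by cluster label. A minimal pandas sketch, assuming an RFM table with one row per customer and an array of labels from either algorithm; `cluster_profiles` is a hypothetical helper name.

```python
import pandas as pd


def cluster_profiles(rfm: pd.DataFrame, labels) -> pd.DataFrame:
    """Summarize each cluster by customer count and mean Recency/Frequency/Monetary."""
    profiled = rfm.assign(Cluster=labels)
    profiles = profiled.groupby("Cluster").agg(
        Customers=("Recency", "count"),
        Recency=("Recency", "mean"),
        Frequency=("Frequency", "mean"),
        Monetary=("Monetary", "mean"),
    )
    # Share of all customers in each cluster, useful for the distribution plot.
    profiles["Share"] = profiles["Customers"] / profiles["Customers"].sum()
    return profiles.round(2)
```

Writing the result with `profiles.to_csv("outputs/cluster_profiles.csv")` would match the output file in the project structure, although the notebook's exact columns may differ.
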
online-retail-segmentation/
├── README.md
├── requirements.txt
├── notebooks/
│   └── customer_segmentation.ipynb
├── data/
│   └── .gitkeep (add your data here)
└── outputs/
    └── cluster_profiles.csv
pip install -r requirements.txt

Dependencies:
pandas
numpy
scikit-learn
matplotlib
seaborn
- Clone the repository:
git clone https://github.com/YOUR_USERNAME/online-retail-segmentation.git
cd online-retail-segmentation
- Add your dataset to the data/ folder.
- Open and run the Jupyter notebook:
jupyter notebook notebooks/customer_segmentation.ipynb

| Segment | Recency | Frequency | Monetary | Description |
|---|---|---|---|---|
| Champions | Low | High | High | Best customers, buy often |
| Loyal | Low | Medium | Medium | Regular customers |
| At Risk | High | Medium | Medium | Haven't bought recently |
| Lost | High | Low | Low | Haven't bought in a long time |
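
The table describes segments qualitatively (low recency means a recent purchase). A hypothetical rule of thumb for turning a cluster's mean RFM values into these names compares them with the customer-level median recency and the frequency/monetary tertiles; the thresholds and the `name_segment` helper are assumptions, not the project's labelling method.

```python
import pandas as pd


def name_segment(profile: pd.Series, rfm: pd.DataFrame) -> str:
    """Map one cluster's mean R/F/M to a segment name (hypothetical heuristic)."""
    # Low recency = bought recently; compare against the median customer.
    recent = profile["Recency"] <= rfm["Recency"].median()
    # Tertile cut-offs stand in for the table's Low / Medium / High levels.
    f_lo, f_hi = rfm["Frequency"].quantile([1 / 3, 2 / 3])
    m_lo, m_hi = rfm["Monetary"].quantile([1 / 3, 2 / 3])
    if recent:
        if profile["Frequency"] >= f_hi and profile["Monetary"] >= m_hi:
            return "Champions"
        return "Loyal"
    if profile["Frequency"] <= f_lo and profile["Monetary"] <= m_lo:
        return "Lost"
    return "At Risk"
```
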
- Customer Distribution: Most customers fall into 3-4 main segments
- High-Value Customers: ~20% of customers drive ~80% of revenue
- Seasonal Patterns: Purchasing behavior varies by season
- Geographic Trends: Different countries show different patterns
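
A revenue-concentration figure like the ~20%/~80% split can be checked directly from the per-customer Monetary values. A small NumPy sketch; `top_share` is a hypothetical helper name, and the result depends on the cleaned data used.

```python
import numpy as np


def top_share(monetary, top_fraction: float = 0.2) -> float:
    """Fraction of total revenue contributed by the top `top_fraction` of customers."""
    values = np.sort(np.asarray(monetary, dtype=float))[::-1]  # highest spenders first
    n_top = max(1, int(np.ceil(top_fraction * len(values))))
    return float(values[:n_top].sum() / values.sum())
```

For example, `top_share(rfm["Monetary"])` on an RFM table returns the revenue share of the top fifth of customers.
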
- Personalized Marketing: Target each segment differently
- Inventory Management: Stock based on segment preferences
- Customer Retention: Focus on at-risk segments
- Pricing Strategy: Offer deals to specific segments
This project demonstrates:
- Unsupervised learning techniques
- RFM (Recency, Frequency, Monetary) analysis
- Cluster validation methods
- Business insight generation from data
- IoT & AI Developer @ VoltX
- CS Student @ Helwan University
⭐ Star this repo if you find it useful!