Advanced customer segmentation and analysis platform for e-commerce businesses
A comprehensive customer segmentation solution that analyzes e-commerce transaction data to derive actionable business insights. The project implements RFM (Recency, Frequency, Monetary) analysis using both statistical and machine learning approaches to segment customers and generate targeted marketing strategies.
The dataset is the output of the RetailSync-ETL-Pipeline project, in which raw online transaction data was extracted from an Amazon Redshift warehouse, transformed, and loaded into Amazon S3 object storage. It contains 399,841 transaction records from 01/12/2010 to 09/12/2011.
- invoice: Invoice number
- stock_code: Product code
- description: Product description
- price: Unit price
- quantity: Quantity purchased
- total_order_value: Total transaction value
- invoice_date: Date and time of purchase
- customer_id: Unique customer identifier
- country: Country of customer
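As a rough illustration of this schema, the sketch below checks that a DataFrame carries the nine columns listed above and parses `invoice_date` as a datetime. The helper `validate_schema`, the constant `EXPECTED_COLUMNS`, and the sample rows are hypothetical and are not part of the project code.

```python
import pandas as pd

# Column names taken from the field list above.
EXPECTED_COLUMNS = [
    "invoice", "stock_code", "description", "price", "quantity",
    "total_order_value", "invoice_date", "customer_id", "country",
]


def validate_schema(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: check the columns and parse invoice_date."""
    missing = [col for col in EXPECTED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"missing columns: {missing}")
    out = df[EXPECTED_COLUMNS].copy()
    out["invoice_date"] = pd.to_datetime(out["invoice_date"])
    return out


# Made-up sample rows, for illustration only (not taken from the dataset).
sample = pd.DataFrame({
    "invoice": ["100001", "100002"],
    "stock_code": ["A1001", "B2002"],
    "description": ["SAMPLE MUG", "SAMPLE LAMP"],
    "price": [2.50, 12.00],
    "quantity": [4, 1],
    "total_order_value": [10.00, 12.00],
    "invoice_date": ["2010-12-01 08:26:00", "2011-12-09 12:50:00"],
    "customer_id": [10001, 10002],
    "country": ["United Kingdom", "France"],
})
validated = validate_schema(sample)
print(validated.dtypes)
```

In practice the same check could be applied to the CSV or Pickle file produced by the pipeline before any analysis starts.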
- 📊 Dual Segmentation Approach:
- Percentile-based RFM analysis
- K-means clustering
- 🔄 Complete ETL Pipeline:
- Data extraction from Amazon Redshift
- Comprehensive data preprocessing
- Automated workflow
- 🎯 Marketing Strategy Generation:
- Segment-specific campaign recommendations
- Personalized customer engagement plans
- 📈 Advanced Analytics:
- Customer behavior analysis
- Purchase pattern identification
- Temporal trend analysis
- 📊 Interactive Visualizations:
- Customer segment distribution
- Purchase patterns
- RFM metric analysis
- Data Pipeline:
- Source: Amazon Redshift warehouse
- Processing: Python-based ETL
- Storage: CSV and Pickle files
- Analysis Modules:
- Data preprocessing and validation
- Exploratory data analysis
- RFM metric calculation
- Customer segmentation
- Marketing strategy generation
- Output Deliverables:
- Segmented customer profiles
- Marketing campaign strategies
- Interactive visualizations
- Actionable insights
- Python 3.7+
- Python IDE or Text Editor
- Jupyter Notebook
- Required Python packages
- Clone the Repository:
    git clone https://github.com/VaibhavDaveDev/E-Commerce-Customer-Segmentation-Insights.git
    cd E-Commerce-Customer-Segmentation-Insights
- Environment Setup:
    cp .env.example .env
    # Configure environment variables in .env
- Install Dependencies:
    pip install -r requirements.txt
- Handle missing values and duplicates
- Remove non-product transactions
- Process returns and cancellations
- Create date-based features
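The preprocessing steps above could look roughly like the sketch below. It is not the project's actual code. The rules it applies are assumptions: non-product rows are taken to be those whose `stock_code` contains no digit, cancellations are taken to be invoices starting with "C", and returns are taken to be rows with a non-positive quantity. Dropping those rows is only one possible way of processing them. The function name `preprocess` and the sample rows are hypothetical.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning pass over the transaction table."""
    # Handle missing values and duplicates.
    out = df.dropna(subset=["customer_id", "description"]).drop_duplicates()

    # Assumption: non-product rows (postage, manual fees) have no digit in stock_code.
    is_product = out["stock_code"].astype(str).str.contains(r"\d")
    out = out[is_product]

    # Assumption: cancelled invoices start with "C"; returns have quantity <= 0.
    cancelled = out["invoice"].astype(str).str.startswith("C")
    out = out[~cancelled & (out["quantity"] > 0)].copy()

    # Create date-based features.
    out["invoice_date"] = pd.to_datetime(out["invoice_date"])
    out["year"] = out["invoice_date"].dt.year
    out["month"] = out["invoice_date"].dt.month
    out["day_of_week"] = out["invoice_date"].dt.day_name()
    out["hour"] = out["invoice_date"].dt.hour
    return out


# Made-up rows: one valid line, its duplicate, a row without a customer,
# a postage charge, and a cancellation.
raw = pd.DataFrame({
    "invoice": ["100001", "100001", "100002", "100003", "C100004"],
    "stock_code": ["A1001", "A1001", "B2002", "POST", "C3003"],
    "description": ["MUG", "MUG", "LAMP", "POSTAGE", "CLOCK"],
    "quantity": [6, 6, 2, 1, -2],
    "invoice_date": [
        "2010-12-01 08:26:00", "2010-12-01 08:26:00", "2010-12-01 08:28:00",
        "2010-12-01 08:34:00", "2010-12-01 09:02:00",
    ],
    "customer_id": [10001.0, 10001.0, float("nan"), 10002.0, 10001.0],
})
clean = preprocess(raw)
print(clean[["invoice", "stock_code", "month", "hour"]])
```

Only the first row survives here, which makes it easy to see what each filter removes.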
- RFM Analysis:
- Calculate recency, frequency, monetary metrics
- Score customers on a 3-9 scale
- Generate segment labels
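A minimal sketch of these RFM steps follows. It assumes the 3-9 scale is the sum of three scores from 1 to 3, one per metric, assigned by rank-based terciles. That reading, the snapshot date, the score thresholds, and the short segment labels are illustrative guesses rather than the project's actual rules. The function names and the sample transactions are hypothetical.

```python
import pandas as pd


def compute_rfm(df: pd.DataFrame) -> pd.DataFrame:
    """One row per customer with recency (days), frequency, and monetary value."""
    # Assumption: recency is measured from the day after the last purchase in the data.
    snapshot = df["invoice_date"].max() + pd.Timedelta(days=1)
    return df.groupby("customer_id").agg(
        recency=("invoice_date", lambda s: (snapshot - s.max()).days),
        frequency=("invoice", "nunique"),
        monetary=("total_order_value", "sum"),
    )


def tercile_score(values: pd.Series, higher_is_better: bool = True) -> pd.Series:
    """Score 1-3 by rank-based terciles; ties are broken by order of appearance."""
    labels = [1, 2, 3] if higher_is_better else [3, 2, 1]
    ranks = values.rank(method="first")
    return pd.qcut(ranks, q=3, labels=labels).astype(int)


def score_rfm(rfm: pd.DataFrame) -> pd.DataFrame:
    out = rfm.copy()
    # A low recency (recent purchase) is good, so its labels are reversed.
    out["r_score"] = tercile_score(out["recency"], higher_is_better=False)
    out["f_score"] = tercile_score(out["frequency"])
    out["m_score"] = tercile_score(out["monetary"])
    out["rfm_score"] = out["r_score"] + out["f_score"] + out["m_score"]
    # Hypothetical thresholds: 3-4, 5-6, 7-9.
    out["segment"] = pd.cut(
        out["rfm_score"], bins=[2, 4, 6, 9],
        labels=["At-risk", "Unsteady", "Top-performing"],
    )
    return out


# Made-up transactions, for illustration only.
transactions = pd.DataFrame({
    "customer_id": [101, 101, 102, 103, 103],
    "invoice": ["A", "B", "C", "D", "D"],
    "invoice_date": pd.to_datetime(
        ["2011-12-01", "2011-12-08", "2011-06-01", "2011-10-01", "2011-10-01"]
    ),
    "total_order_value": [100.0, 200.0, 50.0, 70.0, 50.0],
})
rfm = compute_rfm(transactions)
scored = score_rfm(rfm)
print(scored[["recency", "frequency", "monetary", "rfm_score", "segment"]])
```

Ranking before binning keeps the terciles well defined even when many customers share the same frequency, at the cost of splitting ties arbitrarily.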
- Machine Learning:
- Prepare data (scaling, transformation)
- Apply K-means clustering
- Validate results
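The machine learning steps could be sketched as below, using scikit-learn. Several choices here are assumptions and not confirmed by this README: the `log1p` transform to reduce skew, three clusters, and the silhouette score as the validation metric. The function `cluster_rfm` and the synthetic RFM table are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def cluster_rfm(rfm: pd.DataFrame, n_clusters: int = 3, random_state: int = 42):
    """Transform, scale, cluster, and return (labels, silhouette score)."""
    # Assumption: log1p tames the right skew typical of RFM metrics.
    features = np.log1p(rfm[["recency", "frequency", "monetary"]].clip(lower=0))
    # K-means is distance based, so put all three metrics on the same scale.
    scaled = StandardScaler().fit_transform(features)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    labels = model.fit_predict(scaled)
    # Silhouette score lies in [-1, 1]; higher means better-separated clusters.
    return labels, silhouette_score(scaled, labels)


# Synthetic RFM table with three loosely separated groups (not real data).
rng = np.random.default_rng(0)


def make_group(n: int, recency: float, frequency: float, monetary: float) -> pd.DataFrame:
    return pd.DataFrame({
        "recency": rng.normal(recency, recency * 0.1, n),
        "frequency": rng.normal(frequency, frequency * 0.1, n),
        "monetary": rng.normal(monetary, monetary * 0.1, n),
    })


synthetic_rfm = pd.concat(
    [make_group(30, 10, 20, 5000), make_group(30, 90, 5, 800), make_group(30, 300, 1, 100)],
    ignore_index=True,
)
labels, score = cluster_rfm(synthetic_rfm)
print(pd.Series(labels).value_counts().sort_index())
print(f"silhouette score: {score:.3f}")
```

Running the function for several values of `n_clusters` and comparing the scores is one simple way to choose the number of clusters.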
- 38.79% Top-performing and active customers
- 27.03% Unsteady customers
- 34.18% At-risk and inactive customers
- Average customer spend: £1,829.95
- Peak business hours: 10 AM - 3 PM
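For readers who want to recompute figures of this kind, the sketch below shows one plausible way to derive an average customer spend and an orders-per-hour profile from the transaction table. The metric definitions are assumptions (total spend per customer averaged over customers, and distinct invoices per hour of day), and the sample rows are made up, so the output does not reproduce the values reported above.

```python
import pandas as pd


def average_customer_spend(df: pd.DataFrame) -> float:
    """Mean of each customer's total spend (assumed definition)."""
    return float(df.groupby("customer_id")["total_order_value"].sum().mean())


def orders_by_hour(df: pd.DataFrame) -> pd.Series:
    """Number of distinct invoices per hour of day."""
    hours = pd.to_datetime(df["invoice_date"]).dt.hour
    return df.groupby(hours)["invoice"].nunique().sort_index()


# Made-up transaction lines, for illustration only.
orders = pd.DataFrame({
    "customer_id": [1, 1, 1, 2],
    "invoice": ["A", "A", "B", "C"],
    "invoice_date": [
        "2011-03-01 10:15:00", "2011-03-01 10:15:00",
        "2011-03-02 14:30:00", "2011-03-03 10:45:00",
    ],
    "total_order_value": [10.0, 20.0, 30.0, 40.0],
})
avg_spend = average_customer_spend(orders)
hourly = orders_by_hour(orders)
print(f"average customer spend: {avg_spend:.2f}")
print(hourly)
```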
- Customer segment distribution
- Purchase patterns by time
- Geographic analysis
- RFM score distribution
- Customer value analysis
- Tableau dashboard integration
- Dynamic filtering
- Drill-down capabilities
- Export options
- Python 3.7+
- 8GB RAM recommended
- Storage: 1GB minimum
- pandas
- numpy
- scikit-learn
- seaborn
- matplotlib
- yellowbrick
- Using Python Scripts:
    python main.py
- Using Jupyter Notebooks: run the notebooks in sequence:
- 1_data_preprocessing.ipynb
- 2_exploratory_data_analysis.ipynb
- 3_customer_segmentation_percentile_ranking.ipynb
- 4_customer_segmentation_kmeans.ipynb
