LakshitaPagaria/E-commerce-Customer-Segmentation-using-RFM-Analysis

📈 E-commerce Customer Segmentation using RFM Analysis

1. 🎯 Project Overview & Business Problem

In e-commerce, sustainable growth depends on understanding distinct customer groups and serving each of them appropriately. A one-size-fits-all marketing approach is inefficient and often ineffective. This project addresses that problem with a data-driven customer segmentation strategy built on the RFM (Recency, Frequency, Monetary) model.

The primary goal is to move beyond generic marketing by identifying distinct customer personas based on their transaction history. By analyzing customer behavior, we can answer critical business questions:

  • Who are our most valuable and loyal customers (Champions)?
  • Which customers are at risk of churning and need re-engagement (At Risk)?
  • Who are our new customers, and how can we nurture them into loyal ones?
  • Which customers are Lost and likely not worth significant marketing investment?

This analysis provides actionable insights that enable the business to tailor marketing campaigns, personalize communication, improve customer retention, and ultimately, drive revenue growth.


2. 📊 Dashboard Preview

The final analysis is presented in an interactive Power BI dashboard, designed to provide key stakeholders with an at-a-glance understanding of the customer base.

[Image: E-Commerce Customer Segmentation Dashboard]


3. 💻 Technical Stack

  • Data Analysis & Manipulation: Python, Pandas, NumPy
  • Development Environment: Jupyter Notebook
  • Data Visualization & BI: Microsoft Power BI

4. ⚙️ Methodology: The RFM Model

The project is centered around the RFM model, a proven marketing analysis technique used to quantitatively evaluate customer value.

  • Recency (R): How recently did a customer make a purchase? Customers who purchased recently are more likely to purchase again.
  • Frequency (F): How often do they purchase? Customers who purchase frequently are more engaged and loyal.
  • Monetary (M): How much do they spend? Customers who spend more are more valuable to the business.
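
As a rough illustration of these three definitions, the sketch below computes R, F, and M with Pandas on a small made-up transaction table. The column names `Customer ID`, `InvoiceDate`, `Quantity`, and `Price` are the ones this README mentions; `Invoice` is an assumed name for the invoice identifier, and the sample rows are invented for the example rather than taken from the dataset.

```python
import pandas as pd

# Made-up transactions for illustration only.
# "Invoice" is an assumed column name for the invoice identifier.
tx = pd.DataFrame({
    "Customer ID": [1, 1, 1, 2, 2, 3],
    "Invoice": ["A1", "A1", "A2", "B1", "B2", "C1"],
    "InvoiceDate": ["2011-12-01", "2011-12-01", "2011-12-09",
                    "2011-10-01", "2011-11-01", "2011-06-15"],
    "Quantity": [2, 1, 5, 10, 1, 3],
    "Price": [3.0, 10.0, 2.0, 1.5, 20.0, 4.0],
})

# Cleaning: parse dates and derive the per-line amount.
tx["InvoiceDate"] = pd.to_datetime(tx["InvoiceDate"])
tx["TotalAmount"] = tx["Quantity"] * tx["Price"]

# Reference point for Recency: the most recent date in the data.
snapshot = tx["InvoiceDate"].max()

rfm = tx.groupby("Customer ID").agg(
    LastPurchase=("InvoiceDate", "max"),   # date of the latest purchase
    Frequency=("Invoice", "nunique"),      # number of distinct invoices
    Monetary=("TotalAmount", "sum"),       # total spend
).reset_index()

# Recency: days between the last purchase and the snapshot date.
rfm["Recency"] = (snapshot - rfm["LastPurchase"]).dt.days

print(rfm[["Customer ID", "Recency", "Frequency", "Monetary"]])
```

Counting distinct invoices rather than rows keeps a multi-item order from inflating Frequency.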

The analysis follows these key steps:

  1. Data Loading & Cleaning:

    • The raw transactional data is loaded from the online_retail_II.csv file.
    • The InvoiceDate column is converted to a proper datetime format.
    • A TotalAmount feature is engineered by multiplying Quantity and Price.
  2. RFM Value Calculation:

    • For each unique Customer ID, the three RFM values are calculated:
      • Recency: The number of days between the customer's last purchase and the most recent date in the dataset.
      • Frequency: The total number of unique invoices associated with the customer.
      • Monetary: The total sum of TotalAmount for all of the customer's transactions.
  3. Weighted RFM Scoring & Segmentation:

    • Instead of simple quintiles, a weighted scoring system is used to generate a single, consolidated RFM_Score.
    • First, customers are ranked based on each individual RFM metric (R, F, and M).
    • These ranks are then normalized to a scale of 0-100 to ensure comparability.
    • A weighted formula is applied to calculate the final RFM_Score, giving the most importance to Monetary value: RFM_Score = (0.15 * R_rank_norm) + (0.30 * F_rank_norm) + (0.55 * M_rank_norm)
    • Based on this final score, customers are categorized into five distinct segments using np.where:
      • Top Customers (Score >= 9000)
      • High Value Customers (Score >= 8000)
      • Mid Value Customers (Score >= 7000)
      • Low Value Customers (Score >= 6000)
      • Lost Customers (Score < 6000)
  4. Data Export: The final processed DataFrame, containing each Customer ID along with their RFM values and assigned CustomerID_Segment, is exported to rfm_for_powerbi.csv for visualization.
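
The scoring and segmentation steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the notebook's actual code: the input rows are made up, percentile ranks (`rank(pct=True) * 100`) stand in for the rank normalization, and the cut-offs are assumed to be 90, 80, 70, and 60. The listed thresholds (9000, 8000, ...) are larger than a weighted sum of 0-100 values can reach, so the notebook may scale the score differently; check it for the real cut-offs.

```python
import numpy as np
import pandas as pd

# Hypothetical RFM table; values are invented for illustration.
rfm = pd.DataFrame({
    "Customer ID": [1, 2, 3, 4, 5],
    "Recency": [0, 38, 177, 5, 400],
    "Frequency": [12, 2, 1, 30, 1],
    "Monetary": [2600.0, 350.0, 12.0, 9000.0, 5.0],
})

# Rank each metric as a percentile on a 0-100 scale (assumed normalization).
# A smaller Recency is better, so it is ranked in descending order.
rfm["R_rank_norm"] = rfm["Recency"].rank(ascending=False, pct=True) * 100
rfm["F_rank_norm"] = rfm["Frequency"].rank(ascending=True, pct=True) * 100
rfm["M_rank_norm"] = rfm["Monetary"].rank(ascending=True, pct=True) * 100

# Weighted score with the weights given in this README.
rfm["RFM_Score"] = (
    0.15 * rfm["R_rank_norm"]
    + 0.30 * rfm["F_rank_norm"]
    + 0.55 * rfm["M_rank_norm"]
)

# Nested np.where; the cut-offs 90/80/70/60 are assumptions (see lead-in).
s = rfm["RFM_Score"]
rfm["CustomerID_Segment"] = np.where(
    s >= 90, "Top Customers",
    np.where(s >= 80, "High Value Customers",
    np.where(s >= 70, "Mid Value Customers",
    np.where(s >= 60, "Low Value Customers", "Lost Customers"))))

print(rfm[["Customer ID", "RFM_Score", "CustomerID_Segment"]])

# Export for Power BI (uncomment to write the file):
# rfm.to_csv("rfm_for_powerbi.csv", index=False)
```

Because the weights sum to 1.0, the score stays on the same 0-100 scale as its inputs, which keeps the cut-offs easy to interpret.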


5. 💡 Actionable Insights & Recommendations

The analysis and dashboard reveal several key insights that can inform business strategy:

  • Insight 1: The Pareto Principle in Action. A significant portion of revenue is generated by "Top Customers" and "High Value Customers," even though they may represent a smaller fraction of the total customer base.

    • Recommendation: Implement a loyalty program with exclusive perks for these segments to increase retention and lifetime value. Acknowledge their loyalty with personalized thank-you notes or early access to new products.
  • Insight 2: The "Lost" Customer Dilemma. The "Lost Customers" segment is the largest by customer count but contributes the least to revenue.

    • Recommendation: Avoid spending significant marketing budget on this group. Instead, a low-cost, automated "win-back" email campaign could be attempted. If there is no response, they can be excluded from future marketing pushes.
  • Insight 3: The Growth Opportunity. The "Mid Value Customers" represent a crucial group with potential to become high-value.

    • Recommendation: Target this segment with promotions and product recommendations based on their purchase history to increase their purchase frequency and monetary value.
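
One way to check insights like these against the exported data is to compare each segment's share of customers with its share of revenue. The sketch below assumes the export has `CustomerID_Segment` and `Monetary` columns, and it uses a small made-up table in place of rfm_for_powerbi.csv, so the numbers it prints are illustrative and not results from this project.

```python
import pandas as pd

# Made-up stand-in for rfm_for_powerbi.csv; replace with
# pd.read_csv("rfm_for_powerbi.csv") once the notebook has been run.
rfm = pd.DataFrame({
    "CustomerID_Segment": ["Top Customers", "High Value Customers",
                           "Lost Customers", "Lost Customers", "Lost Customers"],
    "Monetary": [9000.0, 2600.0, 350.0, 12.0, 38.0],
})

# Customer count and total revenue per segment.
summary = rfm.groupby("CustomerID_Segment").agg(
    customers=("Monetary", "size"),
    revenue=("Monetary", "sum"),
)

# Each segment's share of all customers and of all revenue, in percent.
summary["customer_share_pct"] = summary["customers"] / summary["customers"].sum() * 100
summary["revenue_share_pct"] = summary["revenue"] / summary["revenue"].sum() * 100

print(summary.sort_values("revenue", ascending=False).round(1))
```

A segment whose revenue share is well above its customer share supports the Pareto reading; the reverse pattern matches the "Lost Customers" description.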

6. 🛠️ How to Replicate the Project

To set up and run this project on your local machine, follow these steps:

  1. Clone the Repository:

    git clone https://github.com/your-username/your-repository-name.git
    cd your-repository-name
  2. Set Up a Virtual Environment (Recommended):

    python -m venv venv
    source venv/bin/activate  # On Windows, use `venv\Scripts\activate`
  3. Install Dependencies:

    pip install pandas numpy matplotlib jupyterlab
  4. Download the Dataset:

    • Download the dataset from Kaggle: Online Retail II UCI Dataset
    • Unzip the file and place online_retail_II.csv in the root directory of the project.
  5. Run the Analysis Notebook:

    • Launch Jupyter Lab: jupyter lab
    • Open the RFM_Analysis.ipynb notebook and run all cells. This will generate the rfm_for_powerbi.csv file.
  6. View the Power BI Dashboard:

    • Open the .pbix file (Power BI file) using Power BI Desktop.
    • If prompted about a broken data source, go to Transform data -> Data source settings. Select the source and click "Change Source...". Browse to the rfm_for_powerbi.csv file you generated in the previous step.

About

An end-to-end data analysis project to identify and visualize high-value and at-risk e-commerce customers using the RFM model.
