WebCraftPhil/superstore-sales-analysis


Superstore Sales Analysis

A portfolio project that analyzes retail sales data with Python and pandas. It demonstrates the fundamental data analysis skills expected in entry-level data analyst roles.

📊 Project Overview

This project analyzes retail sales data from a fictional superstore to uncover insights about sales performance, profitability, and customer behavior. The analysis focuses on answering practical business questions using clear, step-by-step Python code that's easy to understand and share with non-technical stakeholders.

🎯 Business Questions Answered

  1. What are the sales trends over time?

    • Year-over-year growth analysis
    • Monthly seasonal patterns
    • Peak sales periods identification
  2. Which products and categories perform best?

    • Top-performing categories and sub-categories
    • Sales distribution across product types
    • Product performance comparison
  3. How does regional performance vary?

    • Sales by geographic region
    • Top-performing states
    • Regional market share analysis
  4. What customer segments generate the most revenue?

    • Customer segment breakdown (Consumer, Corporate, Home Office)
    • Average order value by segment
    • Purchasing behavior patterns
  5. Which products are most profitable?

    • Profit analysis by category and sub-category
    • Profit margin calculations
    • Identification of high and low-margin products
  6. How does shipping performance vary by mode?

    • Shipping delay analysis across different shipping modes
    • Distribution and trends of delivery times
    • Identification of shipping efficiency opportunities
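As an illustration of the first question, year-over-year growth can be sketched with a `groupby` on the order year followed by `pct_change`. This is a minimal sketch, not the notebook's actual code: the function name `yearly_sales_growth` is hypothetical, the column names `Order Date` and `Sales` are assumed to match the dataset, and the sample rows are made up. Monthly seasonality can be explored the same way by grouping on `.dt.month` instead.

```python
import pandas as pd


def yearly_sales_growth(df: pd.DataFrame) -> pd.DataFrame:
    """Total sales per calendar year plus year-over-year growth in percent.

    Assumes columns named "Order Date" and "Sales" (an assumption about
    the dataset's schema).
    """
    order_dates = pd.to_datetime(df["Order Date"])
    # Sum sales within each calendar year.
    yearly = df.groupby(order_dates.dt.year)["Sales"].sum().rename("total_sales")
    yearly.index.name = "year"
    out = yearly.to_frame()
    # Growth relative to the previous year; the first year has no baseline (NaN).
    out["yoy_growth_pct"] = out["total_sales"].pct_change() * 100
    return out


# Tiny made-up sample, only to show the shape of the result.
sample = pd.DataFrame(
    {
        "Order Date": ["2021-03-01", "2021-11-15", "2022-02-10", "2022-12-01"],
        "Sales": [100.0, 100.0, 150.0, 150.0],
    }
)
growth = yearly_sales_growth(sample)
print(growth)
```

The first year always shows a missing growth value because there is no earlier year to compare against.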

📁 Project Structure

superstore-sales-analysis/
│
├── data/
│   └── superstore_sales.csv                # Sample retail sales dataset
│
├── notebooks/
│   ├── superstore_analysis.ipynb           # Main analysis notebook
│   ├── 01_exploratory_analysis.ipynb       # Initial exploratory data analysis
│   └── shipping_performance_analysis.ipynb # Shipping delay analysis
│
├── analysis.py                             # Python script for quick analysis
├── requirements.txt                        # Python dependencies
├── README.md                               # Project documentation
└── .gitignore                              # Git ignore file

🚀 Getting Started

Prerequisites

  • Python 3.8 or higher
  • Jupyter Notebook or JupyterLab

Installation

  1. Clone this repository:

     git clone https://github.com/WebCraftPhil/superstore-sales-analysis.git
     cd superstore-sales-analysis

  2. Install the required packages:

     pip install -r requirements.txt

Usage Options

Option 1: Run the Python Script (Quick Analysis)

For a quick text-based analysis with key insights:

python analysis.py

This will output all analysis results directly to the console.

Option 2: Use Jupyter Notebook (Full Interactive Analysis)

For the complete interactive analysis with visualizations:

  1. Launch Jupyter Notebook:

     jupyter notebook

  2. Open notebooks/superstore_analysis.ipynb and run the cells

📦 Dependencies

  • pandas: Data manipulation and analysis
  • matplotlib: Basic plotting and visualizations
  • seaborn: Statistical data visualization
  • jupyter: Interactive notebook environment

📈 Key Features

  • Clear, readable code: Written with clarity over cleverness
  • Step-by-step approach: Each analysis section builds on the previous one
  • Well-labeled visualizations: Charts designed for easy interpretation
  • Business-focused: Answers practical business questions
  • Beginner-friendly: Extensive comments and explanations
  • No machine learning: Focus on fundamental analysis techniques
  • Portfolio-ready: Suitable for showcasing to potential employers

📊 Sample Visualizations

The analysis includes various visualizations:

  • Line charts for time series trends
  • Bar charts for category comparisons
  • Box plots for distribution analysis
  • Pie charts for share-of-total comparisons
  • Scatter plots for relationship analysis
  • Horizontal bar charts for rankings

Shipping Performance Analysis

The shipping_performance_analysis.ipynb notebook includes:

  • Distribution Box Plots: Show shipping delay distributions by mode, including quartiles and outliers
  • Average Delay Bar Charts: Clear comparison of mean delays across shipping modes
  • Time Series Line Charts: Trends of shipping performance over time by mode
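The numbers behind these charts can be sketched as a per-mode summary table. The example below is a hedged illustration rather than the notebook's code: it assumes "shipping delay" means the number of days between `Order Date` and `Ship Date`, that the shipping mode lives in a `Ship Mode` column, and it uses made-up rows. The function name `shipping_delay_by_mode` is hypothetical.

```python
import pandas as pd


def shipping_delay_by_mode(df: pd.DataFrame) -> pd.DataFrame:
    """Mean, median, and max shipping delay (in days) for each shipping mode.

    Assumes columns "Order Date", "Ship Date", and "Ship Mode"; the delay
    definition (ship date minus order date) is also an assumption.
    """
    orders = df.copy()
    orders["Order Date"] = pd.to_datetime(orders["Order Date"])
    orders["Ship Date"] = pd.to_datetime(orders["Ship Date"])
    # Whole days between placing the order and shipping it.
    orders["delay_days"] = (orders["Ship Date"] - orders["Order Date"]).dt.days
    return (
        orders.groupby("Ship Mode")["delay_days"]
        .agg(["mean", "median", "max"])
        .sort_values("mean")
    )


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Order Date": ["2022-01-01"] * 4,
        "Ship Date": ["2022-01-03", "2022-01-05", "2022-01-06", "2022-01-06"],
        "Ship Mode": ["First Class", "First Class", "Standard Class", "Standard Class"],
    }
)
delay_summary = shipping_delay_by_mode(sample)
print(delay_summary)
```

The `delay_days` column computed inside the function is also the natural input for a box plot grouped by shipping mode.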

🔍 Analysis Methodology

  1. Data Loading: Import dataset using pandas
  2. Data Exploration: Understand structure and content
  3. Data Cleaning: Convert dates, handle missing values
  4. Analysis: Answer each business question systematically
  5. Visualization: Create clear, labeled charts
  6. Insights: Summarize findings and recommendations
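The loading and cleaning steps might look roughly like the sketch below. It is an assumption-laden illustration, not the project's actual pipeline: the function name `load_and_clean` is hypothetical, the column names `Order Date` and `Sales` are assumed, and dropping incomplete or duplicated rows is only one possible way to "handle missing values". An in-memory CSV stands in for data/superstore_sales.csv so the example runs on its own.

```python
import io

import pandas as pd


def load_and_clean(source) -> pd.DataFrame:
    """Load a sales CSV and apply basic cleaning.

    `source` can be a file path or a file-like object. The cleaning policy
    here (drop unparseable dates, missing sales, and exact duplicates) is
    an assumption, not necessarily what the notebooks do.
    """
    df = pd.read_csv(source)
    df.columns = df.columns.str.strip()  # tolerate stray spaces in headers
    # Unparseable dates become NaT instead of raising an error.
    df["Order Date"] = pd.to_datetime(df["Order Date"], errors="coerce")
    df = df.dropna(subset=["Order Date", "Sales"])
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


# Made-up CSV with one bad date, one missing sale, and one duplicate row.
raw_csv = io.StringIO(
    "Order Date,Sales,Profit\n"
    "2022-01-05,120.50,30.10\n"
    "not a date,80.00,10.00\n"
    "2022-02-11,,5.00\n"
    "2022-01-05,120.50,30.10\n"
)
clean = load_and_clean(raw_csv)
print(clean)
print(clean.dtypes)
```

With the real dataset, the call would be `load_and_clean("data/superstore_sales.csv")`, assuming the file uses the same column names.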

📝 Key Insights

  • Sales show consistent growth patterns over time with identifiable seasonal trends
  • Technology and Office Supplies are the largest revenue categories
  • Regional performance varies significantly, with certain states driving most sales
  • Consumer segment represents the majority of customers
  • Profitability varies widely across product categories, with some requiring pricing adjustments
  • Shipping delays vary by mode: First Class (~3 days), Second Class (~4 days), Standard Class (~5 days)

🎓 Skills Demonstrated

  • Data manipulation with pandas
  • Exploratory data analysis (EDA)
  • Data visualization with matplotlib and seaborn
  • Business metrics calculation (profit margins, growth rates)
  • Statistical summarization
  • Clear documentation and communication
  • Jupyter Notebook proficiency

💡 Future Enhancements

Potential areas for extended analysis:

  • Customer retention and lifetime value analysis
  • Discount effectiveness analysis
  • Product bundling opportunities
  • Time series forecasting
  • Customer segmentation clustering

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

👤 Author

Phillip Greene

🙏 Acknowledgments

  • Dataset inspired by the classic Superstore dataset used in data analysis education
  • Created as a portfolio project for data analyst job applications
  • Designed with feedback from hiring managers and data professionals

Note: This is a portfolio project created for demonstration purposes. The data is fictional and meant to showcase data analysis skills.

Overview

This project analyzes retail sales data from a fictional superstore to uncover revenue drivers, seasonal trends, and underperforming regions. The goal is to demonstrate practical data analysis skills using Python, Excel, and Tableau to answer real business questions.

Dataset

  • Source: Public Superstore Sales dataset
  • Records: roughly 10,000 orders
  • Key fields: Order Date, Product Category, Sub-Category, Sales, Profit, Region

Tools Used

  • Python (pandas, matplotlib)
  • Excel (pivot tables, lookup functions)
  • Tableau Public (interactive dashboards)

Business Questions

  • Which product categories and sub-categories generate the most revenue?
  • How does seasonality affect sales performance?
  • Which regions are underperforming in terms of profit?
  • Are there high-revenue but low-profit product segments?
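One way to approach the last question is to compute a profit margin per sub-category and flag groups that sell a lot but keep little. The sketch below is illustrative only: the function name `flag_low_margin`, the 5% margin threshold, and the use of the median as the "high revenue" cutoff are all assumptions, and the rows are made up.

```python
import pandas as pd


def flag_low_margin(
    df: pd.DataFrame,
    group_col: str = "Sub-Category",
    margin_threshold: float = 0.05,
) -> pd.DataFrame:
    """Summarize sales and profit per group and flag high-revenue, low-margin groups.

    "High revenue" means total sales at or above the median group; the
    threshold and cutoff are illustrative assumptions.
    """
    summary = df.groupby(group_col)[["Sales", "Profit"]].sum()
    # Profit margin = total profit divided by total sales.
    summary["profit_margin"] = summary["Profit"] / summary["Sales"]
    high_revenue = summary["Sales"] >= summary["Sales"].median()
    low_margin = summary["profit_margin"] < margin_threshold
    summary["high_revenue_low_margin"] = high_revenue & low_margin
    return summary.sort_values("Sales", ascending=False)


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Sub-Category": ["Tables", "Tables", "Phones", "Paper", "Labels"],
        "Sales": [600.0, 400.0, 900.0, 200.0, 100.0],
        "Profit": [-30.0, -20.0, 180.0, 80.0, 40.0],
    }
)
margin_summary = flag_low_margin(sample)
flagged = margin_summary[margin_summary["high_revenue_low_margin"]]
print(margin_summary)
```

In this sample only Tables is flagged: it has the highest sales but a negative margin.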

Key Insights

  • A small number of product categories drive a majority of total revenue.
  • Sales exhibit clear seasonal spikes during specific months.
  • Certain regions consistently underperform despite strong order volume.
  • Some high-volume products have low or negative profit margins.
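The first insight, that a few categories drive most of the revenue, can be checked with a cumulative revenue share table. This is a minimal sketch under assumptions: the function name `cumulative_revenue_share` is hypothetical, the grouping column defaults to `Sub-Category`, and the figures are made up rather than taken from the dataset.

```python
import pandas as pd


def cumulative_revenue_share(
    df: pd.DataFrame, group_col: str = "Sub-Category"
) -> pd.DataFrame:
    """Rank groups by total sales and show each group's running share of revenue."""
    totals = df.groupby(group_col)["Sales"].sum().sort_values(ascending=False)
    share = totals / totals.sum()
    return pd.DataFrame(
        {
            "sales": totals,
            "share": share,
            # Running total of shares, largest group first.
            "cumulative_share": share.cumsum(),
        }
    )


# Made-up rows for illustration only.
sample = pd.DataFrame(
    {
        "Sub-Category": ["A", "A", "B", "C"],
        "Sales": [400.0, 200.0, 300.0, 100.0],
    }
)
revenue_share = cumulative_revenue_share(sample)
print(revenue_share)
```

Reading down the `cumulative_share` column shows how many groups are needed to reach a given fraction of total revenue.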

Recommendations

  • Focus marketing and inventory investment on high-margin categories.
  • Review pricing or cost structure for consistently unprofitable products.
  • Target underperforming regions with localized promotions or logistics improvements.

Deliverables

  • Python notebook for data cleaning and analysis
  • Excel workbook with pivot-table analysis
  • Tableau dashboard for executive-level insights

Tableau Dashboard

[Link to Tableau Public dashboard will go here]

Next Steps

  • Deeper customer segmentation analysis
  • Profit optimization modeling
  • Forecasting future sales trends

Conclusion

superstore-sales-analysis

Retail sales data analysis using Python, Excel, and Tableau to uncover revenue drivers, seasonal trends, and business insights.
