A portfolio-style data analysis project showcasing retail sales analysis using Python and pandas. This project demonstrates fundamental data analysis skills for entry-level data analyst roles.
This project analyzes retail sales data from a fictional superstore to uncover insights about sales performance, profitability, and customer behavior. The analysis focuses on answering practical business questions using clear, step-by-step Python code that's easy to understand and share with non-technical stakeholders.
- What are the sales trends over time?
  - Year-over-year growth analysis
  - Monthly seasonal patterns
  - Peak sales periods identification
- Which products and categories perform best?
  - Top-performing categories and sub-categories
  - Sales distribution across product types
  - Product performance comparison
- How does regional performance vary?
  - Sales by geographic region
  - Top-performing states
  - Regional market share analysis
- What customer segments generate the most revenue?
  - Customer segment breakdown (Consumer, Corporate, Home Office)
  - Average order value by segment
  - Purchasing behavior patterns
- Which products are most profitable?
  - Profit analysis by category and sub-category
  - Profit margin calculations
  - Identification of high and low-margin products
- How does shipping performance vary by mode?
  - Shipping delay analysis across different shipping modes
  - Distribution and trends of delivery times
  - Identification of shipping efficiency opportunities
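As an illustration of one of the questions above, here is a minimal sketch of how average order value by segment could be computed with pandas. The column names (`Order ID`, `Segment`, `Sales`) and the sample rows are assumptions made up for this example; they are not taken from the project's CSV or notebooks.

```python
import pandas as pd

# Hypothetical sample rows; column names are assumed, not read from the project's data.
orders = pd.DataFrame({
    "Order ID": ["A1", "A1", "B2", "C3", "C3", "D4"],
    "Segment": ["Consumer", "Consumer", "Corporate", "Consumer", "Consumer", "Home Office"],
    "Sales": [100.0, 50.0, 300.0, 20.0, 30.0, 400.0],
})

# An order can span several rows (one per product), so total each order first.
order_totals = orders.groupby(["Segment", "Order ID"], as_index=False)["Sales"].sum()

# Average order value = mean of the per-order totals within each segment.
aov_by_segment = (
    order_totals.groupby("Segment")["Sales"].mean().sort_values(ascending=False)
)
print(aov_by_segment)
```

Totaling per order before averaging matters: taking the mean of raw rows would give average line-item value instead of average order value.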
superstore-sales-analysis/
│
├── data/
│   └── superstore_sales.csv                  # Sample retail sales dataset
│
├── notebooks/
│   ├── superstore_analysis.ipynb             # Main analysis notebook
│   ├── 01_exploratory_analysis.ipynb         # Initial exploratory data analysis
│   └── shipping_performance_analysis.ipynb   # Shipping delay analysis
│
├── analysis.py                               # Python script for quick analysis
├── requirements.txt                          # Python dependencies
├── README.md                                 # Project documentation
└── .gitignore                                # Git ignore file
- Python 3.8 or higher
- Jupyter Notebook or JupyterLab
- Clone this repository:
    git clone https://github.com/WebCraftPhil/superstore-sales-analysis.git
    cd superstore-sales-analysis
- Install required packages:
    pip install -r requirements.txt
Option 1: Run the Python Script (Quick Analysis)
For a quick text-based analysis with key insights:
    python analysis.py
This will output all analysis results directly to the console.
Option 2: Use Jupyter Notebook (Full Interactive Analysis)
For the complete interactive analysis with visualizations:
- Launch Jupyter Notebook:
    jupyter notebook
- Open notebooks/superstore_analysis.ipynb and run the cells
- pandas: Data manipulation and analysis
- matplotlib: Basic plotting and visualizations
- seaborn: Statistical data visualization
- jupyter: Interactive notebook environment
- Clear, readable code: Written with clarity over cleverness
- Step-by-step approach: Each analysis section builds on the previous one
- Well-labeled visualizations: Charts designed for easy interpretation
- Business-focused: Answers practical business questions
- Beginner-friendly: Extensive comments and explanations
- No machine learning: Focus on fundamental analysis techniques
- Portfolio-ready: Suitable for showcasing to potential employers
The analysis includes various visualizations:
- Line charts for time series trends
- Bar charts for category comparisons
- Box plots for distribution analysis
- Pie charts for proportional breakdowns
- Scatter plots for relationship analysis
- Horizontal bar charts for rankings
The shipping_performance_analysis.ipynb notebook includes:
- Distribution Box Plots: Shows shipping delay distributions by mode with quartiles and outliers
- Average Delay Bar Charts: Clear comparison of mean delays across shipping modes
- Time Series Line Charts: Trends of shipping performance over time by mode
- Data Loading: Import dataset using pandas
- Data Exploration: Understand structure and content
- Data Cleaning: Convert dates, handle missing values
- Analysis: Answer each business question systematically
- Visualization: Create clear, labeled charts
- Insights: Summarize findings and recommendations
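The first few steps above might look roughly like the following. This is a sketch under stated assumptions, not the code in `analysis.py`: a tiny inline CSV stands in for `data/superstore_sales.csv`, the column names are assumed, and dropping rows with a missing `Profit` is just one possible cleaning policy.

```python
import io

import pandas as pd

# Tiny inline CSV standing in for the real dataset; columns and values are invented.
raw_csv = io.StringIO(
    "Order ID,Order Date,Ship Date,Category,Sales,Profit\n"
    "A1,2017-01-03,2017-01-07,Technology,200.0,40.0\n"
    "B2,2017-02-10,2017-02-12,Furniture,150.0,\n"
    "C3,2017-02-15,2017-02-20,Office Supplies,50.0,5.0\n"
)

# Data loading: parse the date columns while reading.
df = pd.read_csv(raw_csv, parse_dates=["Order Date", "Ship Date"])

# Data exploration: column types and missing values per column.
print(df.dtypes)
print(df.isna().sum())

# Data cleaning: drop rows with no Profit value (an assumed policy for this sketch).
clean = df.dropna(subset=["Profit"]).copy()

# Analysis: total sales per calendar month.
clean["Order Month"] = clean["Order Date"].dt.to_period("M")
monthly_sales = clean.groupby("Order Month")["Sales"].sum()
print(monthly_sales)
```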
- Sales show consistent growth patterns over time with identifiable seasonal trends
- Technology and Office Supplies are the largest revenue categories
- Regional performance varies significantly, with certain states driving most sales
- Consumer segment represents the majority of customers
- Profitability varies widely across product categories, with some requiring pricing adjustments
- Shipping delays vary by mode: First Class (~3 days), Second Class (~4 days), Standard Class (~5 days)
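For reference, a shipping delay like the one in the last finding can be derived as the number of days between order date and ship date, then summarized per mode. The sketch below assumes columns named `Ship Mode`, `Order Date`, and `Ship Date`; the rows are invented for illustration and its output is not a project result.

```python
import pandas as pd

# Hypothetical shipments; column names and dates are assumptions for this example.
shipments = pd.DataFrame({
    "Ship Mode": [
        "First Class", "First Class", "Second Class",
        "Standard Class", "Standard Class",
    ],
    "Order Date": pd.to_datetime(
        ["2017-03-01", "2017-03-05", "2017-03-02", "2017-03-03", "2017-03-10"]
    ),
    "Ship Date": pd.to_datetime(
        ["2017-03-03", "2017-03-09", "2017-03-06", "2017-03-08", "2017-03-15"]
    ),
})

# Delay in whole days between placing and shipping an order.
shipments["Delay Days"] = (shipments["Ship Date"] - shipments["Order Date"]).dt.days

# Summary statistics per shipping mode.
delay_by_mode = shipments.groupby("Ship Mode")["Delay Days"].agg(["mean", "median", "max"])
print(delay_by_mode)
```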
- Data manipulation with pandas
- Exploratory data analysis (EDA)
- Data visualization with matplotlib and seaborn
- Business metrics calculation (profit margins, growth rates)
- Statistical summarization
- Clear documentation and communication
- Jupyter Notebook proficiency
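The two business metrics named above, profit margin and growth rate, can be sketched in a few lines of pandas. The yearly totals here are made-up numbers, and the column names are assumptions rather than the project's actual schema.

```python
import pandas as pd

# Invented yearly totals, for illustration only.
yearly = pd.DataFrame({
    "Year": [2015, 2016, 2017],
    "Sales": [1000.0, 1200.0, 1500.0],
    "Profit": [100.0, 150.0, 300.0],
}).set_index("Year")

# Profit margin: profit as a percentage of sales.
yearly["Profit Margin %"] = yearly["Profit"] / yearly["Sales"] * 100

# Year-over-year growth: percentage change in sales versus the previous year.
# The first year has no predecessor, so its growth is NaN.
yearly["YoY Growth %"] = yearly["Sales"].pct_change() * 100

print(yearly.round(1))
```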
Potential areas for extended analysis:
- Customer retention and lifetime value analysis
- Discount effectiveness analysis
- Product bundling opportunities
- Time series forecasting
- Customer segmentation clustering
This project is licensed under the MIT License - see the LICENSE file for details.
Phillip Greene
- GitHub: @WebCraftPhil
- X (Twitter): @vtguy65
- Dataset inspired by the classic Superstore dataset used in data analysis education
- Created as a portfolio project for data analyst job applications
- Designed with feedback from hiring managers and data professionals
Note: This is a portfolio project created for demonstration purposes. The data is fictional and meant to showcase data analysis skills.
This project analyzes retail sales data from a fictional superstore to uncover revenue drivers, seasonal trends, and underperforming regions. The goal is to demonstrate practical data analysis skills using Python, Excel, and Tableau to answer real business questions.
- Source: Public Superstore Sales dataset
- Records: 10,000+ orders (approximate)
- Key fields: Order Date, Product Category, Sub-Category, Sales, Profit, Region
- Python (pandas, matplotlib)
- Excel (pivot tables, lookup functions)
- Tableau Public (interactive dashboards)
- Which product categories and sub-categories generate the most revenue?
- How does seasonality affect sales performance?
- Which regions are underperforming in terms of profit?
- Are there high-revenue but low-profit product segments?
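One way to approach the last question is to aggregate sales and profit per sub-category and flag those with above-median revenue but a thin margin. The sketch below uses invented rows, assumed column names, and an arbitrary 5% margin cutoff; none of these come from the project itself.

```python
import pandas as pd

# Hypothetical line items; names and numbers are made up for this example.
lines = pd.DataFrame({
    "Sub-Category": ["Tables", "Tables", "Phones", "Phones", "Binders", "Labels"],
    "Sales": [900.0, 700.0, 1000.0, 800.0, 300.0, 50.0],
    "Profit": [-100.0, -60.0, 200.0, 160.0, 60.0, 20.0],
})

# Total sales and profit per sub-category, plus margin as a percentage.
summary = lines.groupby("Sub-Category")[["Sales", "Profit"]].sum()
summary["Margin %"] = summary["Profit"] / summary["Sales"] * 100

# "High revenue" = above the median sub-category; "low profit" = margin under 5%.
# Both cutoffs are arbitrary choices for this sketch.
revenue_cutoff = summary["Sales"].median()
margin_cutoff = 5.0
flagged = summary[
    (summary["Sales"] > revenue_cutoff) & (summary["Margin %"] < margin_cutoff)
]
print(flagged)
```

In this toy data only `Tables` is flagged: it sells well but loses money, which is the pattern the question is looking for.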
- A small number of product categories drive a majority of total revenue.
- Sales exhibit clear seasonal spikes during specific months.
- Certain regions consistently underperform despite strong order volume.
- Some high-volume products have low or negative profit margins.
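A claim like the first finding is usually checked with a cumulative revenue share (a Pareto-style view). The following is a rough sketch with invented sub-category totals and an assumed 80% threshold, not the project's actual figures.

```python
import pandas as pd

# Invented sales totals per sub-category, for illustration only.
sales_by_subcat = pd.Series(
    {"Phones": 450.0, "Chairs": 300.0, "Binders": 120.0, "Paper": 80.0, "Labels": 50.0},
    name="Sales",
)

# Rank from largest to smallest, then accumulate each item's share of the total.
ranked = sales_by_subcat.sort_values(ascending=False)
cumulative_share = ranked.cumsum() / ranked.sum() * 100

# Sub-categories that together stay within the first 80% of revenue.
top_drivers = cumulative_share[cumulative_share <= 80].index.tolist()

print(cumulative_share)
print(top_drivers)
```

Here two of the five sub-categories account for 75% of the toy revenue total.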
- Focus marketing and inventory investment on high-margin categories.
- Review pricing or cost structure for consistently unprofitable products.
- Target underperforming regions with localized promotions or logistics improvements.
- Python notebook for data cleaning and analysis
- Excel workbook with pivot-table analysis
- Tableau dashboard for executive-level insights
[Link to Tableau Public dashboard will go here]
- Deeper customer segmentation analysis
- Profit optimization modeling
- Forecasting future sales trends