Skip to content

AndrewBavuels/Sales-and-Business-Report-with-Microsoft-Power-BI

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

100 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

Sales and Business Report with Microsoft Power BI πŸ“ŠπŸ’ΌπŸ“ˆ

Let's Get the Party Started

This repository contains the ETL (Extract, Transform, Load) process for analyzing data from the "Dunder Mifflin Paper Company" sales.xlsx file using Power BI. The Excel file has been pre-processed to include a "COGS" (Cost of Goods Sold) column and updated branch names from the series "The Office".

Before continuing into this repository, I was already working in Power BI until reaching the point where it can be seen in the following image:

img_0

  • The first thing I wanted to know was if sales were correlated with the company's profits. In the scatterplot, we can see that this is not the case.

  • We could think that the more we sell, the more profits we will have; However, we notice that there are losses in this graph.

As data professionals, we should not assume anything and take our conclusions for granted. Next, I will show you the components of my project:

1. Project description πŸ‘‡

Business Analytics: EDA with Python, ETL & Power BI.

The goal is to demonstrate how Python and Power BI merge when exploring the components of Business Intelligence (BI) flow. This includes:

  • Performing Exploratory Data Analysis (EDA) using Python to clean data and remove outliers in the Fact table.

  • Following the EDA, the project employs ETL (Extract, Transform, Load) processes to prepare the data for advanced analysis and interactive visualizations in Power BI.

  • Practical uses of DAX (Data Analysis Expressions) in Microsoft Power to create business metrics and KPIs, enabling data-driven decision-making.

Functional architecture design:

DM_pipeline

I. Exploratory Data Analysis (EDA)

Here is a little extract from Jupyter Notebook, using Python and their libraries (described down in the Technology Stack section).

EDA_with_Python

Remember when I first detected the outliers in the Power BI visualization? Well, this is the part and I am performing their handling per each customer segment

II. Power Query for ETL

Data extraction, source consolidation, cleaning, and transformation.

Extract

  1. Open Power BI Desktop.
  2. Select "Get Data" and choose "Excel".
  3. Locate and select "sales.xlsx".
  4. Choose the table format sheets (for this case, "FACT_Sales", "DIM_SKU" and "DIM State_Branches").

img_1

Coming up next, we will see the dimension table "State_Branches" has no relation to the fact one "Sales"

img_2

So, we need to do some transforming work in Power Query:

Transform

  1. Rename columns for State_Branches by using first rows as headers:

img_3

  1. Check the Model View if the relation to Sales was created:

img_4

  1. Check in the Table View the fact table "Sales" and determine the measures we will create from:

img_5.1

Load

  1. Apply changes and close Power Query Editor.
  2. Load transformed data into the Power BI data model.
  3. Establish relationships between tables if multiple tables are used.
  4. Create Measures for the columns previously marked in the red rectangle:
    • Quantity.
    • Sales.
    • COGS.
    • Profit.

img_5.2

The ETL execution will depend on the specific needs of your data analysis. Mostly, those key moments happen when:

  • Integrating new data sources.
  • Updating existing data.
  • Developing new analyses or reports.
  • Modifying the data structure.
  • Performing data maintenance or cleaning.

III: Power Pivot (DAX) for Data Modeling

Relationships, indicators, optimization.

DAX is a formula language that allows users to create custom calculations & expressions in Power BI.

It is similar to Excel formulas but is specifically designed for use in Power BI & other Microsoft BI tools. Some of them used were:

  1. Sales Measures:

    • Total Sales = SUM ( FACT_Sales[Sales] )

    • Sales Ranking = RANKX ( ALL ( 'DIM state_branches'[Branch] ), [Total Sales] )

    • Cumulative Total Sales = CALCULATE ( [Total Sales], TOPN ( [Sales Ranking], ALL ( 'DIM state_branches'[Branch] ), [Total Sales] ))

    • % Sales Performance = [Cumulative Total Sales] / CALCULATE ( [Total Sales], ALL ( 'DIM state_branches'[Branch] ) )

  2. Time Intelligence:

    • DIM_Calendar = CALENDARAUTO()

img_6

IV: Reporting

Data visualization, reports, dashboards, storytelling.

To determine which reports I wanted to visualize and how to develop their storytelling, I referred to the section in Eric Ries' book "The Lean Startup" that discusses the Toyota method of the 5 Whys, which helps to identify the root cause of a detected problem. In this case, I wanted to understand why "higher sales did not translate into higher profits".

Five_Whys

The 5 Whys for Storytelling Development

  1. Why are sales and profit not correlated? Because more sales do not necessarily mean more profit.

  2. Why do more sales not necessarily mean more profit? Because some sales result in losses.

  3. Why do some sales result in losses? Because their costs are higher than the revenue generated from those sales.

  4. Why are their costs higher than the revenue? Because in some branches, operational costs are higher.

  5. Why are their operational costs higher? Maybe it has to do with some products that are complex to sell.

General_Overview

Data-Driven Actions πŸ’‘

Based on the insights derived from the visualized reports and the storytelling process, the following actionable steps are suggested to be taken:

  1. Analyze Cost Structure:

    • Conduct a detailed analysis of the cost structure to understand where cost optimizations can be made.
    • Identify areas where expenses are disproportionately high, despite higher sales.
  2. Improve Product Mix:

    • Temporarily withdraw products with excessively high costs.
    • Focus on promoting and selling profitable products.
    • Implement this strategy by prioritizing the branches that generate 80% of the company's revenue.

By executing these data-driven actions, we can systematically address the root causes of the problem and drive the organization toward sustainable profitability.

L&P_byProduct

2. Technology stack πŸ’»

Programming language:

Python Libraries:

  • matplotlib.pyplot: For data visualization.
  • numpy: For mathematical operations and array manipulation.
  • pandas: For data manipulation and analysis.
  • seaborn: For statistical data visualization.

Data Visualization Tools:

Extensions and Other Tools:

  • DAX Formatter: For formatting Data Analysis Expressions (DAX) queries.

DAX_Formatter

3. Report Demo πŸ“ŠπŸ“ˆπŸ“‰

To show what it looks like in action, click here.

Note: The demo is also contained in the report folder.

4. Folder structure πŸ“

└── project
    β”œβ”€β”€ architecture
    β”‚   └── Dunder Mifflin data pipeline.png
    β”œβ”€β”€ data
    β”‚   β”œβ”€β”€ raw
    β”‚   β”‚   β”œβ”€β”€ DIM sku.xlsx
    β”‚   β”‚   β”œβ”€β”€ DIM state_branches.xlsx
    β”‚   β”‚   └── FACT sales.xlsx
    β”‚   β”œβ”€β”€ processed
    β”‚   β”‚   └── FACT sales.xlsx
    β”‚   └── ready
    β”‚       └── sales.xlsx
    β”œβ”€β”€ notebooks
    β”‚   └── main.ipynb
    β”œβ”€β”€ report
    β”‚   └── Dunder Mifflin Sales Report.pbix
    └── README.md    

Contact infoπŸ“§

For further information, reach me at andres.buelvas.diago.01@gmail.com

About

ETL process for analyzing 'sales.xlsx' data from 'Dunder Mifflin Paper Company' with Power BI. Includes Python EDA, ETL, Power BI visualizations, and DAX for business metrics.

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors