ChahiriAbderrahmane/UK_OnlineRetail-DataEng

    👨‍🔧 Online Retail Data Pipeline 👷

Retail data pipeline with AWS S3, Databricks, GCP BigQuery, and Looker Studio

Dashboard 📊 • Request Feature

📝 Table of Contents

  1. Project Overview
  2. Key Insights
  3. Project Architecture
  4. Credits
  5. Contact

🔬 Project Overview

This is an end-to-end data engineering project in which I built an ELT data pipeline to extract, analyze, and visualize insights from the data of a UK-based online retail company.

💾 Dataset

This is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers.

The dataset includes the following columns:

Column      | Description
------------|------------
InvoiceNo   | Invoice number. Nominal, a 6-digit integral number uniquely assigned to each transaction. If this code starts with the letter 'c', it indicates a cancellation.
StockCode   | Product (item) code. Nominal, a 5-digit integral number uniquely assigned to each distinct product.
Description | Product (item) name. Nominal.
Quantity    | The quantities of each product (item) per transaction. Numeric.
InvoiceDate | Invoice date and time. Numeric, the day and time when each transaction was generated.
UnitPrice   | Unit price. Numeric, product price per unit in sterling.
CustomerID  | Customer number. Nominal, a 5-digit integral number uniquely assigned to each customer.
Country     | Country name. Nominal, the name of the country where each customer resides.
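
To make the column semantics concrete, here is a minimal Python sketch of how one raw row could be typed, how a cancellation could be flagged from InvoiceNo, and how a line total follows from Quantity and UnitPrice. It is plain Python for illustration only, not the Spark code used in the pipeline. The names `InvoiceLine` and `parse_line` are hypothetical, and the sketch assumes every field arrives as a string, as it would from a CSV reader.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class InvoiceLine:
    """One typed row of the dataset (hypothetical helper type)."""
    invoice_no: str
    stock_code: str
    quantity: int
    unit_price: Decimal  # price per unit in sterling

    @property
    def is_cancellation(self) -> bool:
        # The dataset description says a leading 'c' marks a cancellation.
        # Comparing case-insensitively is an assumption about the raw data.
        return self.invoice_no[:1].lower() == "c"

    @property
    def line_total(self) -> Decimal:
        # Value of this line: quantity times unit price.
        return self.quantity * self.unit_price


def parse_line(row: dict) -> InvoiceLine:
    """Convert a raw string-valued row into an InvoiceLine."""
    return InvoiceLine(
        invoice_no=row["InvoiceNo"].strip(),
        stock_code=row["StockCode"].strip(),
        quantity=int(row["Quantity"]),
        unit_price=Decimal(row["UnitPrice"]),
    )
```

Using Decimal rather than float keeps monetary sums exact, which matters once line totals are aggregated into revenue figures.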

🎯 Project Goals

  • Ingest CSV files from S3 into Databricks
  • Clean and transform the data with Spark into Parquet tables
  • Model the data as a star schema for analytical queries
  • Load the processed tables into BigQuery
  • Provide interactive dashboards in Looker Studio
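
As a rough illustration of the star-schema goal, the sketch below splits flat transaction rows into two dimension tables and one fact table linked by surrogate keys. It is plain Python, not the Spark code from the pipeline, and the table and function names (`dim_product`, `dim_customer`, `fact_sales`, `build_star_schema`, `get_or_add`) are hypothetical. The project's actual model is the one shown under Data Modeling.

```python
def get_or_add(dimension, natural_key, attributes):
    """Return the surrogate key for natural_key, adding a row if it is new."""
    if natural_key not in dimension:
        dimension[natural_key] = {"key": len(dimension) + 1, **attributes}
    return dimension[natural_key]["key"]


def build_star_schema(rows):
    """Split flat rows into product/customer dimensions and a sales fact table."""
    dim_product, dim_customer, fact_sales = {}, {}, []
    for row in rows:
        product_key = get_or_add(
            dim_product, row["StockCode"], {"Description": row["Description"]}
        )
        customer_key = get_or_add(
            dim_customer, row["CustomerID"], {"Country": row["Country"]}
        )
        # The fact row keeps the measures plus references to the dimensions.
        fact_sales.append({
            "InvoiceNo": row["InvoiceNo"],
            "InvoiceDate": row["InvoiceDate"],
            "product_key": product_key,
            "customer_key": customer_key,
            "Quantity": row["Quantity"],
            "UnitPrice": row["UnitPrice"],
        })
    return dim_product, dim_customer, fact_sales
```

Each distinct StockCode and CustomerID is stored once, so analytical queries can join a narrow fact table to small dimension tables instead of scanning repeated descriptive text.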

🕵️ Key Insights

  • 💸 Total Revenue by Country

    • The UK 🇬🇧 generated most of the company's revenue, with over 1.8M, followed by France with 182.4K.
  • 📈 Revenue by Month

    • July has the highest revenue, at more than 220K.
    • December has the lowest revenue, at 100K.

Revenue rises noticeably in January, July, and November, which coincide with New Year, the Wimbledon finals, and Bonfire Night respectively.
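
Figures like these come from grouping line totals (Quantity × UnitPrice) by a chosen key. The following is a minimal plain-Python sketch of that aggregation, not the queries behind the dashboard. The function name `revenue_by` is hypothetical, rows are assumed to be already typed (int quantity, Decimal price, datetime invoice date), and the sketch sums every line it is given, so whether cancellations are filtered out beforehand is left to the caller.

```python
from collections import defaultdict
from decimal import Decimal


def revenue_by(rows, key_func):
    """Sum Quantity * UnitPrice per group, where key_func picks the group."""
    totals = defaultdict(Decimal)  # missing groups start at Decimal('0')
    for row in rows:
        totals[key_func(row)] += row["Quantity"] * row["UnitPrice"]
    return dict(totals)


def by_country(row):
    return row["Country"]


def by_month(row):
    # Group by calendar year and month of the invoice timestamp.
    return (row["InvoiceDate"].year, row["InvoiceDate"].month)
```

Calling `revenue_by(rows, by_country)` gives one total per country, and `revenue_by(rows, by_month)` gives one total per (year, month) pair.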

📝 Project Architecture

[Image: architecture diagram]

⚙️ Data Modeling

[Image: data model diagram]

Here are some screenshots of the work I've done.

🛠️ Technologies Used

Amazon S3 • Databricks • BigQuery • Looker Studio

📋 Credits

📨 Contact Me

LinkedIn • Website • Gmail: chahiri.abderrahmane.eng@gmail.com
