swar00pduthks/DataEngineering



Project: Data Modelling with Postgres

Table of Contents
  1. About The Project
  2. Getting Started

About The Project

  1. Discuss the purpose of this database in the context of the startup, Sparkify, and their analytical goals.
  2. State and justify your database schema design and ETL pipeline.
  3. [Optional] Provide example queries and results for song play analysis.

Sparkify, a music streaming startup, collects data about user activity on its app. The activity is captured as JSON logs on the company's stream server.

To understand user demand and behaviour, and so improve the app and make it more relevant to users, the analytics team wants a database that loads these JSON activity logs and lets them write ad hoc queries to analyze the captured data.

Database modelling design

We decided to build the database using PostgreSQL. It is open source, and a relational database (RDBMS) is well suited to the ad hoc analytical queries and aggregations the team requires.

  • Conceptual modelling: as part of the data modelling exercise, we identified the following entities as required
  1. Songs
  2. Users
  3. Artists
  4. Time

We then decided to build the data model as a star schema, because it keeps ad hoc analytical queries simple and fast and is well suited to analysis.

  • Logical modelling: star schema

  • Physical modelling

SQL queries
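As a rough illustration of the star schema described above, the sketch below defines one fact table surrounded by the four dimension entities and runs a single ad hoc aggregate against it. Everything beyond the four entity names is an assumption: the `songplays` fact table, every column name, the helper functions `build_demo_db` and `plays_per_level`, and the sample rows are all hypothetical. To keep the example runnable without a server, it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL; the DDL is plain SQL but has not been checked against the project's actual tables.

```python
import sqlite3

# Hypothetical star schema: one fact table (songplays) that references the
# four dimension tables named in this README. All column names are assumptions.
SCHEMA = """
CREATE TABLE users   (user_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT, level TEXT);
CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE songs   (song_id TEXT PRIMARY KEY, title TEXT,
                      artist_id TEXT REFERENCES artists(artist_id));
CREATE TABLE time    (start_time INTEGER PRIMARY KEY, hour INTEGER, weekday INTEGER);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  INTEGER REFERENCES time(start_time),
    user_id     INTEGER REFERENCES users(user_id),
    song_id     TEXT    REFERENCES songs(song_id),
    artist_id   TEXT    REFERENCES artists(artist_id)
);
"""


def build_demo_db():
    """Create an in-memory database and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")  # SQLite stand-in, not PostgreSQL
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO users VALUES (?, ?, ?, ?)",
        [(1, "Ada", "L", "free"), (2, "Bo", "K", "paid")],
    )
    conn.execute("INSERT INTO artists VALUES ('A1', 'Demo Artist')")
    conn.execute("INSERT INTO songs VALUES ('S1', 'Demo Song', 'A1')")
    conn.executemany("INSERT INTO time VALUES (?, ?, ?)", [(1000, 9, 0), (2000, 10, 0)])
    conn.executemany(
        "INSERT INTO songplays (start_time, user_id, song_id, artist_id) VALUES (?, ?, ?, ?)",
        [(1000, 1, "S1", "A1"), (2000, 2, "S1", "A1"), (2000, 2, "S1", "A1")],
    )
    return conn


def plays_per_level(conn):
    """Ad hoc query: count song plays per subscription level."""
    return conn.execute(
        """
        SELECT u.level, COUNT(*)
        FROM songplays sp
        JOIN users u ON u.user_id = sp.user_id
        GROUP BY u.level
        ORDER BY u.level
        """
    ).fetchall()


if __name__ == "__main__":
    print(plays_per_level(build_demo_db()))
```

The point of the layout is that an analytical question becomes a single join from the fact table to whichever dimension is being grouped on, which is what makes the star schema convenient for ad hoc queries.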

We will use Python to build our ETL pipeline, because it provides many libraries for reading data and can later be integrated into an orchestration tool such as Airflow.
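A minimal sketch of what the extract and transform steps of such a pipeline might look like, using only the Python standard library. It assumes the logs are newline-delimited JSON; the field names (`userId`, `firstName`, `lastName`, `level`, `ts` in milliseconds) and the helper names are hypothetical and are not taken from this repository's ETL code.

```python
import json
from datetime import datetime, timezone


def parse_log_lines(lines):
    """Yield one event dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_user_row(event):
    """Build a row for the users dimension (field names are assumptions)."""
    return (event["userId"], event["firstName"], event["lastName"], event["level"])


def to_time_row(event):
    """Break a millisecond epoch timestamp into parts for the time dimension."""
    ts = datetime.fromtimestamp(event["ts"] / 1000, tz=timezone.utc)
    return (event["ts"], ts.hour, ts.day, ts.month, ts.year, ts.weekday())


if __name__ == "__main__":
    # Made-up sample line standing in for one record from the stream server.
    sample = ['{"userId": 7, "firstName": "Ada", "lastName": "L", "level": "free", "ts": 3600000}']
    for event in parse_log_lines(sample):
        print(to_user_row(event), to_time_row(event))
```

The load step would then insert these tuples into the matching tables through a PostgreSQL driver. Keeping the transform functions free of database calls makes them easy to test and to reuse later inside an orchestration tool.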

Built With

Getting Started

  1. In a terminal, run the command below to create the database and the required tables:

create_tables

ETL

About

The DataEngineering repository stores all data engineering related work.
