GitHub - swar00pduthks/DataEngineering: The DataEngineering repository is to store all data engineering related work

Project: Data Modelling with Postgres

Table of Contents

About The Project
- Built With
Getting Started
- Prerequisites
- Installation

About The Project

Discuss the purpose of this database in the context of the startup, Sparkify, and their analytical goals.
State and justify your database schema design and ETL pipeline.
[Optional] Provide example queries and results for song play analysis.

A music streaming startup, "Sparkify", collects data about user activity on its app. The user activity is recorded as JSON logs on its stream server.

To understand user demand and behaviour, and so improve the app and make it more relevant to users, the analytics team wants to build a database that loads these JSON user activity logs and supports ad hoc queries on the captured data.
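As a minimal sketch of what reading such logs might look like, the snippet below parses newline-delimited JSON records and counts events per user. The field names (`userId`, `page`, `ts`) and the sample records are hypothetical; the actual log layout is not shown in this README.

```python
import json
from collections import Counter

# Hypothetical sample log lines; the real field names may differ.
SAMPLE_LOG = """\
{"userId": "8", "page": "NextSong", "ts": 1541105830796}
{"userId": "8", "page": "Home", "ts": 1541106106796}
{"userId": "10", "page": "NextSong", "ts": 1541106132796}
"""


def parse_log_lines(text):
    """Parse newline-delimited JSON into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def events_per_user(records):
    """Count how many log records each user produced."""
    return Counter(record["userId"] for record in records)


records = parse_log_lines(SAMPLE_LOG)
counts = events_per_user(records)
```

Counting in plain Python like this works for a quick look at one file; loading the records into a database is what makes repeated ad hoc queries practical.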

Database modelling design

We decided to build the database using PostgreSQL. It is open source, and as a relational database management system (RDBMS) it is well suited to the ad hoc analytical queries and aggregations the team requires.

Conceptual modelling

As part of the data modelling exercise, we identified the following entities as required:

Songs
Users
Artists
Time

We then decided to build the data model as a star schema, because it keeps ad hoc analytical queries simple and fast: a central fact table joins directly to each dimension table.
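A star schema for these entities could be sketched as follows. The fact table name `songplays` and every column name are assumptions for illustration, not the repository's actual definitions. The standard library's `sqlite3` stands in for PostgreSQL here so the sketch runs without a server; in PostgreSQL the column types would need adjusting.

```python
import sqlite3

# Hypothetical star schema: one fact table referencing four dimension tables.
DDL = """
CREATE TABLE users   (user_id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT);
CREATE TABLE artists (artist_id TEXT PRIMARY KEY, name TEXT);
CREATE TABLE songs   (song_id TEXT PRIMARY KEY, title TEXT,
                      artist_id TEXT REFERENCES artists(artist_id));
CREATE TABLE "time"  (start_time INTEGER PRIMARY KEY, hour INTEGER, weekday INTEGER);
CREATE TABLE songplays (
    songplay_id INTEGER PRIMARY KEY,
    start_time  INTEGER REFERENCES "time"(start_time),
    user_id     TEXT REFERENCES users(user_id),
    song_id     TEXT REFERENCES songs(song_id),
    artist_id   TEXT REFERENCES artists(artist_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)

# Made-up sample rows, only to show the fact-to-dimension join.
conn.executemany("INSERT INTO users VALUES (?, ?, ?)",
                 [("8", "Ada", "Example"), ("10", "Bo", "Sample")])
conn.executemany("INSERT INTO songplays (user_id) VALUES (?)",
                 [("8",), ("8",), ("10",)])

# Ad hoc query: number of plays per user, most active first.
plays_per_user = conn.execute("""
    SELECT u.first_name, COUNT(*) AS plays
    FROM songplays AS sp
    JOIN users AS u ON u.user_id = sp.user_id
    GROUP BY u.user_id
    ORDER BY plays DESC
""").fetchall()
```

Each analytical question becomes one join from the fact table to the dimension it needs, which is the main appeal of the star layout.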

Logical modelling
Physical modelling

We will use Python to build the ETL pipeline, as it is widely used, provides many libraries for reading data, and can later be integrated into an orchestration tool such as Airflow.
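The extract, transform, and load steps of such a pipeline might be sketched as below. This is not the repository's `etl.py`: the record fields (`userId`, `firstName`, `lastName`, `ts`), the table layouts, and the sample lines are all hypothetical, and `sqlite3` is used in place of PostgreSQL so the example runs without a server.

```python
import json
import sqlite3
from datetime import datetime, timezone

# Hypothetical log lines; the real field names may differ.
SAMPLE_LINES = [
    '{"userId": "8", "firstName": "Ada", "lastName": "Example", "ts": 1541105830796}',
    '{"userId": "8", "firstName": "Ada", "lastName": "Example", "ts": 1541106106796}',
]


def extract(lines):
    """Yield one dict per non-blank JSON line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def transform(record):
    """Split one log record into a user row and a time row (UTC)."""
    moment = datetime.fromtimestamp(record["ts"] / 1000, tz=timezone.utc)
    user_row = (record["userId"], record["firstName"], record["lastName"])
    time_row = (record["ts"], moment.hour, moment.day,
                moment.month, moment.year, moment.weekday())
    return user_row, time_row


def load(conn, user_row, time_row):
    """Insert both rows, ignoring duplicates of an existing primary key."""
    conn.execute("INSERT OR IGNORE INTO users VALUES (?, ?, ?)", user_row)
    conn.execute('INSERT OR IGNORE INTO "time" VALUES (?, ?, ?, ?, ?, ?)', time_row)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT PRIMARY KEY, "
             "first_name TEXT, last_name TEXT)")
conn.execute('CREATE TABLE "time" (start_time INTEGER PRIMARY KEY, hour INTEGER, '
             "day INTEGER, month INTEGER, year INTEGER, weekday INTEGER)")

for record in extract(SAMPLE_LINES):
    load(conn, *transform(record))
conn.commit()
```

`INSERT OR IGNORE` is SQLite syntax; the PostgreSQL counterpart is `INSERT ... ON CONFLICT DO NOTHING`. Keeping the three steps as separate functions also makes each one easy to wrap as a task in an orchestration tool later.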

Built With

Getting Started

Open a terminal and run the command below to create the database and the required tables.

Repository files:

images
README.md
create_tables.py
etl.ipynb
etl.py
sql_queries.py
test.ipynb
