1. Programming Languages: Python, PySpark
2. Scripting Language: SQL
3. Databases: PostgreSQL, MongoDB, BigQuery
4. Data Warehouses (DWH): PostgreSQL, Snowflake
5. Orchestrator: Airflow
6. Data Viz: Dash-Plotly
Pipeline A extracts data from Excel files, transforms it, and loads the processed data into MongoDB collections that act as the staging database. The data is then extracted from the staging database, modeled into facts and dimensions, and loaded into a PostgreSQL data warehouse. Airflow handles the orchestration. The data dashboard runs in the cloud as a Dash-Plotly application deployed on a Heroku VM.
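A minimal sketch of how Pipeline A's steps could be wired into an Airflow DAG is shown below; the file path, connection strings, and column names are placeholders for illustration, not the project's actual values.

```python
# Illustrative Airflow DAG for Pipeline A (Excel -> MongoDB staging -> Postgres DWH).
# The task functions are simplified placeholders, not the project's real code.
from datetime import datetime

import pandas as pd
from airflow import DAG
from airflow.operators.python import PythonOperator
from pymongo import MongoClient
from sqlalchemy import create_engine


def extract_and_stage():
    """Read an Excel source and stage the raw rows in a MongoDB collection."""
    df = pd.read_excel("/data/source.xlsx")            # assumed source path
    client = MongoClient("mongodb://localhost:27017")  # assumed connection string
    client["staging"]["raw_records"].insert_many(df.to_dict("records"))


def model_and_load():
    """Pull staged rows, model example facts/dimensions, and load them into Postgres."""
    client = MongoClient("mongodb://localhost:27017")
    df = pd.DataFrame(list(client["staging"]["raw_records"].find({}, {"_id": 0})))
    dim_customer = df[["customer_id", "customer_name"]].drop_duplicates()   # example dimension
    fact_sales = df[["customer_id", "order_date", "amount"]]                # example fact
    engine = create_engine("postgresql+psycopg2://user:pwd@localhost/dwh")  # assumed DSN
    dim_customer.to_sql("dim_customer", engine, if_exists="replace", index=False)
    fact_sales.to_sql("fact_sales", engine, if_exists="replace", index=False)


with DAG(
    dag_id="pipeline_a_excel_to_postgres",
    start_date=datetime(2023, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    stage = PythonOperator(task_id="extract_and_stage", python_callable=extract_and_stage)
    load = PythonOperator(task_id="model_and_load", python_callable=model_and_load)
    stage >> load
```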
In Pipeline B, the data is extracted from the Excel source files, transformed, and loaded as blob files into a Google Cloud Storage bucket that serves as the data lake. A Snowflake storage integration is created and serves as the staging area in the Snowflake data warehouse. The data is extracted from the staging area, modeled into facts and dimensions, and then loaded into the Snowflake data warehouse. All of the orchestration is done with Airflow.
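The lake-to-warehouse hop of Pipeline B could look roughly like the sketch below: a transformed file is uploaded to GCS, then a storage integration and external stage are created in Snowflake and the data is copied in. The bucket, table, and credential names are assumptions, not the project's real values.

```python
# Sketch of Pipeline B: GCS data lake -> Snowflake staging via a storage integration.
from google.cloud import storage
import snowflake.connector


def upload_to_gcs(local_path: str, bucket_name: str, blob_name: str) -> None:
    """Upload a transformed file to the GCS bucket that acts as the data lake."""
    client = storage.Client()
    client.bucket(bucket_name).blob(blob_name).upload_from_filename(local_path)


def copy_into_snowflake() -> None:
    """Create the storage integration and external stage, then COPY data into Snowflake."""
    conn = snowflake.connector.connect(
        account="my_account", user="etl_user", password="***",  # assumed credentials
        warehouse="ETL_WH", database="ANALYTICS", schema="STAGING",
    )
    cur = conn.cursor()
    # One-time setup: integration + external stage pointing at the GCS data lake.
    cur.execute("""
        CREATE STORAGE INTEGRATION IF NOT EXISTS gcs_int
          TYPE = EXTERNAL_STAGE
          STORAGE_PROVIDER = 'GCS'
          ENABLED = TRUE
          STORAGE_ALLOWED_LOCATIONS = ('gcs://my-data-lake/processed/')
    """)
    cur.execute("""
        CREATE STAGE IF NOT EXISTS gcs_stage
          URL = 'gcs://my-data-lake/processed/'
          STORAGE_INTEGRATION = gcs_int
          FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1)
    """)
    # Load the staged files into a table; facts and dimensions are modeled afterwards.
    cur.execute("COPY INTO STAGING.RAW_SALES FROM @gcs_stage")
    conn.close()
```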
Pipeline C extracts data from the Excel source files, transforms it, and loads it as blob files into a Google Cloud Storage bucket that serves as the data lake. The transformed data is then extracted from the GCS bucket, modeled, and loaded into a Google BigQuery data warehouse. Likewise, all of the orchestration is handled by Airflow, and the dashboard is built with Dash-Plotly and hosted on a Heroku virtual machine.
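A sketch of Pipeline C's load step using the google-cloud-bigquery client to copy a blob from the GCS data lake into a BigQuery table; the project, dataset, and blob names below are assumed for illustration.

```python
# Sketch of Pipeline C's load step: GCS data lake -> BigQuery data warehouse.
from google.cloud import bigquery


def load_gcs_to_bigquery(gcs_uri: str, table_id: str) -> None:
    """Load a CSV blob from GCS into a BigQuery table (schema auto-detected)."""
    client = bigquery.Client()
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.CSV,
        skip_leading_rows=1,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    )
    load_job = client.load_table_from_uri(gcs_uri, table_id, job_config=job_config)
    load_job.result()  # block until the load job finishes


if __name__ == "__main__":
    load_gcs_to_bigquery(
        "gs://my-data-lake/processed/sales.csv",  # assumed blob path
        "my-project.analytics_dwh.fact_sales",    # assumed project.dataset.table
    )
```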
Data validation tasks are handled with great_expectations.
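As an illustration, a minimal check could use the classic Pandas dataset API from older great_expectations releases; the project's actual expectation suites and checkpoints may be configured differently.

```python
# Minimal great_expectations sketch using the classic Pandas dataset API (GE 0.x).
import great_expectations as ge
import pandas as pd

# Hypothetical sample of staged data; the real pipelines validate the staged files.
df = pd.DataFrame({"customer_id": [1, 2, 3], "amount": [10.0, 25.5, 7.2]})

gdf = ge.from_pandas(df)
gdf.expect_column_values_to_not_be_null("customer_id")
gdf.expect_column_values_to_be_between("amount", min_value=0, max_value=100000)

results = gdf.validate()  # summarizes which expectations passed or failed
print(results)
```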