
Databricks and Data Factory: Creating and Orchestrating Pipelines in the Cloud

In this project, I developed an end-to-end data engineering pipeline following the workflow below:

[Workflow diagram]

I started by creating and structuring a Data Lake in Azure, organized into three layers:

  • Inbound Layer;
  • Bronze Layer;
  • Silver Layer.

The Inbound Layer is the entry point, where I landed the raw real estate database. From there, I used Databricks to apply specific transformations and promote the data through the Bronze and Silver layers of the Data Lake, as sketched below.
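As a minimal sketch of what the Databricks notebook might look like, assuming a medallion-style flow with Delta tables (the container names, storage account placeholder, and raw file format are illustrative assumptions, not taken from the project):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().getOrCreate()

// Hypothetical paths; the real container and account names depend on the Data Lake setup.
val inboundPath = "abfss://inbound@<storage-account>.dfs.core.windows.net/real_estate/"
val bronzePath  = "abfss://bronze@<storage-account>.dfs.core.windows.net/real_estate/"
val silverPath  = "abfss://silver@<storage-account>.dfs.core.windows.net/real_estate/"

// Bronze: ingest the raw files as-is, adding ingestion metadata.
spark.read.format("json").load(inboundPath)
  .withColumn("ingestion_date", current_timestamp())
  .write.format("delta").mode("overwrite").save(bronzePath)

// Silver: apply cleaning transformations, e.g. deduplicating records.
spark.read.format("delta").load(bronzePath)
  .dropDuplicates()
  .write.format("delta").mode("overwrite").save(silverPath)
```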

Once the data flow was structured, I used Azure Data Factory to orchestrate and automate the execution of this pipeline on a fixed schedule, for example with a schedule trigger like the one sketched below.
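Data Factory triggers are defined as JSON; a minimal schedule-trigger sketch might look like this (the trigger name, pipeline name, start time, and daily interval are illustrative assumptions):

```json
{
  "name": "DailyTrigger",
  "properties": {
    "type": "ScheduleTrigger",
    "typeProperties": {
      "recurrence": {
        "frequency": "Day",
        "interval": 1,
        "startTime": "2024-01-01T00:00:00Z",
        "timeZone": "UTC"
      }
    },
    "pipelines": [
      {
        "pipelineReference": {
          "referenceName": "RealEstatePipeline",
          "type": "PipelineReference"
        }
      }
    ]
  }
}
```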

This project was developed for a course I taught at Alura. You can access it through this link: Course's link

Technologies used

  • Azure Data Lake Storage Gen2;
  • Azure Databricks;
  • Azure Data Factory;
  • Scala.

Contact

Email: millenagena@gmail.com