Hi there! I'm Phung Thi Bac Ha, a passionate and certified Databricks Data Engineer Professional with a strong focus on building robust and scalable data solutions. I thrive on transforming raw data into actionable insights, leveraging cloud technologies and modern data engineering practices.
I'm proficient in various cloud platforms, data warehousing solutions, and data transformation tools, including:
- Cloud Platforms: Azure, AWS
- ETL/ELT: dbt, Databricks
- Programming Languages: SQL, Python, PySpark
- Infrastructure as Code: Terraform, Databricks Asset Bundles
- DevOps: GitHub Actions, Azure DevOps
Here's a glimpse into some of the projects I've been working on:
Azure + Databricks:
- Accelerated Healthcare Revenue Management Insights using Azure and Databricks: This project develops a robust, cloud-based data engineering pipeline to provide actionable insights for healthcare revenue cycle management (RCM). Leveraging Azure Data Factory, Databricks, and Delta Lake, the pipeline ingests, transforms, and quality-checks data from diverse sources (EMR, claims, APIs) within a Medallion architecture. By implementing SCD2 and generating key fact and dimension tables, the project empowers healthcare providers with data-driven decision-making to optimize revenue cycle efficiency, reduce costs, and improve financial performance. Secure credential management is ensured through Azure Key Vault.
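The SCD Type 2 step mentioned above can be sketched in plain Python (standing in for the Delta Lake MERGE; column names like `patient_id` and `city` are illustrative, not the project's actual schema):

```python
from datetime import date

def scd2_merge(dim_rows, incoming, key, today=None):
    """Apply Slowly Changing Dimension Type 2: close out changed rows
    and append new versions instead of overwriting history."""
    today = today or date.today().isoformat()
    result = list(dim_rows)
    current = {r[key]: r for r in result if r["is_current"]}
    for new in incoming:
        old = current.get(new[key])
        if old is None:
            # brand-new key: insert as the current version
            result.append({**new, "valid_from": today,
                           "valid_to": None, "is_current": True})
        elif any(old[c] != new[c] for c in new if c != key):
            # attribute changed: expire the old row, append a new version
            old["valid_to"] = today
            old["is_current"] = False
            result.append({**new, "valid_from": today,
                           "valid_to": None, "is_current": True})
    return result

# Existing dimension: one current row for patient 1
dim = [{"patient_id": 1, "city": "Hanoi",
        "valid_from": "2024-01-01", "valid_to": None, "is_current": True}]
merged = scd2_merge(dim, [{"patient_id": 1, "city": "Da Nang"}],
                    "patient_id", today="2024-06-01")
# merged now holds the expired Hanoi row plus a current Da Nang row
```

In the real pipeline this same compare-expire-append logic is expressed as a Delta Lake MERGE, which adds ACID guarantees and scales out on Spark.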
- Unlocking Customer Insights: On-Premises to Azure Data Pipeline with Databricks: This project demonstrates an end-to-end data pipeline solution using Azure Data Factory, Azure Databricks, Azure Data Lake Storage Gen2, and Azure Key Vault. It extracts data from an on-premises MySQL database, performs transformations (including SCD Type 2 implementation), and creates an optimized "One Big Table" for analytical reporting. The pipeline is automated for daily execution.
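The "One Big Table" idea is simply a fact table denormalized with its dimensions into a single wide, analytics-ready table. A toy illustration using Python's built-in sqlite3 (standing in for Databricks SQL; the `orders`/`customers` schema is invented for the example):

```python
import sqlite3

# In-memory toy warehouse: one fact table plus one dimension,
# flattened into a single wide table for reporting.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY,
                            name TEXT, country TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY,
                         customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Alice', 'VN'), (2, 'Bob', 'US');
    INSERT INTO orders VALUES (10, 1, 120.0), (11, 1, 80.0), (12, 2, 50.0);
    -- Denormalize fact + dimension into a single analytics-ready table
    CREATE TABLE obt_orders AS
    SELECT o.order_id, o.amount, c.name, c.country
    FROM orders o JOIN customers c USING (customer_id);
""")
rows = conn.execute(
    "SELECT country, SUM(amount) FROM obt_orders "
    "GROUP BY country ORDER BY country"
).fetchall()
print(rows)  # [('US', 50.0), ('VN', 200.0)]
```

Analysts can then query `obt_orders` directly without writing joins, which is the point of the OBT pattern for self-service reporting.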
- End-to-End Data Engineering with Azure, Databricks and dbt: A Deep Dive: This project demonstrates a complete data engineering pipeline leveraging Azure Databricks, dbt (Data Build Tool), and Azure cloud services. The project adheres to the Medallion Architecture, a data management paradigm that promotes a structured, iterative approach to data processing, ensuring data quality and reliability throughout the lifecycle.
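As a rough illustration of the Medallion flow, here is the bronze → silver → gold progression in plain Python (standing in for Databricks/dbt models; the field names are made up for the example):

```python
def to_silver(bronze_rows):
    """Bronze -> Silver: clean and standardize raw records,
    dropping rows that fail basic quality checks."""
    silver = []
    for row in bronze_rows:
        if row.get("id") is None or row.get("amount") is None:
            continue  # a real pipeline would quarantine bad records
        silver.append({"id": row["id"],
                       "category": (row.get("category") or "unknown").strip().lower(),
                       "amount": float(row["amount"])})
    return silver

def to_gold(silver_rows):
    """Silver -> Gold: aggregate into a reporting-ready table."""
    totals = {}
    for row in silver_rows:
        totals[row["category"]] = totals.get(row["category"], 0.0) + row["amount"]
    return totals

bronze = [{"id": 1, "category": " Laptops ", "amount": "999.5"},
          {"id": 2, "category": None, "amount": "10"},
          {"id": None, "category": "bad", "amount": "1"}]
gold = to_gold(to_silver(bronze))
print(gold)  # {'laptops': 999.5, 'unknown': 10.0}
```

Each layer only reads from the one below it, so data quality issues are caught and fixed as close to ingestion as possible.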
dbt + MySQL:
- Atliq Data Engineering with dbt and MySQL: Building an Efficient Data Pipeline: This project showcases expertise in building a robust data warehouse for Atliq Hardware using dbt and MySQL. It covers the full data engineering lifecycle, from data extraction and transformation to loading and quality assurance. Key achievements include designing a dimensional data model, implementing complex data transformations with dbt macros, ensuring data quality through comprehensive testing, and creating a "One Big Table" for self-service business intelligence. Built with dbt, MySQL, SQL, YAML, and Git, the project demonstrates strong data warehousing skills and best practices.
AWS + Databricks + Kafka/Kinesis:
- [Project 1 Name](Project 1 Link): Brief description of Project 1
- [Project 2 Name](Project 2 Link): Brief description of Project 2
On-Premises (SQL):
- Atliq Hardware Data Analytics Project using MySQL: A data warehouse built in MySQL with star and snowflake schema architectures for efficient analysis. The project develops insightful reports and automates recurring tasks (user-defined functions and stored procedures for report generation) to empower business users with self-service capabilities.
Python:
- Retail Strategy Optimization: A Data-Driven Journey for Chip Retail Management: Using Python to understand chip purchasing behavior, identify key customer segments, and provide actionable recommendations for boosting chip sales while keeping snackers happy.
- Data Analysis in Hospitality Domain: Atliq Grands Case Study: Atliq Grands, a prominent Indian hotel chain, faced declining revenue and market share. This project leverages data analytics to improve their decision-making and regain a competitive edge.
Skills & Tools:
- Databricks (Certified Data Engineer Professional)
- Azure Data Services (Data Factory, Data Lake Storage, Event Hubs)
- AWS Services (S3, Kinesis)
- dbt (Data Build Tool)
- SQL (MySQL, PostgreSQL)
- Python
- PySpark
- Kafka
- Power BI
I'm always eager to learn and explore new technologies in the data engineering field. Feel free to connect and discuss potential collaborations or opportunities!