This repository contains an end-to-end Business Intelligence (BI) implementation for food inspections in Chicago and Dallas. The project involves data extraction, profiling, cleaning, transformation, dimensional modeling, and visualization to analyze food inspection data across these cities.
- Downloaded datasets from the following open data portals:
- Profiling Tools: Alteryx, Python (YData profiling)
- Key insights from profiling:
- Chicago dataset: Contains 17 variables and 274,206 observations. Minimal missing values (1.8%), no duplicate rows.
- Dallas dataset: Contains 114 variables and 78,984 observations. High percentage of missing cells (72.2%) and 42 duplicate rows.
- Significant differences between schemas, especially in how violations are structured and the presence of critical columns such as
RiskandInspection Score.
- Stage Tables: Data from both cities was loaded into SQL Server/Azure SQL/MySQL with stage tables prefixed as
stg_. - Profiling & Cleaning:
- Applied transformations such as date conversions, trimming, and merging columns where required.
- Merged and cleaned violation data, particularly in the Dallas dataset, where violation information was spread across multiple columns.
- Ensured consistent data types across datasets (e.g., ensuring dates are stored as
datetimetypes).
- Dimensional Model: Designed using ER/Studio or Navicat.
- Fact and Dimension Tables:
- Fact Tables:
FACT_INSPECTION_INFOFACT_INSPECTION_VIOLATION
- Dimension Tables:
DIM_RESTAURANTDIM_VIOLATIONDIM_RISKDIM_RESULTDIM_GEODIM_SOURCE
- Added a calculated
Inspection Scoreto the fact tables.
- Fact Tables:
- Talend: Used to load data into the integration schema.
- Applied surrogate keys to all dimension tables and ensured proper transformation and data consistency.
- Performed detailed logging, including job runtimes and row counts for each table.
- Dashboards were created to analyze key metrics related to food inspections using Power BI and Tableau.
- Key Visualizations:
- Inspection results by
Inspection Type,Risk Category,Facility Type,Violations, andBusiness Name. - Distribution of inspection scores across different facilities and locations.
- Year-over-year trends in food inspection outcomes for Chicago and Dallas.
- Inspection results by
- Chicago Food Inspections Data: Food Inspections - Chicago Data Portal
- Dallas Food Inspections Data: Restaurant and Food Establishment Inspections - Dallas OpenData
This project provides a comprehensive BI solution to analyze food inspection data from Chicago and Dallas. By leveraging various tools and technologies, we successfully integrated disparate datasets, created a dimensional model, and provided actionable insights through interactive visualizations.