Yelp Data Engineering & Analysis Project

Overview

This project processes and analyzes the Yelp Open Dataset, which is publicly available in JSON format. The dataset was first stored locally, then split into smaller files for efficient handling, uploaded to AWS S3, and finally loaded into Snowflake for structured querying and analysis.

Project Workflow

1. Data Ingestion & Storage

Downloaded the Yelp Open Dataset (JSON format) from the official website.
Split large JSON files into smaller chunks using a Python script.
Uploaded partitioned review and business data to an AWS S3 bucket.
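
The splitting step can be sketched roughly as follows. This is an illustration rather than the project's actual notebook code: it assumes the Yelp files are newline-delimited JSON (one object per line), and the function name `split_json_lines`, the default chunk size, and the output naming pattern are all hypothetical.

```python
from pathlib import Path


def split_json_lines(src, out_dir, lines_per_chunk=100_000):
    """Split a newline-delimited JSON file into numbered chunk files.

    Assumes one JSON object per line (an assumption about the input format).
    Returns the chunk paths in the order they were written.
    """
    src = Path(src)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    chunk_paths = []
    out = None
    # Stream line by line so the full file never has to fit in memory.
    with src.open("r", encoding="utf-8") as infile:
        for i, line in enumerate(infile):
            # Start a new chunk file every `lines_per_chunk` lines.
            if i % lines_per_chunk == 0:
                if out is not None:
                    out.close()
                path = out_dir / f"{src.stem}_part{len(chunk_paths) + 1:03d}.json"
                chunk_paths.append(path)
                out = path.open("w", encoding="utf-8")
            out.write(line)
    if out is not None:
        out.close()
    return chunk_paths
```

Because every chunk keeps whole lines, each output file remains valid newline-delimited JSON and can be loaded independently.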

2. Loading Data into Snowflake

Imported data from AWS S3 into Snowflake tables.
Stored each JSON record in a VARIANT column to preserve its nested structure.
Extracted relevant fields from JSON and created structured tables for better querying.
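
One way the load and extraction might look is sketched below, with the Snowflake SQL built as Python strings. The raw table name, the `json_data` column, the `@yelp_stage` stage path, and the chosen fields are hypothetical placeholders, not the project's actual definitions; only `TBL_YELP_REVIEWS` comes from this README.

```python
def build_load_statements(raw_table, stage_path, target_table, fields):
    """Return Snowflake SQL for a VARIANT load followed by field extraction.

    `fields` maps a JSON key to the Snowflake type it should be cast to.
    Identifiers are interpolated without escaping, so pass trusted names only.
    """
    # 1. A landing table with a single VARIANT column for the raw JSON.
    create_raw = f"CREATE OR REPLACE TABLE {raw_table} (json_data VARIANT);"
    # 2. Load the staged JSON files into the landing table.
    copy_into = (
        f"COPY INTO {raw_table} FROM {stage_path} "
        "FILE_FORMAT = (TYPE = 'JSON');"
    )
    # 3. Project typed columns out of the VARIANT using path and cast syntax.
    columns = ",\n    ".join(
        f"json_data:{key}::{sql_type} AS {key}" for key, sql_type in fields.items()
    )
    create_target = (
        f"CREATE OR REPLACE TABLE {target_table} AS\n"
        f"SELECT\n    {columns}\nFROM {raw_table};"
    )
    return [create_raw, copy_into, create_target]


# Hypothetical names for illustration only.
statements = build_load_statements(
    raw_table="YELP_REVIEWS_RAW",
    stage_path="@yelp_stage/reviews/",
    target_table="TBL_YELP_REVIEWS",
    fields={"review_id": "STRING", "business_id": "STRING", "stars": "FLOAT"},
)
for statement in statements:
    print(statement)
```

Keeping the raw VARIANT table separate from the structured table means the extraction can be rerun with different fields without reloading from S3.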

3. Data Transformation & Analysis

Created two key tables in Snowflake, extracting the relevant fields from the JSON objects into structured columns:
- TBL_YELP_REVIEWS – containing review data.
- TBL_YELP_BUSINESS – containing business details.
Performed SQL-based analysis on business performance, customer sentiment, and location trends.
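
To make the extract-then-aggregate idea concrete, here is a small Python sketch that mirrors it outside Snowflake. The project's own analysis is written in SQL and is not reproduced here; the field names `business_id` and `stars` are assumed from the public Yelp schema, and the sample records are invented for illustration.

```python
import json
from collections import defaultdict


def average_stars_by_business(json_lines):
    """Compute the mean review rating per business from JSON review lines."""
    totals = defaultdict(lambda: [0.0, 0])  # business_id -> [star sum, review count]
    for line in json_lines:
        record = json.loads(line)
        # Pull out only the fields the analysis needs, ignoring the rest.
        entry = totals[record["business_id"]]
        entry[0] += record["stars"]
        entry[1] += 1
    return {bid: total / count for bid, (total, count) in totals.items()}


# Invented sample records, not real Yelp data.
sample = [
    '{"business_id": "b1", "stars": 5, "text": "Great"}',
    '{"business_id": "b1", "stars": 3, "text": "Okay"}',
    '{"business_id": "b2", "stars": 2, "text": "Slow service"}',
]
print(average_stars_by_business(sample))  # {'b1': 4.0, 'b2': 2.0}
```

In SQL the same aggregation would be a GROUP BY on the business identifier with an average over the rating column.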

Technologies Used

Python (for data preprocessing & file splitting)
AWS S3 (for cloud storage)
Snowflake (for data warehousing & analysis)
SQL (for querying & analysis)

Key Insights & Results

Surfaced trends in business performance, customer sentiment, and location from the structured tables.
Kept large-scale JSON processing manageable by splitting files in Python, staging them in AWS S3, and loading them into Snowflake.
Demonstrated an efficient ETL (Extract, Transform, Load) pipeline for handling semi-structured data.

How to Reproduce

Download the Yelp Open Dataset from https://business.yelp.com/data/resources/open-dataset.
Run the Python notebook (Yelp Split.ipynb) to split the JSON files.
Upload the split files to an AWS S3 bucket.
Copy data from S3 into Snowflake using COPY INTO commands.
Extract relevant fields from JSON into structured tables.
Perform analysis using SQL queries.
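
For the upload step in the list above, a minimal sketch might look like this. It assumes a boto3 S3 client is passed in; the function name `upload_chunks`, the bucket name, and the key prefix are hypothetical, and the project may well have uploaded files another way (for example through the AWS console or CLI).

```python
from pathlib import Path


def upload_chunks(s3_client, chunk_paths, bucket, prefix):
    """Upload each chunk file under `prefix` and return the S3 keys used."""
    keys = []
    for path in chunk_paths:
        path = Path(path)
        # Build the object key from the prefix and the local file name.
        key = f"{prefix.rstrip('/')}/{path.name}"
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client method.
        s3_client.upload_file(str(path), bucket, key)
        keys.append(key)
    return keys


# Usage, assuming boto3 is installed and AWS credentials are configured
# (bucket and prefix are placeholder values):
#   import boto3
#   upload_chunks(boto3.client("s3"), chunk_paths, "my-yelp-bucket", "yelp/reviews")
```

Passing the client in as a parameter keeps the function easy to exercise with a stand-in object before pointing it at a real bucket.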

Mindmasterparav/yelp-etl-project
