y-preethi

Popular repositories

DataPipeline_airflow_pyspark Public

An end-to-end ETL pipeline that uses Apache Airflow DAGs to orchestrate multi-stage data ingestion, transformation, and loading workflows. Large-scale datasets are processed using PySpark integrate…

Python
batch_medallion Public

A scalable batch ETL pipeline using PySpark and Azure Data Lake Storage that implements the Bronze-Silver-Gold medallion architecture to progressively clean, validate, and aggregate raw data into a…

Python
kafka_streaming Public

A real-time streaming pipeline using Apache Kafka producers/consumers and Spark Structured Streaming to process event data with sub-second latency. Built with consumer group partitioning, offset che…

Python
data Public

Forked from saayam-for-all/data

ML-based microservice that uses historical data stored on AWS S3, together with real-time data, to generate real-time responses.

Jupyter Notebook