Popular repositories
- DataPipeline_airflow_pyspark (Public)
  An end-to-end ETL pipeline that uses Apache Airflow DAGs to orchestrate multi-stage data ingestion, transformation, and loading workflows. Large-scale datasets are processed using PySpark integrate…
  Python
- batch_medallion (Public)
  A scalable batch ETL pipeline using PySpark and Azure Data Lake Storage that implements the Bronze-Silver-Gold medallion architecture to progressively clean, validate, and aggregate raw data into a…
  Python
- kafka_streaming (Public)
  A real-time streaming pipeline using Apache Kafka producers/consumers and Spark Structured Streaming to process event data with sub-second latency. Built with consumer group partitioning, offset che…
  Python
- data (Public, forked from saayam-for-all/data)
  An ML-based microservice that uses historical data stored on AWS S3, together with real-time data, to produce real-time responses.
  Jupyter Notebook