LLM-Driven Extraction of Unstructured Data — Built for API Deployments & ETL Pipeline Workflows
Updated Mar 6, 2026 - Python
Fast and efficient unstructured data extraction. Written in Rust with bindings for many languages.
The simplest way to scale Python.
Data pipelines from re-usable components
The open-source Useful SDK. One Python decorator in the Useful library allows for full observability of Python functions within an ETL.
Data Cleaning for PySpark
A project structure for doing and sharing data engineering work.
Link to the application
DataSift automatically applies a data pre-processing pipeline to data science projects.
Build ETL pipelines on Airflow to load data from BigQuery and store it in MySQL
This project demonstrates a comprehensive data warehousing and analytics solution, from building a data warehouse to generating actionable insights. Designed as a portfolio project, it highlights industry best practices in data engineering and analytics.
Big Data ETL pipeline for Brazilian e-commerce data. Implements data ingestion, transformation, and storage using Apache Spark, Hadoop, and SQL. Designed for scalable data processing and analytics.
🗄️ IBM Relational Database Administrator with GenAI Certificate Portfolio – A comprehensive collection of projects, labs, and assignments showcasing expertise in relational database administration, 🏘️ data warehousing, 🔁 ETL pipelines, and 🤖 Generative AI integration for modern database management.
🚀 A comprehensive showcase of projects and skills from the IBM Data Engineering Professional Certificate! 📚 Features include: 🔄 ETL pipelines, 🗄️ data warehousing, ⚡ big data processing with Spark/Hadoop, 🛠️ database administration, and 📈 business intelligence dashboards. Built with 🦾 to demonstrate real-world data engineering capabilities!
This repository contains my first end-to-end Data Engineering project, built using Microsoft Azure Cloud and Azure Databricks with PySpark.
Master the AWS Data Stack! 🚀 This repository features 15+ Industrial Data Engineering Projects covering Serverless ETL, Real-Time Streaming, & Data Warehousing. Hands-on labs for S3, Lambda, Spark, Airflow, Snowflake, Redshift, Kinesis, & Glue. Includes production-grade CI/CD pipelines. A complete roadmap to becoming a top Data Professional.
Complete portfolio of data engineering projects from Udacity's Data Engineering with AWS Nanodegree.
JSON-driven ETL pipeline framework prototype
A deployed machine learning model that automatically classifies incoming disaster messages into 36 related categories. Developed as part of Udacity's Data Science Nanodegree program.