Commit 385bfe7

Add Pachyderm and Hamilton to Data Processing & Manipulation section
Category 7: Data Processing for AI
- Pachyderm (6,297 stars, Apache-2.0) - Data-centric pipelines with Git-like versioning
- Hamilton (2,464 stars, Apache-2.0) - Declarative dataflow framework from Apache

Both projects meet elite-tier criteria:
- 1000+ GitHub stars
- Active development (commits within 3 months)
- OSI-approved Apache 2.0 license
- Production-ready quality
1 parent c86a967 commit 385bfe7

1 file changed

Lines changed: 2 additions & 0 deletions

README.md

@@ -140,6 +140,8 @@
 - **[Temporal](https://github.com/temporalio/temporal)** ![GitHub stars](https://img.shields.io/github/stars/temporalio/temporal?style=social) - Durable execution platform for reliable workflow orchestration. Build resilient data pipelines and ML workflows that survive failures and continue execution exactly where they left off. MIT licensed.
 - **[Luigi](https://github.com/spotify/luigi)** ![GitHub stars](https://img.shields.io/github/stars/spotify/luigi?style=social) - Python module for building complex pipelines of batch jobs. Handles dependency resolution, workflow management, visualization, and Hadoop integration. Built at Spotify and battle-tested in production. Apache 2.0 licensed.
 - **[Mage.ai](https://github.com/mage-ai/mage-ai)** ![GitHub stars](https://img.shields.io/github/stars/mage-ai/mage-ai?style=social) - Modern open-source data pipeline tool for integrating and transforming data. AI-native ETL/ELT platform with 100+ integrations, real-time monitoring, and collaborative features. Apache 2.0 licensed.
+- **[Pachyderm](https://github.com/pachyderm/pachyderm)** ![GitHub stars](https://img.shields.io/github/stars/pachyderm/pachyderm?style=social) - Data-centric pipelines and data versioning for ML. Git-like data versioning with immutable lineage, automatic pipeline triggering on data changes, and Kubernetes-native scaling. Apache 2.0 licensed.
+- **[Hamilton](https://github.com/apache/hamilton)** ![GitHub stars](https://img.shields.io/github/stars/apache/hamilton?style=social) - Declarative dataflow framework for building testable, modular, self-documenting data pipelines. Encode lineage and metadata directly in Python functions. Originally from Stitch Fix, now Apache incubating. Apache 2.0 licensed.
 
 #### Classical ML & Gradient Boosting
