Commit c3918fa

Add 5 elite-tier MLOps projects: whylogs, OpenLineage, Featureform, Volcano, Apache YuniKorn
- whylogs: Data logging library for ML models (2816 stars, Apache 2.0)
- OpenLineage: Open standard for lineage metadata collection (2426 stars, Apache 2.0)
- Featureform: Virtual feature store (1973 stars, MPL-2.0)
- Volcano: Cloud-native batch scheduling system (5514 stars, Apache 2.0)
- Apache YuniKorn: Kubernetes resource scheduler for ML workloads (1007 stars, Apache 2.0)

1 parent 0edf6a3 commit c3918fa

1 file changed

Lines changed: 5 additions & 0 deletions

README.md

@@ -733,6 +733,8 @@
 - **[Weights & Biases Weave](https://github.com/wandb/weave)** ![GitHub stars](https://img.shields.io/github/stars/wandb/weave?style=social) - Open-source tracing and experiment tracking.
 - **[Aim](https://github.com/aimhubio/aim)** ![GitHub stars](https://img.shields.io/github/stars/aimhubio/aim?style=social) - Self-hosted ML experiment tracker designed to handle 10,000s of training runs with performant UI and SDK for programmatic access. Apache 2.0 licensed.
 - **[Feast](https://github.com/feast-dev/feast)** ![GitHub stars](https://img.shields.io/github/stars/feast-dev/feast?style=social) - Open source feature store for ML. Manages offline/online feature storage with point-in-time correctness to prevent data leakage. Apache 2.0 licensed.
+- **[whylogs](https://github.com/whylabs/whylogs)** ![GitHub stars](https://img.shields.io/github/stars/whylabs/whylogs?style=social) - Open-source data logging library for ML models and data pipelines. Provides visibility into data quality and model performance over time with privacy-preserving data collection. Apache 2.0 licensed.
+- **[OpenLineage](https://github.com/OpenLineage/OpenLineage)** ![GitHub stars](https://img.shields.io/github/stars/OpenLineage/OpenLineage?style=social) - Open standard for lineage metadata collection designed to instrument jobs as they run. Defines a generic model of run, job, and dataset entities for consistent data lineage tracking. Apache 2.0 licensed.

 #### Model Hubs & Registries

@@ -772,6 +774,8 @@
 - **[NVIDIA KAI Scheduler](https://github.com/NVIDIA/KAI-Scheduler)** ![GitHub stars](https://img.shields.io/github/stars/NVIDIA/KAI-Scheduler?style=social) - Kubernetes-native GPU scheduler for AI workloads at large scale. Originally developed by Run:ai, now open-sourced by NVIDIA. Optimizes GPU resource allocation with dynamic allocation and efficient queue management. Apache 2.0 licensed.
 - **[NVIDIA DeepOps](https://github.com/NVIDIA/deepops)** ![GitHub stars](https://img.shields.io/github/stars/NVIDIA/deepops?style=social) - Infrastructure automation tools for building GPU clusters with Kubernetes and Slurm. Deploys multi-node GPU clusters with monitoring, logging, and storage for AI/HPC workloads. BSD-3-Clause licensed.
 - **[SkyPilot](https://github.com/skypilot-org/skypilot)** ![GitHub stars](https://img.shields.io/github/stars/skypilot-org/skypilot?style=social) - Run, manage, and scale AI workloads on any AI infrastructure. Unified interface to access and manage compute across Kubernetes, Slurm, and 20+ cloud providers. Used by Shopify and research institutions for training and inference. Apache 2.0 licensed.
+- **[Volcano](https://github.com/volcano-sh/volcano)** ![GitHub stars](https://img.shields.io/github/stars/volcano-sh/volcano?style=social) - Cloud-native batch scheduling system for compute-intensive workloads. CNCF incubating project with gang scheduling, job dependency management, and topology-aware scheduling for AI/ML and deep learning. Apache 2.0 licensed.
+- **[Apache YuniKorn](https://github.com/apache/yunikorn-core)** ![GitHub stars](https://img.shields.io/github/stars/apache/yunikorn-core?style=social) - Kubernetes resource scheduler for batch, data, and ML workloads. Provides hierarchical resource queues, multi-tenancy fairness, and gang scheduling for big data and machine learning applications. Apache 2.0 licensed.

 #### Feature Engineering & Data Preparation

@@ -780,6 +784,7 @@
 - **[Feature-engine](https://github.com/feature-engine/feature_engine)** ![GitHub stars](https://img.shields.io/github/stars/feature-engine/feature_engine?style=social) - Python library with multiple transformers to engineer and select features for machine learning models. Scikit-learn compatible with fit() and transform() methods for encoding, imputation, variable transformation, and feature selection. BSD-3-Clause licensed.
 - **[NVTabular](https://github.com/NVIDIA-Merlin/NVTabular)** ![GitHub stars](https://img.shields.io/github/stars/NVIDIA-Merlin/NVTabular?style=social) - GPU-accelerated feature engineering and preprocessing library for tabular data. Manipulates terabyte-scale datasets to train deep learning recommender systems. Component of NVIDIA Merlin framework. Apache 2.0 licensed.
 - **[OpenMLDB](https://github.com/4paradigm/OpenMLDB)** ![GitHub stars](https://img.shields.io/github/stars/4paradigm/OpenMLDB?style=social) - Open-source machine learning database providing a feature platform for consistent features between training and inference. Real-time relational data feature computation system for online ML applications. Apache 2.0 licensed.
+- **[Featureform](https://github.com/featureform/featureform)** ![GitHub stars](https://img.shields.io/github/stars/featureform/featureform?style=social) - Virtual feature store that turns existing data infrastructure into a feature store. Define, manage, and serve model features, labels, and training sets with native embeddings support. MPL-2.0 licensed.

 #### Monitoring, Evaluation & Observability
