- **[Weights & Biases Weave](https://github.com/wandb/weave)**  - Open-source tracing and experiment tracking.
- **[Aim](https://github.com/aimhubio/aim)**  - Self-hosted ML experiment tracker designed to handle 10,000s of training runs with performant UI and SDK for programmatic access. Apache 2.0 licensed.
- **[Feast](https://github.com/feast-dev/feast)**  - Open source feature store for ML. Manages offline/online feature storage with point-in-time correctness to prevent data leakage. Apache 2.0 licensed.
- **[whylogs](https://github.com/whylabs/whylogs)**  - Open-source data logging library for ML models and data pipelines. Provides visibility into data quality and model performance over time with privacy-preserving data collection. Apache 2.0 licensed.
- **[OpenLineage](https://github.com/OpenLineage/OpenLineage)**  - Open standard for lineage metadata collection designed to instrument jobs as they run. Defines a generic model of run, job, and dataset entities for consistent data lineage tracking. Apache 2.0 licensed.
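The point-in-time correctness that Feast enforces can be illustrated with a minimal pure-Python sketch (this is a conceptual example with hypothetical data, not the Feast API): each training label may only see the latest feature value observed at or before that label's event time, so later observations never leak into training.

```python
# Conceptual sketch of point-in-time correct feature lookup (not the Feast API).
from datetime import datetime

# Hypothetical feature history: (observation timestamp, value) pairs per entity.
feature_history = {
    "user_1": [
        (datetime(2024, 1, 1), 0.2),
        (datetime(2024, 1, 5), 0.7),  # observed after some labels below
    ],
}

def point_in_time_lookup(history, entity, event_time):
    """Return the most recent feature value at or before event_time."""
    eligible = [(ts, v) for ts, v in history.get(entity, []) if ts <= event_time]
    if not eligible:
        return None  # no feature value was known yet at this point in time
    return max(eligible)[1]  # latest eligible observation wins

# A label dated Jan 3 must see 0.2, not the later 0.7 observed on Jan 5.
print(point_in_time_lookup(feature_history, "user_1", datetime(2024, 1, 3)))
print(point_in_time_lookup(feature_history, "user_1", datetime(2024, 1, 6)))
```

Joining each label row against the feature table this way (rather than taking the latest value overall) is what prevents train/serve leakage.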

#### Model Hubs & Registries

- **[NVIDIA KAI Scheduler](https://github.com/NVIDIA/KAI-Scheduler)**  - Kubernetes-native GPU scheduler for AI workloads at large scale. Originally developed by Run:ai, now open-sourced by NVIDIA. Optimizes GPU resource allocation with dynamic allocation and efficient queue management. Apache 2.0 licensed.
- **[NVIDIA DeepOps](https://github.com/NVIDIA/deepops)**  - Infrastructure automation tools for building GPU clusters with Kubernetes and Slurm. Deploys multi-node GPU clusters with monitoring, logging, and storage for AI/HPC workloads. BSD-3-Clause licensed.
- **[SkyPilot](https://github.com/skypilot-org/skypilot)**  - Run, manage, and scale AI workloads on any AI infrastructure. Unified interface to access and manage compute across Kubernetes, Slurm, and 20+ cloud providers. Used by Shopify and research institutions for training and inference. Apache 2.0 licensed.
- **[Volcano](https://github.com/volcano-sh/volcano)**  - Cloud-native batch scheduling system for compute-intensive workloads. CNCF incubating project with gang scheduling, job dependency management, and topology-aware scheduling for AI/ML and deep learning. Apache 2.0 licensed.
- **[Apache YuniKorn](https://github.com/apache/yunikorn-core)**  - Kubernetes resource scheduler for batch, data, and ML workloads. Provides hierarchical resource queues, multi-tenancy fairness, and gang scheduling for big data and machine learning applications. Apache 2.0 licensed.
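The gang scheduling offered by Volcano and YuniKorn follows an all-or-nothing rule: a job is admitted only if every one of its tasks can be placed at once. A minimal conceptual sketch of that admission decision (not either project's implementation, with hypothetical GPU counts):

```python
# Conceptual sketch of gang-scheduling admission (not Volcano/YuniKorn code).
# Admitting a job partially would let it hold GPUs while waiting for peers,
# which can deadlock a cluster; gang scheduling admits all tasks or none.

def gang_schedule(task_gpu_demands, free_gpus):
    """Return (gpus remaining, admitted?) after an all-or-nothing decision."""
    total_demand = sum(task_gpu_demands)
    if total_demand <= free_gpus:
        return free_gpus - total_demand, True   # place every task atomically
    return free_gpus, False                     # admit nothing, hold nothing

free = 8
free, ok_a = gang_schedule([2, 2, 2], free)  # job A: 3 workers x 2 GPUs each
free, ok_b = gang_schedule([4, 4], free)     # job B: needs 8, only 2 remain
print(ok_a, ok_b, free)
```

In Kubernetes terms, the real schedulers express this with a minimum member count on a pod group rather than a GPU sum, but the admit-all-or-nothing logic is the same idea.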

#### Feature Engineering & Data Preparation

- **[Feature-engine](https://github.com/feature-engine/feature_engine)**  - Python library with multiple transformers to engineer and select features for machine learning models. Scikit-learn compatible with fit() and transform() methods for encoding, imputation, variable transformation, and feature selection. BSD-3-Clause licensed.
- **[NVTabular](https://github.com/NVIDIA-Merlin/NVTabular)**  - GPU-accelerated feature engineering and preprocessing library for tabular data. Manipulates terabyte-scale datasets to train deep learning recommender systems. Component of NVIDIA Merlin framework. Apache 2.0 licensed.
- **[OpenMLDB](https://github.com/4paradigm/OpenMLDB)**  - Open-source machine learning database providing a feature platform for consistent features between training and inference. Real-time relational data feature computation system for online ML applications. Apache 2.0 licensed.
- **[Featureform](https://github.com/featureform/featureform)**  - Virtual feature store that turns existing data infrastructure into a feature store. Define, manage, and serve model features, labels, and training sets with native embeddings support. MPL-2.0 licensed.
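The fit()/transform() contract that Feature-engine's transformers follow can be sketched in a few lines of plain Python (a conceptual illustration of the scikit-learn pattern, not Feature-engine's code): statistics are learned from training data in fit() and only applied in transform(), so test data never influences the learned parameters.

```python
# Conceptual sketch of the scikit-learn fit()/transform() contract
# (not Feature-engine's implementation): a mean imputer for one column.

class MeanImputer:
    def fit(self, values):
        """Learn the column mean from training data only."""
        observed = [x for x in values if x is not None]
        self.mean_ = sum(observed) / len(observed)
        return self  # returning self enables fit(...).transform(...) chaining

    def transform(self, values):
        """Replace missing values with the mean learned during fit()."""
        return [self.mean_ if x is None else x for x in values]

imputer = MeanImputer().fit([1.0, None, 3.0])  # learned mean is 2.0
print(imputer.transform([None, 5.0]))
```

Because the transformer stores its learned state (`mean_`) at fit time, the same object can be applied to training, validation, and serving data with identical behavior.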

#### Monitoring, Evaluation & Observability
