Commit e365a4f

[Research] Add Cleanlab to Evaluation, Benchmarks & Datasets (#123)
- Cleanlab: Data-centric AI package for finding and fixing dataset issues
- 11,410 stars, Apache-2.0 licensed, actively maintained
- Detects label errors, outliers, and ambiguous examples

Co-authored-by: alvinreal <alvinreal@users.noreply.github.com>
1 parent 89b9169 commit e365a4f

1 file changed

Lines changed: 1 addition & 0 deletions

README.md

@@ -502,6 +502,7 @@
 #### High-quality Open Datasets & Data Tools
 
 - **[Hugging Face Datasets](https://github.com/huggingface/datasets)** ![GitHub stars](https://img.shields.io/github/stars/huggingface/datasets?style=social) - Largest open repository of datasets.
+- **[Cleanlab](https://github.com/cleanlab/cleanlab)** ![GitHub stars](https://img.shields.io/github/stars/cleanlab/cleanlab?style=social) - Data-centric AI package for automatically finding and fixing issues in datasets. Detects label errors, outliers, and ambiguous examples in ML datasets. Apache 2.0 licensed.
 - **[FineWeb / FineWeb-2 (Hugging Face)](https://huggingface.co/datasets/HuggingFaceFW/fineweb)** - Curated 15T+ token web dataset for pre-training.
 - **[OSWorld](https://github.com/xlang-ai/OSWorld)** ![GitHub stars](https://img.shields.io/github/stars/xlang-ai/OSWorld?style=social) - Multimodal agent benchmark dataset.
