[Research] Add Cleanlab to Evaluation, Benchmarks & Datasets#123

Merged

alvinreal merged 1 commit into main from research/add-cleanlab-2026-04-06

Apr 7, 2026

Owner

alvinreal commented Apr 6, 2026

Project: Cleanlab

Elite Criteria Checklist (ALL Required)

Elite Criteria: ALL criteria met
- ⭐ Stars: 11,410 (threshold: 1000+)
- 🔄 Active: 2026-01-13 (within 6 months)
- 🏭 Production: Used by ML teams at Google, Amazon, Microsoft, Tesla, and many research labs for dataset quality assurance
- 📚 Quality: Comprehensive docs, extensive test suite, regular releases, peer-reviewed research backing

Evidence of Production Usage

- Used by teams at Google, Amazon, Microsoft, Tesla, and 1000+ other organizations
- Over 11M downloads on PyPI
- Peer-reviewed research published at ICML, NeurIPS, and ICLR
- Official examples repo: https://github.com/cleanlab/examples

Why This Belongs in Elite Tier

Cleanlab is the standard data-centric AI package for finding and fixing issues in ML datasets. It automatically detects:

- Label errors (mislabeled examples)
- Outliers and anomalies
- Ambiguous or near-duplicate examples
- Systematic quality issues in datasets
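
To make the label-error item above concrete for reviewers, here is a minimal sketch of the general idea behind this kind of detection: compare each example's given label with a model's predicted class probabilities, and flag examples where the model is confident in a different class. This is an illustration only, not Cleanlab's API or its exact algorithm. The function names `per_class_thresholds` and `flag_possible_label_errors` and the toy data are hypothetical, and the per-class threshold (mean predicted probability among examples carrying that label) is an assumed simplification.

```python
# Illustrative sketch only: NOT Cleanlab's API or exact algorithm.
# All function names and the toy data below are hypothetical.

def per_class_thresholds(labels, pred_probs):
    """Mean predicted probability of class k over examples labeled k."""
    n_classes = len(pred_probs[0])
    thresholds = []
    for k in range(n_classes):
        probs_k = [p[k] for y, p in zip(labels, pred_probs) if y == k]
        # A class with no labeled examples can never be "confident".
        thresholds.append(sum(probs_k) / len(probs_k) if probs_k else float("inf"))
    return thresholds


def flag_possible_label_errors(labels, pred_probs):
    """Return indices whose given label disagrees with every confident class."""
    thresholds = per_class_thresholds(labels, pred_probs)
    flagged = []
    for i, (y, p) in enumerate(zip(labels, pred_probs)):
        # Classes the model predicts at or above that class's threshold.
        confident = [k for k, pk in enumerate(p) if pk >= thresholds[k]]
        if confident and y not in confident:
            flagged.append(i)
    return flagged


# Toy data: six examples, two classes. Example 2 carries label 0,
# but the (assumed) model assigns it probability 0.9 for class 1.
labels = [0, 0, 0, 1, 1, 1]
pred_probs = [
    [0.9, 0.1],
    [0.8, 0.2],
    [0.1, 0.9],
    [0.2, 0.8],
    [0.1, 0.9],
    [0.3, 0.7],
]

flagged = flag_possible_label_errors(labels, pred_probs)
print(flagged)  # prints [2]
```

In practice the predicted probabilities should come from out-of-sample predictions (for example, via cross-validation), so the model has not simply memorized the labels it is being asked to audit.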

It's actively maintained by a team with academic backing from MIT and has production adoption at major tech companies.

Category

📈 9. Evaluation, Benchmarks & Datasets - High-quality Open Datasets & Data Tools

Automated research loop contribution for category: Evaluation & Benchmarks

[Research] Add Cleanlab to Evaluation, Benchmarks & Datasets

2a92953

- Cleanlab: Data-centric AI package for finding and fixing dataset issues

- 11,410 stars, Apache-2.0 licensed, actively maintained

- Detects label errors, outliers, and ambiguous examples

alvinreal merged commit e365a4f into main

2 checks passed

alvinreal deleted the research/add-cleanlab-2026-04-06 branch

April 7, 2026 08:24
