Commit 4bef341
Add 4 elite-tier evaluation & benchmarking projects
Category 9: Evaluation, Benchmarks & Datasets
New additions:
- ARC-AGI (4,759 stars) - Abstraction and Reasoning Corpus for measuring general fluid intelligence
- AlpacaEval (1,976 stars) - Automatic evaluator for instruction-following LLMs from Stanford
- BigCode Evaluation Harness (1,040 stars) - Framework for evaluating code generation models
- E2B Code Interpreter (2,298 stars) - Secure sandbox infrastructure for evaluating code LLMs
All projects meet elite-tier criteria:
- 1000+ GitHub stars
- Active development (commits within last 3 months)
- OSI-approved licenses (Apache 2.0)

1 parent bd225c0 commit 4bef341
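
The elite-tier criteria listed in the commit message can be sketched as a small check. This is a minimal illustration, not the repository's actual tooling: the `Project` type, the `is_elite_tier` function, the `OSI_APPROVED` set, the 90-day approximation of "3 months", and the last-commit date used in the example are all assumptions made for this sketch. Only the star count and license come from the commit message.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical subset of OSI-approved SPDX identifiers (assumption for this sketch).
OSI_APPROVED = {"Apache-2.0", "MIT", "BSD-3-Clause"}


@dataclass
class Project:
    name: str
    stars: int
    last_commit: date
    license_id: str


def is_elite_tier(project, today, min_stars=1000, max_idle_days=90):
    """Return True if the project meets all three criteria.

    "Commits within last 3 months" is approximated as 90 days (assumption).
    """
    recently_active = (today - project.last_commit) <= timedelta(days=max_idle_days)
    has_osi_license = project.license_id in OSI_APPROVED
    return project.stars >= min_stars and recently_active and has_osi_license


# Star count taken from the commit message; the dates are illustrative placeholders.
today = date(2025, 1, 1)
arc = Project("ARC-AGI", 4759, date(2024, 12, 1), "Apache-2.0")
print(is_elite_tier(arc, today))
```

Keeping the thresholds as keyword arguments makes it easy to tighten or relax the bar without touching the check itself.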
1 file changed
Lines changed: 4 additions & 0 deletions
| Hunk | Original file lines | Diff lines | Added lines |
|---|---|---|---|
| 1 | 808-813 | 808-815 | 811-812 |
| 2 | 824-829 | 826-833 | 829-830 |