v0.2.0

github-actions released this 11 Aug 20:14

1bf97a2

0.2.0 (2025-08-11)

Features

add DROP (simple-evals) (#20) (f85bf19)
add Humanity's Last Exam (HLE) benchmark (#23) (6f10fb7)
add MATH and MATH-500 benchmarks for mathematical problem solving (#22) (9c6843b)
add MGSM (#18) (bec1a7c)
add OpenAI MRCR benchmark for long-context recall (#24) (1b09ebd)
add HealthBench (#16) (2caa47d)

Documentation

update CLAUDE.md with pre-commit and dependency pinning requirements (f33730e)

Chores

GitHub Terraform: Create/Update .github/workflows/stale.yaml [skip ci] (1a00342)
