v0.3.0

github-actions released this 14 Aug 21:06

84c2406

0.3.0 (2025-08-14)

Features

add --debug flag to eval-retry command (b26afaa)
add -M and -T flags for model and task arguments (#75) (46a6ba6)
add 'openbench' as alternative CLI entry point (#48) (68b3c5b)
add AI21 Labs inference provider (#86) (db7bde7)
add Baseten inference provider (#79) (696e2aa)
add Cerebras and SambaNova model providers (1c61f59)
add Cohere inference provider (#90) (8e6e838)
add Crusoe inference provider (#84) (3d0c794)
add DeepInfra inference provider (#85) (6fedf53)
add Friendli inference provider (#88) (7e2b258)
add Hugging Face inference provider (#54) (f479703)
add Hyperbolic inference provider (#80) (4ebf723)
add initial GraphWalks benchmark implementation (#58) (1aefd07)
add Lambda AI inference provider (#81) (b78c346)
add MiniMax inference provider (#87) (09fd27b)
add Moonshot inference provider (#91) (e5743cb)
add Nebius model provider (#47) (ba2ec19)
add Nous Research model provider (#49) (32dd815)
add Novita AI inference provider (#82) (6f5874a)
add Parasail inference provider (#83) (973c7b3)
add Reka inference provider (#89) (1ab9c53)
add SciCode (#63) (3650bfa)
add support for alpha benchmarks in evaluation commands (#92) (e2ccfaa)
push eval data to Hugging Face repo (#65) (acc600f)

Bug Fixes

add missing newline at end of novita.py (ef0fa4b)
remove default sampling parameters from CLI (#72) (978638a)

Documentation

docs for 0.3.0 (#93) (fe358bb)
fix directory structure documentation in CONTRIBUTING.md (#78) (41f8ed9)

Chores

fix GraphWalks: Split into three separate benchmarks (#76) (d1ed96e)
update version (8b7bbe7)

Refactor

move task loading from registry to config and update imports (de6eea2)

CI

Enhance Claude code review workflow with updated prompts and model specification (#71) (b605ed2)
