Releases
v0.3.0
0.3.0 (2025-08-14)
Features
add --debug flag to eval-retry command (b26afaa)
add -M and -T flags for model and task arguments (#75) (46a6ba6)
add 'openbench' as alternative CLI entry point (#48) (68b3c5b)
add AI21 Labs inference provider (#86) (db7bde7)
add Baseten inference provider (#79) (696e2aa)
add Cerebras and SambaNova model providers (1c61f59)
add Cohere inference provider (#90) (8e6e838)
add Crusoe inference provider (#84) (3d0c794)
add DeepInfra inference provider (#85) (6fedf53)
add Friendli inference provider (#88) (7e2b258)
Add huggingface inference provider (#54) (f479703)
add Hyperbolic inference provider (#80) (4ebf723)
add initial GraphWalks benchmark implementation (#58) (1aefd07)
add Lambda AI inference provider (#81) (b78c346)
add MiniMax inference provider (#87) (09fd27b)
add Moonshot inference provider (#91) (e5743cb)
add Nebius model provider (#47) (ba2ec19)
add Nous Research model provider (#49) (32dd815)
add Novita AI inference provider (#82) (6f5874a)
add Parasail inference provider (#83) (973c7b3)
add Reka inference provider (#89) (1ab9c53)
add SciCode (#63) (3650bfa)
add support for alpha benchmarks in evaluation commands (#92) (e2ccfaa)
push eval data to huggingface repo (#65) (acc600f)
Bug Fixes
add missing newline at end of novita.py (ef0fa4b)
remove default sampling parameters from CLI (#72) (978638a)
Documentation
docs for 0.3.0 (#93) (fe358bb)
fix directory structure documentation in CONTRIBUTING.md (#78) (41f8ed9)
Chores
fix GraphWalks: Split into three separate benchmarks (#76) (d1ed96e)
update version (8b7bbe7)
Refactor
move task loading from registry to config and update imports (de6eea2)
CI
Enhance Claude code review workflow with updated prompts and model specification (#71) (b605ed2)