v0.3.0

github-actions released this 14 Aug 21:06
84c2406

0.3.0 (2025-08-14)

Features

  • add --debug flag to eval-retry command (b26afaa)
  • add -M and -T flags for model and task arguments (#75) (46a6ba6)
  • add 'openbench' as alternative CLI entry point (#48) (68b3c5b)
  • add AI21 Labs inference provider (#86) (db7bde7)
  • add Baseten inference provider (#79) (696e2aa)
  • add Cerebras and SambaNova model providers (1c61f59)
  • add Cohere inference provider (#90) (8e6e838)
  • add Crusoe inference provider (#84) (3d0c794)
  • add DeepInfra inference provider (#85) (6fedf53)
  • add Friendli inference provider (#88) (7e2b258)
  • add Hugging Face inference provider (#54) (f479703)
  • add Hyperbolic inference provider (#80) (4ebf723)
  • add initial GraphWalks benchmark implementation (#58) (1aefd07)
  • add Lambda AI inference provider (#81) (b78c346)
  • add MiniMax inference provider (#87) (09fd27b)
  • add Moonshot inference provider (#91) (e5743cb)
  • add Nebius model provider (#47) (ba2ec19)
  • add Nous Research model provider (#49) (32dd815)
  • add Novita AI inference provider (#82) (6f5874a)
  • add Parasail inference provider (#83) (973c7b3)
  • add Reka inference provider (#89) (1ab9c53)
  • add SciCode (#63) (3650bfa)
  • add support for alpha benchmarks in evaluation commands (#92) (e2ccfaa)
  • push eval data to Hugging Face repo (#65) (acc600f)

Bug Fixes

  • add missing newline at end of novita.py (ef0fa4b)
  • remove default sampling parameters from CLI (#72) (978638a)

Documentation

  • docs for 0.3.0 (#93) (fe358bb)
  • fix directory structure documentation in CONTRIBUTING.md (#78) (41f8ed9)

Chores

  • fix GraphWalks: split into three separate benchmarks (#76) (d1ed96e)
  • update version (8b7bbe7)

Refactor

  • move task loading from registry to config and update imports (de6eea2)

CI

  • enhance Claude code review workflow with updated prompts and model specification (#71) (b605ed2)