What's Changed
This release contains breaking changes, so please be careful. Previously, due to a minor bug, all benchmarks contained duplicate tasks. In this release, all benchmarks have been regenerated without duplicates. As a consequence, results obtained on previous versions are unlikely to be reproduced exactly. In addition, the trivial-1m benchmark has been replaced: due to the limits of the task generator, it now contains only 21k unique tasks and has been renamed to trivial-21k.
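To illustrate what "regenerated without duplicates" means, here is a minimal sketch that assumes each task can be represented as a hashable tuple. This encoding is hypothetical and is not the library's actual task format, and the code is not the generator fix itself. It removes exact duplicates while keeping the first occurrence of each task, and counts how many duplicates were present.

```python
from __future__ import annotations

from typing import Hashable, Sequence, TypeVar

T = TypeVar("T", bound=Hashable)


def dedup_tasks(tasks: Sequence[T]) -> list[T]:
    """Return tasks with exact duplicates removed, keeping first-occurrence order."""
    # dict preserves insertion order (Python 3.7+), so the first copy of each task wins.
    return list(dict.fromkeys(tasks))


def count_duplicates(tasks: Sequence[T]) -> int:
    """Number of tasks that are exact repeats of an earlier task."""
    return len(tasks) - len(set(tasks))


if __name__ == "__main__":
    # Hypothetical task encodings: one tuple per task.
    tasks = [(1, 2, 3), (4, 5, 6), (1, 2, 3)]
    print(dedup_tasks(tasks))       # [(1, 2, 3), (4, 5, 6)]
    print(count_duplicates(tasks))  # 1
```

Because results depend on which tasks are sampled, removing duplicates changes the task distribution, which is why runs on older benchmark versions may not reproduce exactly.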
The project has also migrated to uv, which should improve the overall user experience. Enjoy! And let me know if something breaks.
- Fix bug in EnvParams.max_steps type by @DrJessop in #39
- Fix #42 by @jugheadjones10 in #43
- enable bf16 on eval by @jugheadjones10 in #46
- fixed generator bug and re-sampled benchmarks without duplicates by @Howuhh in #50
- Add tqdm dependency to pyproject.toml by @adzcai in #52
- migrate to uv by @Howuhh in #54
New Contributors
- @DrJessop made their first contribution in #39
- @jugheadjones10 made their first contribution in #43
- @adzcai made their first contribution in #52
Full Changelog: v0.9.1...v0.9.2