Commit f810137

Release v0.0.4
1 parent 285c09d commit f810137

2 files changed: +26 -1 lines changed

CHANGELOG.md

Lines changed: 25 additions & 0 deletions
@@ -5,6 +5,31 @@ All notable changes to this project will be documented in this file.
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

+## [0.0.4] - 2026-02-14
+
+### Added
+
+- **Token usage tracking** - `AgentResult` now reports `input_tokens`, `output_tokens`, `cache_read_tokens`, and `cache_write_tokens` throughout the pipeline
+- **Fallback cost calculator** - New `pricing.py` module computes cost from token counts when litellm doesn't report it, with manual pricing table for custom endpoints
+- **Log directory passthrough** - Runners now pass `log_dir` path to agents for downstream logging (PR #32)
+- **Eval stats in summary** - Run summary JSON now includes pass rate and per-task eval results when auto-eval is enabled
+- **Gold conflict checker** - New `scripts/check_gold_conflicts.py` to detect merge conflicts between gold patches across all tasks using parallel Modal sandboxes
+- **Benchmark runner script** - New `scripts/run_benchmark.sh` for quick experiment launches
+- **Model smoke test** - New `scripts/test_model.py` to verify models work via LiteLLM
+
+### Changed
+
+- **Improved cooperation prompt** - Replaced situational-awareness prompt with explicit numbered workflow (plan → coordinate → summarize) after A/B testing showed better coordination and lower cost
+- **OpenHands SDK dependencies promoted to core** - Moved from `[openhands]` optional extra into base dependencies for simpler installation
+- **HTTP retry logic** - Remote conversation requests now retry on 5xx errors with exponential backoff (via tenacity)
+- **Patch extraction timing** - Patches are now extracted while the sandbox is still alive, before stats collection
+
+### Fixed
+
+- **Pricing calculation** - Fixed cost reporting for models where litellm returns zero cost
+- **MaxIterationsReached handling** - Now caught inside the conversation loop instead of as an outer exception, preventing lost patches
+- **Custom API base URLs** - `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` now forwarded to sandboxes
+
 ## [0.0.3] - 2026-02-04

 ### Added
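
The "Fallback cost calculator" and "Pricing calculation" entries for 0.0.4 describe computing cost from token counts when the reported cost is missing or zero. The commit does not show `pricing.py` itself, so the following is only a minimal sketch of that idea. The names `TokenUsage`, `PRICING_PER_MILLION`, and `fallback_cost`, the model name, and the prices are all hypothetical, and the sketch assumes the four token counts are disjoint (cached tokens are not also counted as input tokens).

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical manual pricing table, in USD per one million tokens.
# Neither the model name nor the prices come from the project.
PRICING_PER_MILLION = {
    "example-model": {
        "input": 3.00,
        "output": 15.00,
        "cache_read": 0.30,
        "cache_write": 3.75,
    },
}


@dataclass
class TokenUsage:
    """Token counts named after the fields the changelog lists on AgentResult."""

    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


def fallback_cost(
    model: str, usage: TokenUsage, reported_cost: Optional[float] = None
) -> Optional[float]:
    """Return the reported cost if it is non-zero, else price the tokens manually.

    Returns None when the model is missing from the manual table, so callers
    can tell "unknown price" apart from "zero cost".
    """
    if reported_cost:  # neither None nor 0.0
        return reported_cost
    prices = PRICING_PER_MILLION.get(model)
    if prices is None:
        return None
    total = (
        usage.input_tokens * prices["input"]
        + usage.output_tokens * prices["output"]
        + usage.cache_read_tokens * prices["cache_read"]
        + usage.cache_write_tokens * prices["cache_write"]
    )
    return total / 1_000_000
```

Treating a reported cost of zero the same as a missing one mirrors the "litellm returns zero cost" case in the Fixed section; whether the project makes the same choice for genuinely free endpoints is not stated in the changelog.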

src/cooperbench/__about__.py

Lines changed: 1 addition & 1 deletion
@@ -1,3 +1,3 @@
 """Version information for CooperBench."""

-__version__ = "0.0.3"
+__version__ = "0.0.4"
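
The "HTTP retry logic" entry in the 0.0.4 changelog says remote conversation requests retry on 5xx errors with exponential backoff via tenacity. The retry code is not part of this diff, so the sketch below only illustrates the general pattern, using the standard library instead of tenacity to stay dependency-free. `ServerError`, `request_with_retry`, the `send` callable, and the delay parameters are hypothetical names and values, not the project's.

```python
import time
from typing import Callable, Tuple


class ServerError(Exception):
    """Raised when every attempt ended in a 5xx response."""

    def __init__(self, status: int) -> None:
        super().__init__(f"server error {status} after retries")
        self.status = status


def request_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    The wait doubles after each failed attempt (base_delay, 2x, 4x, ...)
    and is capped at max_delay. Non-5xx responses, including 4xx, are
    returned immediately because retrying them is unlikely to help.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if not 500 <= status < 600:
            return status, body
        if attempt == max_attempts:
            raise ServerError(status)
        sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the backoff schedule testable without real waiting; a caller would normally leave it at the default.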
