CHANGELOG.md: 25 additions & 0 deletions
@@ -5,6 +5,31 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.1.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
## [0.0.4] - 2026-02-14

### Added

- **Token usage tracking** - `AgentResult` now reports `input_tokens`, `output_tokens`, `cache_read_tokens`, and `cache_write_tokens` throughout the pipeline
- **Fallback cost calculator** - New `pricing.py` module computes cost from token counts when LiteLLM doesn't report it, with a manual pricing table for custom endpoints
- **Log directory passthrough** - Runners now pass the `log_dir` path to agents for downstream logging (PR #32)
- **Eval stats in summary** - Run summary JSON now includes pass rate and per-task eval results when auto-eval is enabled
- **Gold conflict checker** - New `scripts/check_gold_conflicts.py` detects merge conflicts between gold patches across all tasks using parallel Modal sandboxes
- **Benchmark runner script** - New `scripts/run_benchmark.sh` for quick experiment launches
- **Model smoke test** - New `scripts/test_model.py` to verify models work via LiteLLM
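
As a rough illustration of the fallback cost calculation described in this list, the sketch below computes a cost from the four token counts when the reported cost is missing or zero. It is not the contents of `pricing.py`: the `TokenUsage` class, the `resolve_cost` function, the `MANUAL_PRICING` table, the model name, and every rate are hypothetical, and it assumes the four token counts do not overlap.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TokenUsage:
    """Hypothetical container mirroring the four counts named in the changelog."""
    input_tokens: int = 0
    output_tokens: int = 0
    cache_read_tokens: int = 0
    cache_write_tokens: int = 0


# Hypothetical rates in USD per one million tokens. These are NOT real prices.
MANUAL_PRICING = {
    "example-model": {
        "input": 2.0,
        "output": 8.0,
        "cache_read": 0.5,
        "cache_write": 2.5,
    },
}


def resolve_cost(model: str, usage: TokenUsage,
                 reported_cost: Optional[float] = None) -> Optional[float]:
    """Prefer the reported cost; fall back to the manual table if it is None or zero."""
    if reported_cost:
        return reported_cost
    rates = MANUAL_PRICING.get(model)
    if rates is None:
        return None  # unknown model: no basis for an estimate
    total = (
        usage.input_tokens * rates["input"]
        + usage.output_tokens * rates["output"]
        + usage.cache_read_tokens * rates["cache_read"]
        + usage.cache_write_tokens * rates["cache_write"]
    )
    return total / 1_000_000


usage = TokenUsage(input_tokens=1_000_000, output_tokens=500_000,
                   cache_read_tokens=200_000, cache_write_tokens=100_000)
print(resolve_cost("example-model", usage, reported_cost=0.0))
```

Returning `None` for an unknown model, rather than zero, keeps "no estimate available" distinguishable from "this run was free" in downstream summaries.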
### Changed

- **Improved cooperation prompt** - Replaced the situational-awareness prompt with an explicit numbered workflow (plan → coordinate → summarize) after A/B testing showed better coordination and lower cost
- **OpenHands SDK dependencies promoted to core** - Moved from `[openhands]` optional extra into base dependencies for simpler installation
- **HTTP retry logic** - Remote conversation requests now retry on 5xx errors with exponential backoff (via tenacity)
- **Patch extraction timing** - Patches are now extracted while the sandbox is still alive, before stats collection
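
The retry behaviour described in this list (retry only on 5xx, with exponentially growing waits) can be sketched as follows. The project uses tenacity for this; the version below is a standard-library stand-in so that it runs without extra dependencies. The names `call_with_retry`, `ServerError`, and `send`, as well as the attempt count and delays, are hypothetical.

```python
import time
from typing import Callable, Tuple


class ServerError(Exception):
    """Raised when every attempt ended in a 5xx response (hypothetical name)."""


def call_with_retry(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-5xx status or attempts run out.

    send is any callable returning (status_code, body). The wait before
    retry k (counting from 0) is base_delay * 2**k seconds.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    status = 0
    for attempt in range(max_attempts):
        status, body = send()
        if status < 500:
            return status, body  # success or a client error: do not retry
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 0.5 s, 1 s, 2 s, ...
    raise ServerError(f"gave up after {max_attempts} attempts (last status {status})")
```

Passing `sleep` as a parameter is only a convenience for testing without real delays. 4xx responses are returned immediately because repeating an identical request that the server rejected as invalid is unlikely to succeed.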
### Fixed

- **Pricing calculation** - Fixed cost reporting for models where LiteLLM returns zero cost
- **MaxIterationsReached handling** - Now caught inside the conversation loop instead of as an outer exception, preventing lost patches
- **Custom API base URLs** - `ANTHROPIC_BASE_URL` and `OPENAI_BASE_URL` are now forwarded to sandboxes
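
To show why catching the iteration limit close to the loop prevents lost patches, here is a minimal sketch. It does not reproduce the project's runner: the `MaxIterationsReached` class defined here is a local stand-in for the real exception, and `run_agent`, `conversation_step`, and `extract_patch` are hypothetical names.

```python
from typing import Callable, Dict


class MaxIterationsReached(Exception):
    """Local stand-in for the agent framework's iteration-limit error."""


def run_agent(
    conversation_step: Callable[[], bool],
    extract_patch: Callable[[], str],
) -> Dict[str, str]:
    """Drive the conversation, then always extract whatever patch exists.

    conversation_step returns True while more work remains and may raise
    MaxIterationsReached. Because the exception is handled right around
    the loop, control still reaches extract_patch() afterwards.
    """
    status = "finished"
    try:
        while conversation_step():
            pass
    except MaxIterationsReached:
        status = "max_iterations"  # record the limit instead of aborting the run
    return {"status": status, "patch": extract_patch()}
```

If the exception instead propagated to an outer handler, the call to `extract_patch()` would be skipped and any partial work the agent had written would be discarded.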