feat(task): enforce a per-run cost budget for agents and sub-agents #3703
The iteration cap bounds step count but not spend, so a verbose-but-productive run (or a fan-out of sub-agents) could run away on cost. Add an optional max_budget_per_run (USD) on LocalConversation, enforced in the run loop next to the iteration cap (preserving FINISHED), and wire it through AgentDefinition.max_budget_per_run + TaskManager so spawned sub-agents inherit or override a budget. Also surface a sub-agent run that ends in ERROR (budget or iteration cap) to the parent task instead of reporting an empty 'completed' result.
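The enforcement order described above can be sketched as follows. This is a minimal illustration, not the SDK's code: `Status`, `RunState`, and `run` are hypothetical stand-ins, and the strict `>` comparison against the budget is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    RUNNING = "running"
    FINISHED = "finished"
    ERROR = "error"


@dataclass
class RunState:
    # Hypothetical stand-in for the conversation state.
    status: Status = Status.RUNNING
    accumulated_cost: float = 0.0
    errors: list = field(default_factory=list)


def run(
    state: RunState,
    step: Callable[[RunState], None],
    max_iterations: int,
    max_budget: Optional[float] = None,
) -> RunState:
    """Call step(state) until it finishes or a run limit is hit."""
    iterations = 0
    while state.status is Status.RUNNING:
        step(state)  # one agent step; may add cost and/or set FINISHED
        iterations += 1
        if state.status is Status.FINISHED:
            break  # a finish on this step is never turned into an error
        # Assumption: the budget is checked first, then the iteration cap.
        if max_budget is not None and state.accumulated_cost > max_budget:
            state.status = Status.ERROR
            state.errors.append("MaxBudgetReached")
        elif iterations >= max_iterations:
            state.status = Status.ERROR
            state.errors.append("MaxIterationsReached")
    return state
```

With `max_budget=None` the loop behaves as before and only the iteration cap can stop it, which is the "no behavior change by default" property the PR relies on.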
Python API breakage checks: ✅ PASSED
REST API breakage checks (OpenAPI): ✅ PASSED
all-hands-bot
left a comment
⚠️ QA Report: PASS WITH ISSUES
Functional QA passed: real SDK runs showed per-run budget enforcement for local conversations and sub-agents, with no functional regressions found; one non-functional CI check is failing for the human-only PR description field.
Does this PR achieve its stated goal?
Yes. The PR set out to enforce max_budget_per_run for agents and sub-agents, wire it through file-based/sub-agent definitions, and surface budget-limited sub-agent runs as task errors. I exercised those paths with real SDK conversations using the configured LLM proxy: on main, a tiny budget was ignored and the same workflows finished successfully; on this PR, the same workflows stopped with MaxBudgetReached, and TaskManager.start_task returned an errored task with the budget message. A finish-only conversation that spent more than the tiny budget still remained finished, matching the PR’s stated preservation behavior.
| Phase | Result |
|---|---|
| Environment Setup | ✅ uv sync --dev completed earlier; no tests/linters run locally. |
| CI Status | PR Description Check / Validate PR description failing and this QA job still in progress. |
| Functional Verification | ✅ Local conversation budget halt, sub-agent budget inheritance/error surfacing, file-agent budget parsing, finish preservation, and deterministic custom-tool smoke all behaved as expected. |
Functional Verification
Test 1: Local conversation budget enforcement with a real terminal task
Step 1 — Reproduce / establish baseline without the fix:
On main (6bf874e7), ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_real_budget.py:
status finished
accumulated_cost 0.04793500
error_codes []
error_details []
event_types ['SystemPromptEvent', 'MessageEvent', 'ActionEvent', 'ObservationEvent', 'MessageEvent']
This confirms the old behavior: even with max_budget_per_run=0.000001, the real LLM+terminal run spent above the budget and still finished, with no budget error.
Step 2 — Apply the PR's changes:
Checked out alona/sdk-subagent-budget at 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran the same command on the PR branch:
status error
accumulated_cost 0.02418500
error_codes ['MaxBudgetReached']
error_details ['Agent reached maximum budget limit ($0.0000); accumulated cost $0.0242.']
event_types ['SystemPromptEvent', 'MessageEvent', 'ActionEvent', 'ObservationEvent', 'ConversationErrorEvent']
This shows the run now halts after the first real tool-using step once accumulated cost exceeds the run budget.
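The `$0.0000` in the error detail is the tiny budget rounded to four decimals, not a zero budget. A minimal sketch of a check that would produce the same message, assuming four-decimal formatting and a strict "greater than" comparison (`budget_exceeded_detail` is a hypothetical helper, not the SDK method):

```python
from typing import Optional


def budget_exceeded_detail(
    accumulated_cost: float, max_budget: Optional[float]
) -> Optional[str]:
    """Return an error detail when spend is over budget, else None."""
    # Assumption: no budget, or spend at or below it, means no error.
    if max_budget is None or accumulated_cost <= max_budget:
        return None
    return (
        f"Agent reached maximum budget limit (${max_budget:.4f}); "
        f"accumulated cost ${accumulated_cost:.4f}."
    )


# A budget of 0.000001 prints as $0.0000 because of the .4f format.
print(budget_exceeded_detail(0.024185, 0.000001))
```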
Test 2: Sub-agent budget inheritance and task error surfacing
Step 1 — Reproduce / establish baseline without the fix:
On main, ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_task_budget.py:
task_status completed
task_error None
task_result Command ran successfully: `qa-subagent-budget`
sub_status finished
sub_budget <missing>
sub_cost 0.01108100
sub_errors []
This confirms the previous sub-agent behavior: the parent’s budget argument was not present on the sub-conversation, and TaskManager.start_task reported completion despite spend above the tiny budget.
Step 2 — Apply the PR's changes:
Checked out the PR commit again.
Step 3 — Re-run with the fix in place:
Ran the same command on the PR branch:
task_status error
task_error Agent reached maximum budget limit ($0.0000); accumulated cost $0.0242.
task_result None
sub_status error
sub_budget 1e-06
sub_cost 0.02422000
sub_errors [('MaxBudgetReached', 'Agent reached maximum budget limit ($0.0000); accumulated cost $0.0242.')]
This verifies the budget is inherited by the spawned sub-conversation and the parent-facing Task now surfaces the budget stop as an error instead of an empty success.
Test 3: File-based sub-agent budget frontmatter
Step 1 — Reproduce / establish baseline without the fix:
On main, ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_definition_budget.py:
name budgeted
max_budget_attr <missing>
metadata_has_budget True
metadata_budget 0.123
This shows max_budget_per_run was previously treated as unstructured metadata, not as a typed agent definition field.
Step 2 — Apply the PR's changes:
Checked out the PR commit again.
Step 3 — Re-run with the fix in place:
Ran the same command on the PR branch:
name budgeted
max_budget_attr 0.123
metadata_has_budget False
metadata_budget None
This verifies a real file-agent definition now exposes the budget as a typed value and no longer leaves it in generic metadata.
Test 4: FINISHED status is preserved when the agent completes over budget
On the PR branch, ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_finished_budget.py with a no-tool prompt and max_budget_per_run=0.000001:
status finished
accumulated_cost 0.01843500
error_codes []
message_events ['user', 'agent']
This confirms the PR’s stated behavior that an agent which completes on the step is not converted to an error solely because that final LLM call puts cost above the budget.
Test 5: Existing deterministic custom tool execution still works
On the PR branch, ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_smoke_tool.py, a deterministic TestLLM conversation with a registered ClientTool:
status ConversationExecutionStatus.FINISHED
final_events 5
errors []
This smoke-checks the changed run loop with a custom tool path and no budget configured, confirming the default None budget does not break normal deterministic SDK execution.
Issues Found
- ⚠️ Non-functional CI issue: PR Description Check / Validate PR description is failing because the human-only PR description section is not completed. I did not edit it; a human contributor needs to update the HUMAN: note and checkbox in their own words.
- Functional QA issues: None.
Verdict: PASS WITH ISSUES.
This QA review was created by an AI agent (OpenHands) on behalf of the user.
all-hands-bot
left a comment
Code Review
🟢 Good taste — Elegant, simple solution that follows existing patterns in the codebase.
Summary
This PR adds a per-run cost budget feature for agents and sub-agents. The implementation:
- Adds max_budget_per_run parameter to LocalConversation
- Checks accumulated cost against the budget at each iteration step
- Preserves FINISHED status if the agent completes before budget check triggers
- Propagates budgets from agent definitions to sub-conversations
- Surfaces run-limit errors from sub-agents to parent tasks
Analysis
[local_conversation.py]
- Budget check is placed correctly before the iteration check; this is the right priority order
- The guard self._state.execution_status != ConversationExecutionStatus.FINISHED correctly prevents overriding a successful finish
- _budget_exceeded_detail() and _emit_run_limit_error() are clean helper methods
[schema.py]
- _extract_max_budget_per_run() handles bool/int/float/string types appropriately
- Field is excluded from metadata via _METADATA_FIELDS (good)
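A rough sketch of what such frontmatter handling could look like. The names `extract_budget` and `split_frontmatter` are hypothetical, and rejecting booleans is an assumption; the review only says that bool/int/float/string inputs are handled.

```python
from typing import Any, Optional, Tuple


def extract_budget(raw: Any) -> Optional[float]:
    """Coerce a frontmatter value into a float budget, or None if absent."""
    if raw is None:
        return None
    # bool is a subclass of int, so it has to be checked before int/float.
    if isinstance(raw, bool):
        raise ValueError("max_budget_per_run must be a number, not a boolean")
    if isinstance(raw, (int, float)):
        return float(raw)
    if isinstance(raw, str):
        return float(raw.strip())  # raises ValueError on non-numeric text
    raise ValueError(f"unsupported max_budget_per_run type: {type(raw).__name__}")


def split_frontmatter(frontmatter: dict) -> Tuple[Optional[float], dict]:
    """Pull the typed budget out and leave the other keys as metadata."""
    metadata = dict(frontmatter)  # copy so the caller's dict is untouched
    budget = extract_budget(metadata.pop("max_budget_per_run", None))
    return budget, metadata


budget, metadata = split_frontmatter(
    {"max_budget_per_run": "2.5", "custom_note": "keep-me"}
)
```

Here `budget` is the typed value and `metadata` keeps only the unrelated keys, which matches the before/after difference the QA runs observed.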
[manager.py]
- Budget inheritance chain is correct: definition value → parent value → None
- Error surfacing in _run_task() now properly distinguishes ERROR vs successful completion
- _run_error_detail() extracts the last error event for parent visibility
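The inheritance chain noted above (definition value → parent value → None) can be written as a small resolver. `resolve_sub_agent_budget` is a hypothetical name used only for illustration:

```python
from typing import Optional


def resolve_sub_agent_budget(
    definition_budget: Optional[float], parent_budget: Optional[float]
) -> Optional[float]:
    """Pick the budget for a spawned sub-agent conversation."""
    # Compare against None explicitly: using `or` would wrongly skip an
    # explicit definition budget of 0.0.
    if definition_budget is not None:
        return definition_budget
    return parent_budget  # may itself be None, meaning no budget
```

For example, a definition budget of 1.0 wins over a parent budget of 7.0, while a definition without a budget picks up the parent's value.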
[Tests]
- test_execution_status_error_on_max_budget: Pre-seeds spend to bypass TestLLM cost limitation; a pragmatic approach ✅
- test_finished_preserved_even_when_over_budget: Correctly verifies the FINISHED-over-budget edge case
- Task manager tests cover both definition-sourced and inherited budgets
No Issues Found
- No breaking changes to existing APIs
- No type safety issues
- No complexity concerns
- No security concerns
[RISK ASSESSMENT]
⚠️ Risk Assessment: 🟢 LOW
This is a feature addition with no impact on existing behavior. The budget only applies when explicitly set and defaults to None (disabled).
VERDICT:
✅ Worth merging — Clean implementation of a useful cost-control feature.
KEY INSIGHT:
The design correctly handles the edge case where an agent finishes successfully after accumulating costs — the FINISHED status is preserved rather than being overwritten by the budget error.
This review was generated by an AI agent (OpenHands) on behalf of the user through OpenHands Automation.
all-hands-bot
left a comment
✅ QA Report: PASS
Verified the per-run budget feature through real SDK conversations, TaskManager sub-agent execution, file-based agent loading, and one real-LLM tool run; the PR achieves its stated goal.
Does this PR achieve its stated goal?
Yes. The PR set out to enforce a per-run USD budget for agents and sub-agents, wire it through file-based/registered sub-agents, and surface sub-agent budget/iteration-limit failures to the parent task. I exercised those paths directly: over-budget local runs now stop with MaxBudgetReached, finished runs remain finished, file-based max_budget_per_run loads as a first-class field, sub-agents inherit/override budgets, and sub-agent run-limit failures now return task_status: "error" with the budget detail instead of an empty completed result.
| Phase | Result |
|---|---|
| Environment Setup | ✅ Project bootstrap/dependency sync completed via the repo make build / uv sync --dev flow. |
| CI Status | ⏳ gh pr checks: 35 successful, 2 skipped, 1 pending (QA Changes by OpenHands/qa-changes); 0 failing. |
| Functional Verification | ✅ Exercised SDK conversation, file-based agent definition, TaskManager sub-agent, inherited budget, and real LLM spend paths. |
Functional Verification
Test 1: LocalConversation stops on per-run cost budget
Step 1 — Establish baseline without the fix:
Ran git checkout --detach origin/main && OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_budget_run.py:
{
"accepted_budget_arg": true,
"has_max_budget_attr": false,
"status": "error",
"step_calls": [1, 2, 3],
"error_codes": ["MaxIterationsReached"],
"spent": 5.0
}
This shows the old SDK swallowed the unknown max_budget_per_run argument but did not enforce it; the run continued until the iteration cap.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_budget_run.py:
{
"has_max_budget_attr": true,
"max_budget_attr": 1.0,
"status": "error",
"step_calls": [1],
"error_codes": ["MaxBudgetReached"],
"error_details": ["Agent reached maximum budget limit ($1.0000); accumulated cost $5.0000."],
"spent": 5.0
}
This confirms the budget is now recognized and stops the run before the iteration cap.
Test 2: Finished runs stay finished even when accumulated cost is above budget
Step 1 — Establish baseline:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_budget_finished.py on origin/main:
{
"status": "finished",
"step_calls": [1],
"error_codes": [],
"spent": 5.0
}
The old behavior finished because no budget was enforced.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_budget_finished.py:
{
"status": "finished",
"step_calls": [1],
"error_codes": [],
"spent": 5.0
}
This confirms the PR preserves FINISHED when the agent completes on the final/over-budget step, matching the PR description.
Test 3: File-based sub-agent definitions load max_budget_per_run
Step 1 — Establish baseline without the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_definition_budget.py on origin/main:
{
"has_budget_attr": false,
"budget": null,
"metadata": {
"max_budget_per_run": "2.5",
"custom_note": "keep-me"
}
}
This shows max_budget_per_run was only opaque metadata before the PR.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_agent_definition_budget.py:
{
"has_budget_attr": true,
"budget": 2.5,
"metadata": {
"custom_note": "keep-me"
}
}
This confirms file-based sub-agent frontmatter now exposes the budget field and does not leave it duplicated in metadata.
Test 4: TaskManager enforces sub-agent override budget and surfaces failure
Step 1 — Establish baseline without the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_subagent_budget.py on origin/main:
{
"definition_has_budget": false,
"parent_has_budget": false,
"subagent_budget": null,
"subagent_status": "error",
"subagent_step_calls": [1, 2],
"task_status": "completed",
"task_result": "",
"task_error": null
}
This reproduces the gap described in the PR: the sub-agent hit a run-limit error, but the parent task saw an empty completed result.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_subagent_budget.py:
{
"definition_has_budget": true,
"definition_budget": 1.0,
"parent_budget": 7.0,
"subagent_budget": 1.0,
"subagent_status": "error",
"subagent_step_calls": [1],
"task_status": "error",
"task_error": "Agent reached maximum budget limit ($1.0000); accumulated cost $5.0000."
}
This confirms a sub-agent definition budget overrides the parent budget and the parent task now receives the budget error.
Test 5: TaskManager inherits parent budget when definition has no override
Step 1 — Establish baseline without the fix:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_subagent_inherit_budget.py on origin/main:
{
"parent_has_budget": false,
"definition_budget": null,
"subagent_budget": null,
"subagent_status": "error",
"subagent_step_calls": [1, 2],
"task_status": "completed",
"task_result": "",
"task_error": null
}
The parent budget was not a real conversation setting, so nothing was inherited.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e.
Step 3 — Re-run with the fix in place:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_subagent_inherit_budget.py:
{
"parent_has_budget": true,
"parent_budget": 1.0,
"definition_budget": null,
"subagent_budget": 1.0,
"subagent_status": "error",
"subagent_step_calls": [1],
"task_status": "error",
"task_error": "Agent reached maximum budget limit ($1.0000); accumulated cost $5.0000."
}
This confirms spawned sub-agents inherit the parent's budget when the definition does not override it.
Test 6: Real LLM spend triggers the budget halt naturally
Step 1 — Establish baseline:
The deterministic baseline above established that pre-PR conversations had no budget field/enforcement. I then exercised the PR with an actual LLM call and terminal tool action to ensure real accumulated cost, not only synthetic metrics, drives the halt.
Step 2 — Apply the PR changes:
Checked out 029f0cc806075773b32e3e1bc10dfd398aee1b5e with LLM_MODEL, LLM_BASE_URL, and LLM_API_KEY set.
Step 3 — Run with real LLM/tool execution:
Ran OPENHANDS_SUPPRESS_BANNER=1 uv run python /tmp/qa_real_llm_budget.py:
{
"status": "error",
"error_codes": ["MaxBudgetReached"],
"error_details": ["Agent reached maximum budget limit ($0.0000); accumulated cost $0.0086."],
"spent": 0.008635,
"event_types": ["SystemPromptEvent", "MessageEvent", "ActionEvent", "ObservationEvent", "ConversationErrorEvent"]
}
This confirms a real user-style SDK conversation with an LLM and terminal tool call accumulates spend and halts on the new budget ceiling.
Issues Found
None.
Verdict: PASS
This review was created by an AI agent (OpenHands) on behalf of the user.
VascoSch92
left a comment
@ak684 this should be evaluated before being merged.
I think this doesn't solve the problem, as we are not telling the sub-agent that it has no budget left.
Moreover, this should be enforced in the main convo and not just for sub-agents; the sub-agent convo would then inherit this budget. As it stands, we have different behaviour for the main convo and the sub-agent convo.
There are a couple of problems with the implementation
Net effect: a top-level/parent agent budget can only be set by instantiating LocalConversation directly or via sub-agent frontmatter, not via Conversation(...). Given the symmetry with max_iteration_per_run, this looks unintended.
_run_task (manager.py:361) only special-cases ERROR; every other terminal status falls through to the success branch (set_result(get_agent_final_response(...))). Sub-agents have stuck detection on by default, so a stuck sub-agent ends STUCK, get_agent_final_response returns "", and the parent is told the task completed: This is the same "empty success" failure the commit's own comment says it is fixing — it just isn't caught for the stuck path (and, by inspection of the same branch, any non-ERROR terminal status such as PAUSED).
_budget_exceeded_detail keys off get_combined_metrics().accumulated_cost. For any LLM where litellm has no pricing (custom/proxy/self-hosted models) this stays 0.0, so the cap never triggers — unlike the iteration cap, which is always effective:
Reproduction script
import logging, tempfile
from collections.abc import Sequence
from typing import ClassVar

logging.disable(logging.CRITICAL)  # silence SDK log noise

from openhands.sdk import Agent
from openhands.sdk.conversation import Conversation
from openhands.sdk.conversation.impl.local_conversation import LocalConversation
from openhands.sdk.conversation.state import ConversationExecutionStatus as Status
from openhands.sdk.event.conversation_error import ConversationErrorEvent
from openhands.sdk.llm import ImageContent, Message, MessageToolCall, TextContent
from openhands.sdk.llm.utils.metrics import Metrics
from openhands.sdk.testing import TestLLM
from openhands.sdk.tool import (
    Action, Observation, Tool, ToolDefinition, ToolExecutor, register_tool,
)
from openhands.tools.task.manager import Task, TaskManager, TaskStatus

# --- a tool that always returns the same observation (to drive a stuck loop) ---
class A(Action):
    command: str

class O(Observation):
    result: str

    @property
    def to_llm_content(self) -> Sequence[TextContent | ImageContent]:
        return [TextContent(text=self.result)]

class Exec(ToolExecutor[A, O]):
    def __call__(self, action: A, conversation=None) -> O:
        return O(result="same-observation")

class LoopTool(ToolDefinition[A, O]):
    name: ClassVar[str] = "test_tool"

    @classmethod
    def create(cls, conv_state=None, *, executor, **p):
        return [cls(description="t", action_type=A, observation_type=O, executor=executor)]

register_tool("test_tool", LoopTool.create(executor=Exec())[0])

def user(text): return Message(role="user", content=[TextContent(text=text)])

def call():  # one identical tool call
    return Message(role="assistant", content=[TextContent(text="")],
                   tool_calls=[MessageToolCall(id="c", name="test_tool",
                                               arguments='{"command":"x"}', origin="completion")])

def err_codes(events):
    return [e.code for e in events if isinstance(e, ConversationErrorEvent)]

# ============================================================ BUG 1
with tempfile.TemporaryDirectory() as d:
    agent = Agent(llm=TestLLM.from_messages([]), tools=[])
    raised = None
    try:
        Conversation(agent=agent, workspace=d, max_budget_per_run=2.5, visualizer=None)
    except TypeError as e:
        raised = str(e)
    assert raised is not None, "NOT REPRODUCED: public factory accepted the kwarg"
    assert "max_budget_per_run" in raised
    # control: the underlying class *does* accept it
    lc = LocalConversation(agent=Agent(llm=TestLLM.from_messages([]), tools=[]),
                           workspace=d, max_budget_per_run=2.5, visualizer=None,
                           delete_on_close=False)
    assert lc.max_budget_per_run == 2.5
    print("BUG 1 REPRODUCED: public Conversation(...) rejects max_budget_per_run")
    print(f" TypeError: {raised}")

# ============================================================ BUG 2
with tempfile.TemporaryDirectory() as d:
    conv = LocalConversation(
        agent=Agent(llm=TestLLM.from_messages([call() for _ in range(10)]),
                    tools=[Tool(name="test_tool")]),
        workspace=d, visualizer=None, delete_on_close=False,
        stuck_detection=False, max_iteration_per_run=3, max_budget_per_run=0.0001)
    conv.send_message(user("go"))
    conv.run()
    spent = conv.conversation_stats.get_combined_metrics().accumulated_cost
    codes = err_codes(conv.state.events)
    assert spent == 0.0, f"NOT REPRODUCED: cost was tracked ({spent})"
    assert conv._budget_exceeded_detail() is None, "NOT REPRODUCED: budget fired"
    assert "MaxBudgetReached" not in codes and "MaxIterationsReached" in codes, codes
    # positive control: when cost IS present, the same check DOES fire ->
    # proving the no-op is caused purely by accumulated_cost staying 0.
    conv.conversation_stats.usage_to_metrics["seed"] = Metrics(accumulated_cost=9.0)
    assert conv._budget_exceeded_detail() is not None, "control failed: budget should fire at $9"
    print("BUG 2 REPRODUCED: $0.0001 budget never fired (cost stayed 0.0); "
          f"only stopped by {codes}")
    print(" control: after seeding cost=$9, _budget_exceeded_detail() now fires")

# ============================================================ BUG 3
with tempfile.TemporaryDirectory() as d:
    # control: this scenario genuinely ends STUCK on its own
    ctrl = LocalConversation(
        agent=Agent(llm=TestLLM.from_messages([call() for _ in range(12)]),
                    tools=[Tool(name="test_tool")]),
        workspace=d, visualizer=None, delete_on_close=False, max_iteration_per_run=30)
    ctrl.send_message(user("loop")); ctrl.run()
    assert ctrl.state.execution_status == Status.STUCK, ctrl.state.execution_status
    # same scenario through the patched TaskManager._run_task path
    parent = LocalConversation(agent=Agent(llm=TestLLM.from_messages([]), tools=[]),
                               workspace=d, visualizer=None, delete_on_close=False)
    mgr = TaskManager(); mgr._ensure_parent(parent)
    sub = LocalConversation(
        agent=Agent(llm=TestLLM.from_messages([call() for _ in range(12)]),
                    tools=[Tool(name="test_tool")]),
        workspace=d, visualizer=None, delete_on_close=False, max_iteration_per_run=30)
    task = Task(id="task_00000001", conversation_id=sub.id, conversation=sub,
                status=TaskStatus.RUNNING)
    mgr._tasks[task.id] = task
    done = mgr._run_task(task, "loop")
    assert done.status == TaskStatus.COMPLETED, f"NOT REPRODUCED: status={done.status}"
    assert done.error is None, f"NOT REPRODUCED: error={done.error}"
    assert done.result == "", f"NOT REPRODUCED: result={done.result!r}"
    print("BUG 3 REPRODUCED: a sub-agent that ends STUCK is reported to the parent as "
          f"status={done.status.value}, error=None, result='' "
          "(parent sees 'Task completed with no result.')")

print("\nAll three assertions held -> all three issues reproduced on this commit.")
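One way the STUCK/PAUSED gap from BUG 3 could be closed is to treat only FINISHED as success and report every other terminal status as a task error. The sketch below is an illustration under that assumption, not the PR's `_run_task`; `ExecStatus`, `TaskOutcome`, and `task_outcome` are hypothetical names.

```python
from enum import Enum
from typing import Optional, Tuple


class ExecStatus(Enum):
    # Hypothetical subset of terminal conversation statuses.
    FINISHED = "finished"
    ERROR = "error"
    STUCK = "stuck"
    PAUSED = "paused"


class TaskOutcome(Enum):
    COMPLETED = "completed"
    ERROR = "error"


def task_outcome(
    status: ExecStatus, final_response: str, last_error: Optional[str]
) -> Tuple[TaskOutcome, Optional[str], Optional[str]]:
    """Return (task status, result, error) for a finished sub-agent run."""
    if status is ExecStatus.FINISHED:
        return TaskOutcome.COMPLETED, final_response, None
    # Any other terminal status is surfaced as an error; fall back to a
    # generic message when no error event was recorded (e.g. STUCK).
    detail = last_error or (
        f"Sub-agent stopped with status '{status.value}' before finishing."
    )
    return TaskOutcome.ERROR, None, detail
```

Under this rule the stuck sub-agent above would reach the parent as an error with a readable detail instead of an empty completed result.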
HUMAN:
Agents and sub-agents only had an iteration cap, not a cost ceiling — this adds a per-run budget so a verbose run or a sub-agent fan-out can't run away on spend. Reviewed the change and the deterministic + real-LLM tests, including a spawned sub-agent halting on its budget.
AGENT:
Why
A run was bounded only by the iteration cap (max_iteration_per_run), which limits step count but not spend. A verbose-but-productive agent, or a fan-out of sub-agents, could run away on cost with no hard ceiling. (Metrics.max_budget_per_task existed but was dead: stored/merged, never enforced.)
Summary
- Add an optional max_budget_per_run (USD) on LocalConversation, enforced in the run loop next to the iteration cap (run() + arun()), preserving a FINISHED status set on the final step.
- Wire it through AgentDefinition.max_budget_per_run + TaskManager so spawned sub-agents inherit the parent's budget or override it from their definition.
- Surface a sub-agent run that ends in ERROR (budget or iteration cap) to the parent task, instead of reporting an empty "completed" result.
How to Test
Deterministic (no API):
Real-LLM (proxy; cost a few cents), proving spend triggers the halt naturally:
- A sub-agent with AgentDefinition(max_budget_per_run=$0.05) ran real terminal steps; real accumulated_cost reached $0.0529 and the run halted: MaxBudgetReached → status ERROR.
- Through the TaskManager.start_task path, the parent Task reported: error / "Agent reached maximum budget limit ($0.0500); accumulated cost $0.0524." (previously this was a silent empty "completed").
Type
Notes
- The budget is per-run (like max_iteration_per_run) and is checked against conversation_stats.get_combined_metrics().accumulated_cost (total across the agent + condenser LLMs). The dormant per-LLM Metrics.max_budget_per_task is left untouched: aggregating it across a run's LLMs is awkward, and a run-level limit is the natural home.
- A sub-agent that hits max_iteration_per_run now surfaces as a task error too. Happy to scope this to budget-only if preferred.
- max_budget_per_run is appended as the last named __init__ param (before **_) so no existing positional argument shifts; default None = no budget (no behavior change for existing runs). No LocalConversation is constructed positionally anywhere in the SDK/agent-server.
Agent Server images for this PR
• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server
Variants & Base Images
- eclipse-temurin:17-jdk
- nikolaik/python-nodejs:python3.13-nodejs22-slim
- golang:1.21-bookworm
Pull (multi-arch manifest)
# Each variant is a multi-arch manifest supporting both amd64 and arm64
docker pull ghcr.io/openhands/agent-server:029f0cc-python
Run
All tags pushed for this build
About Multi-Architecture Support
- Each variant tag (e.g., 029f0cc-python) is a multi-arch manifest supporting both amd64 and arm64
- Architecture-specific tags (e.g., 029f0cc-python-amd64) are also available if needed