small changes #415

Merged: jdchawla29 merged 29 commits into v6 from v6-agent-f-l on Jun 5, 2026
Conversation

@lorenss-m (Contributor) commented on Jun 3, 2026

Note

High Risk
Large breaking public API and agent/eval orchestration changes across auth-adjacent gateway clients, remote SSH execution, and computer-control paths.

Overview
This is a breaking SDK reshape around environments, tasks, and rollouts. The README and top-level hud exports now center on Environment + @env.task(), Variant / Taskset, and await agent(run) with rewards on run.trace, replacing the older hud.eval() / EvalContext / env.scenario story.

Agents are rebuilt on a slim Agent ABC and a shared ToolAgent loop keyed off a live Run. MCPAgent, lazy _runtime activation, and hud.trace() go away; patches and pretty errors load eagerly from hud/__init__.py. Provider agents (Claude, Gemini, OpenAI) wire native tools to capability clients — SSH for shell/editor, RFB for computer use, MCP proxy tools for discovered env tools — instead of forwarding through generic env MCP tool handlers.

New agent paths include optional BrowserUseAgent (CDP via browser-use) and ClaudeSDKAgent (remote claude CLI over SSH, with a local computer-use MCP bridge for RFB). create_agent now builds agents from config objects rather than Agent.create(). Unit tests were added for Claude/Gemini computer tool dispatch.

Reviewed by Cursor Bugbot for commit cc7bb2d. Bugbot is set up for automated code reviews on this repo.

@lorenss-m changed the title from "V6 agent f l" to "small changes" on Jun 3, 2026
lorenss-m and others added 7 commits June 3, 2026 09:59
v6's outstanding commits refactor the old eval subsystem (manager/context/
instrument) and defer import-time patching via activate_runtime(). This branch
already replaced that subsystem (Sandbox/Taskset/Variant) and rewrote the agent
base, so none of v6's changes apply cleanly or usefully here. Recorded as an
"ours" merge: v6 is marked merged, branch tree is kept verbatim.
@jdchawla29 jdchawla29 marked this pull request as ready for review June 5, 2026 21:59
@jdchawla29 jdchawla29 merged commit 2a356e3 into v6 Jun 5, 2026
1 of 5 checks passed

@cursor (bot) left a comment

Cursor Bugbot has reviewed your changes and found 7 potential issues.

self.max_output_tokens = config.max_output_tokens
self.thinking_level = config.thinking_level
self.include_thoughts = config.include_thoughts
self.excluded_predefined_functions = list(config.excluded_predefined_functions)

Config exclusions never applied

Medium Severity

GeminiConfig.excluded_predefined_functions is copied onto GeminiAgent but never passed when constructing GeminiComputerTool, so the tool always advertises the full predefined computer-use set regardless of user configuration.

Additional Locations (1)
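
One way to close this gap is to thread the configured exclusions into the tool when it is constructed. The sketch below uses stand-in classes rather than the real hud types: `SketchConfig`, `SketchComputerTool`, its `excluded` parameter, and the `PREDEFINED_FUNCTIONS` tuple are hypothetical names chosen for illustration, and the real constructor signature may differ.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Illustrative subset of predefined computer-use function names (assumption).
PREDEFINED_FUNCTIONS = ("click_at", "type_text_at", "scroll_document", "open_web_browser")


@dataclass
class SketchConfig:
    """Stand-in for the agent config; only the field discussed above."""

    excluded_predefined_functions: list[str] = field(default_factory=list)


class SketchComputerTool:
    """Stand-in tool that advertises every predefined function not excluded."""

    def __init__(self, excluded: list[str] | None = None) -> None:
        self.excluded = set(excluded or [])

    def advertised_functions(self) -> list[str]:
        return [name for name in PREDEFINED_FUNCTIONS if name not in self.excluded]


class SketchAgent:
    def __init__(self, config: SketchConfig) -> None:
        self.excluded_predefined_functions = list(config.excluded_predefined_functions)
        # The step the review says is missing: forward the exclusions to the tool.
        self.computer_tool = SketchComputerTool(excluded=self.excluded_predefined_functions)
```

With this wiring, an agent built from `SketchConfig(excluded_predefined_functions=["open_web_browser"])` no longer advertises `open_web_browser`.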

    scroll_x=sx,
    scroll_y=sy,
)
return await self.screenshot()

Scroll magnitude default shrunk

Medium Severity

Since the RFB refactor, Gemini scroll actions default to a magnitude of 3 VNC wheel clicks instead of the previous default of 800, so omitted or typical magnitudes scroll far less than before.
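
A possible way to preserve the old behaviour is to convert the magnitude into wheel clicks instead of swapping the default. The sketch assumes the old magnitude was a pixel distance and that one wheel click scrolls roughly 100 pixels; neither is confirmed by the review, and `PIXELS_PER_WHEEL_CLICK` and `magnitude_to_wheel_clicks` are hypothetical names.

```python
import math
from typing import Optional

PIXELS_PER_WHEEL_CLICK = 100  # assumption: depends on the remote desktop settings
DEFAULT_MAGNITUDE = 800  # the previous default quoted in the review


def magnitude_to_wheel_clicks(magnitude: Optional[int] = None) -> int:
    """Translate a scroll magnitude into a whole number of wheel clicks."""
    pixels = DEFAULT_MAGNITUDE if magnitude is None else magnitude
    # Round up and never return zero, so a small scroll still moves the page.
    return max(1, math.ceil(abs(pixels) / PIXELS_PER_WHEEL_CLICK))
```

Under these assumptions an omitted magnitude maps to 8 clicks rather than 3.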

)
if sibling_docs:
    return [tool_result_msg, BetaMessageParam(role="user", content=sibling_docs)]
return tool_result_msg

Citation docs split wrongly

Medium Severity

When citations are enabled, citation document blocks are returned as a separate user message instead of living in the same user turn as the matching tool_result, diverging from the prior Anthropic message shape.
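
A minimal sketch of the single-turn shape the comment describes, using plain dicts instead of the Anthropic SDK types. The helper name `attach_docs_to_tool_result` is hypothetical, and placing the document blocks after the `tool_result` block is an assumption about the prior ordering.

```python
from typing import Any, Dict, List

Message = Dict[str, Any]


def attach_docs_to_tool_result(tool_result_msg: Message, sibling_docs: List[Message]) -> Message:
    """Return one user message carrying the tool_result and its citation documents."""
    if not sibling_docs:
        return tool_result_msg
    # Keep the tool_result block(s) first, then append the document blocks
    # to the same content list so both live in one user turn.
    return {
        "role": tool_result_msg["role"],
        "content": [*tool_result_msg["content"], *sibling_docs],
    }
```

The function returns a new message and leaves its inputs unmodified, so the caller can still inspect the original `tool_result_msg`.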

}
betas: list[str] | Omit = list(required_betas) if required_betas else Omit()
tool_choice = BetaToolChoiceAutoParam(type="auto", disable_parallel_tool_use=True)
tools = cast("list[BetaToolUnionParam]", list(state.params))

Tool search defer dropped

Medium Severity

Large MCP tool catalogs no longer get defer_loading when ClaudeToolSearchTool is configured, because the threshold logic that marked generic function tools was removed from get_response.
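
The removed threshold logic could look roughly like the sketch below. It works on plain dict tool definitions; the `DEFER_THRESHOLD` value, the `mark_deferred` helper, and the rule that a tool without a `type` key counts as a generic function tool are all assumptions made for illustration.

```python
from typing import Any, Dict, List

DEFER_THRESHOLD = 10  # hypothetical cutoff; the original value is not given in the review


def mark_deferred(
    tools: List[Dict[str, Any]],
    *,
    tool_search_enabled: bool,
    threshold: int = DEFER_THRESHOLD,
) -> List[Dict[str, Any]]:
    """Set defer_loading on generic function tools when the catalog is large."""
    generic_count = sum(1 for tool in tools if "type" not in tool)
    if not tool_search_enabled or generic_count <= threshold:
        return [dict(tool) for tool in tools]
    # Typed (native) tools are passed through; only generic ones are deferred.
    return [
        {**tool, "defer_loading": True} if "type" not in tool else dict(tool)
        for tool in tools
    ]
```

Deferral only happens when tool search is configured, which matches the condition named in the comment.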

run_cmd = self._build_cli_command(
    prompt=prompt, max_steps=max_steps, system_prompt=system_prompt,
    mcp_config_path=mcp_config_path,

Prompt file never used

Medium Severity

The agent writes the task prompt to .hud_prompt.txt over SFTP but still passes the full prompt on the claude CLI command line, so long prompts remain subject to shell length and quoting limits the file was meant to avoid.

Additional Locations (1)
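
One way to actually use the uploaded file is to redirect it into the CLI instead of inlining the prompt. This is a sketch only: it assumes the CLI reads its prompt from standard input in non-interactive mode, and `build_cli_command_from_file` is a hypothetical helper, not the signature of the `_build_cli_command` method shown above.

```python
import shlex
from typing import Sequence


def build_cli_command_from_file(base_args: Sequence[str], prompt_path: str) -> str:
    """Build a shell command that feeds the prompt file to the CLI via stdin."""
    quoted_args = " ".join(shlex.quote(arg) for arg in base_args)
    # Only the short file path reaches the command line, so the prompt's
    # length and quoting no longer matter to the shell.
    return f"{quoted_args} < {shlex.quote(prompt_path)}"
```

For example, `build_cli_command_from_file(["claude", "-p"], ".hud_prompt.txt")` yields `claude -p < .hud_prompt.txt`.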

response=response,
parts=parts or None,
),
),

Computer URL field dropped

Medium Severity

Gemini computer-use tool results no longer include the required url field (and related metadata) on FunctionResponse, because formatting was centralized without the browser-specific fields the old GeminiComputerTool.format_result added.
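
A centralized formatter can still add the browser-specific field for computer-use results. The sketch below builds the response payload as a plain dict; `build_computer_function_response` and its parameters are hypothetical, and any metadata beyond `url` is left out because the review does not list it.

```python
from typing import Any, Dict


def build_computer_function_response(
    name: str, result: Dict[str, Any], current_url: str
) -> Dict[str, Any]:
    """Wrap a computer-use result so the response always carries a url field."""
    response = dict(result)  # copy so the caller's dict is not mutated
    response["url"] = current_url
    return {"name": name, "response": response}
```

A shared formatter could call a helper like this only for computer-use tools and keep the generic path for everything else.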

"GeminiGlobTool",
"GeminiListTool",
"GeminiMCPProxyTool",
"GeminiMemoryTool",

Exported missing memory tool

Low Severity

__all__ still lists GeminiMemoryTool after memory.py was deleted in the same refactor, so importing that name from hud.agents.gemini.tools fails despite being part of the public export list.
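
Stale exports like this can be caught by a small check that every name in `__all__` resolves on the module. The helper name `missing_exports` is hypothetical, and the demo uses a throwaway module object rather than the real `hud.agents.gemini.tools`.

```python
import types
from typing import List


def missing_exports(module: types.ModuleType) -> List[str]:
    """Return names listed in __all__ that the module does not actually define."""
    return [name for name in getattr(module, "__all__", []) if not hasattr(module, name)]


# Throwaway module that mimics the situation described above.
demo = types.ModuleType("demo_tools")
demo.GeminiListTool = object
demo.__all__ = ["GeminiListTool", "GeminiMemoryTool"]
stale = missing_exports(demo)
```

In a test suite, asserting that the returned list is empty for each public package would flag the leftover name.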

2 participants