small changes#415
Conversation
v6's outstanding commits refactor the old eval subsystem (manager/context/ instrument) and defer import-time patching via activate_runtime(). This branch already replaced that subsystem (Sandbox/Taskset/Variant) and rewrote the agent base, so none of v6's changes apply cleanly or usefully here. Recorded as an "ours" merge: v6 is marked merged, branch tree is kept verbatim.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 7 potential issues.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| self.max_output_tokens = config.max_output_tokens | ||
| self.thinking_level = config.thinking_level | ||
| self.include_thoughts = config.include_thoughts | ||
| self.excluded_predefined_functions = list(config.excluded_predefined_functions) |
There was a problem hiding this comment.
Config exclusions never applied
Medium Severity
GeminiConfig.excluded_predefined_functions is copied onto GeminiAgent but never passed when constructing GeminiComputerTool, so the tool always advertises the full predefined computer-use set regardless of user configuration.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| scroll_x=sx, | ||
| scroll_y=sy, | ||
| ) | ||
| return await self.screenshot() |
There was a problem hiding this comment.
Scroll magnitude default shrunk
Medium Severity
Gemini scroll actions now default magnitude to 3 VNC wheel clicks instead of the previous default of 800, so omitted or typical magnitudes produce far less scrolling than before after the RFB refactor.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| ) | ||
| if sibling_docs: | ||
| return [tool_result_msg, BetaMessageParam(role="user", content=sibling_docs)] | ||
| return tool_result_msg |
There was a problem hiding this comment.
Citation docs split wrongly
Medium Severity
When citations are enabled, citation document blocks are returned as a separate user message instead of living in the same user turn as the matching tool_result, diverging from the prior Anthropic message shape.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| } | ||
| betas: list[str] | Omit = list(required_betas) if required_betas else Omit() | ||
| tool_choice = BetaToolChoiceAutoParam(type="auto", disable_parallel_tool_use=True) | ||
| tools = cast("list[BetaToolUnionParam]", list(state.params)) |
There was a problem hiding this comment.
Tool search defer dropped
Medium Severity
Large MCP tool catalogs no longer get defer_loading when ClaudeToolSearchTool is configured, because the threshold logic that marked generic function tools was removed from get_response.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
|
|
||
| run_cmd = self._build_cli_command( | ||
| prompt=prompt, max_steps=max_steps, system_prompt=system_prompt, | ||
| mcp_config_path=mcp_config_path, |
There was a problem hiding this comment.
Prompt file never used
Medium Severity
The agent writes the task prompt to .hud_prompt.txt over SFTP but still passes the full prompt on the claude CLI command line, so long prompts remain subject to shell length and quoting limits the file was meant to avoid.
Additional Locations (1)
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| response=response, | ||
| parts=parts or None, | ||
| ), | ||
| ), |
There was a problem hiding this comment.
Computer URL field dropped
Medium Severity
Gemini computer-use tool results no longer include the required url field (and related metadata) on FunctionResponse, because formatting was centralized without the browser-specific fields the old GeminiComputerTool.format_result added.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.
| "GeminiGlobTool", | ||
| "GeminiListTool", | ||
| "GeminiMCPProxyTool", | ||
| "GeminiMemoryTool", |
There was a problem hiding this comment.
Exported missing memory tool
Low Severity
__all__ still lists GeminiMemoryTool after memory.py was deleted in the same refactor, so importing that name from hud.agents.gemini.tools fails despite being part of the public export list.
Reviewed by Cursor Bugbot for commit cc7bb2d. Configure here.


Note
High Risk
Large breaking public API and agent/eval orchestration changes across auth-adjacent gateway clients, remote SSH execution, and computer-control paths.
Overview
This is a breaking SDK reshape around environments, tasks, and rollouts. The README and top-level
hudexports now center onEnvironment+@env.task(),Variant/Taskset, andawait agent(run)with rewards onrun.trace, replacing the olderhud.eval()/EvalContext/env.scenariostory.Agents are rebuilt on a slim
AgentABC and a sharedToolAgentloop keyed off a liveRun.MCPAgent, lazy_runtimeactivation, andhud.trace()go away; patches and pretty errors load eagerly fromhud/__init__.py. Provider agents (Claude, Gemini, OpenAI) wire native tools to capability clients — SSH for shell/editor, RFB for computer use, MCP proxy tools for discovered env tools — instead of forwarding through generic env MCP tool handlers.New agent paths include optional
BrowserUseAgent(CDP viabrowser-use) andClaudeSDKAgent(remoteclaudeCLI over SSH, with a local computer-use MCP bridge for RFB).create_agentnow builds agents from config objects rather thanAgent.create(). Unit tests were added for Claude/Gemini computer tool dispatch.Reviewed by Cursor Bugbot for commit cc7bb2d. Bugbot is set up for automated code reviews on this repo. Configure here.