README.md
12 additions & 0 deletions
@@ -86,6 +86,10 @@ If you are new to the project, the recommended reading order is:
## 📰 News
+🚩 **Update** (2026-05-22) Added a real API smoke test that runs the five SGI benchmark README server commands and OpenAI SDK examples, then validates each expected final-answer format.
+
+🚩 **Update** (2026-05-22) CLI and API deployments can now expose an explicit complete tool set with repeatable `--tool NAME`, useful when a run needs a smaller or benchmark-specific tool surface.
+
🚩 **Update** (2026-05-21) ResearchHarness is packaged for one-command installation with `pip install researchharness`. The existing source-tree commands remain compatible, and releases can publish to PyPI automatically from GitHub Releases.
🚩 **Update** (2026-05-21) The Python import API now exposes the same core runtime controls as CLI mode: default workspace, role prompt strings/files, image inputs, explicit tool sets, optional extra tools, and decorated custom function tools.
@@ -736,6 +740,10 @@ deployment, and QA/VQA benchmark deployment. Advanced users can still combine
`--role-prompt-file`, `--input-wrapper`, and `--output-wrapper` manually when a
custom application needs only part of the benchmark behavior.
+For benchmark deployments that need a smaller tool surface, pass repeatable
+`--tool NAME` flags. This defines the complete exposed tool set for each run
+and cannot be combined with `--extra-tool`.
+
### API Concurrency
The API endpoint remains synchronous from the client's perspective, but long
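
Note on the `--tool NAME` hunk above: the diff does not show the CLI parser itself, so the following is only a minimal sketch of how a repeatable flag that defines the complete tool set, and that rejects `--extra-tool`, could be wired with `argparse`. The function names, the `default_tools` argument, and the assumption that `--extra-tool` is repeatable and appends to the defaults are all hypothetical, not taken from the project.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; the project's real CLI may differ."""
    parser = argparse.ArgumentParser(prog="tool-surface-sketch")
    # action="append" makes a flag repeatable:
    #   --tool WebSearch --tool WebFetch -> ["WebSearch", "WebFetch"]
    parser.add_argument("--tool", action="append", metavar="NAME", default=None)
    # Assumption: --extra-tool is also repeatable and adds to the defaults.
    parser.add_argument("--extra-tool", action="append", metavar="NAME", default=None)
    return parser


def resolve_tools(argv, default_tools):
    """Return the tool names to expose for one run."""
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.tool and args.extra_tool:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--tool cannot be combined with --extra-tool")
    if args.tool:
        # --tool defines the complete exposed set; defaults are ignored.
        return list(args.tool)
    return list(default_tools) + list(args.extra_tool or [])
```

For example, `resolve_tools(["--tool", "WebSearch", "--tool", "WebFetch"], ["ScholarSearch"])` returns only the two named tools, while passing both flag kinds exits with an error.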
@@ -1064,6 +1072,9 @@ repository for local images.
More detailed tool documentation lives in [agent_base/tools/README.md](agent_base/tools/README.md).
+Tool-use requests should use the native tool calling interface. User-required
+final answer formats remain ordinary final-answer text.
+
Tool calls follow a single-request contract: `WebSearch.query`,
`ScholarSearch.query`, and `WebFetch.url` each accept one string, not a list.
When the model needs multiple independent searches, page fetches, file reads,
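
Note on the single-request contract in the hunk above: one way to picture it is that a list of queries becomes several independent tool calls, each carrying exactly one string. The sketch below assumes an OpenAI-style `tool_calls` entry shape (`id`, `type`, `function.name`, `function.arguments` as a JSON string); whether ResearchHarness uses this exact shape internally is not shown in the diff, and `expand_calls` is a hypothetical helper.

```python
import json

# Argument names taken from the README text: each accepts one string, not a list.
SINGLE_STRING_ARGS = {
    "WebSearch": "query",
    "ScholarSearch": "query",
    "WebFetch": "url",
}


def expand_calls(tool_name, values):
    """Turn N independent inputs into N tool calls with one string each."""
    arg_name = SINGLE_STRING_ARGS[tool_name]
    calls = []
    for index, value in enumerate(values):
        if not isinstance(value, str):
            raise TypeError(f"{tool_name}.{arg_name} must be a single string")
        calls.append(
            {
                "id": f"call_{index}",  # placeholder id for illustration
                "type": "function",
                "function": {
                    "name": tool_name,
                    "arguments": json.dumps({arg_name: value}),
                },
            }
        )
    return calls
```

Passing a nested list such as `[["a", "b"]]` raises `TypeError`, which mirrors the rule that a list is never sent as one `query` or `url`.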
agent_base/prompts/system_base.md
2 additions & 2 deletions
@@ -66,7 +66,8 @@ You are a capable all-purpose AI assistant. You do far more than simple question
## Native Tool Calling Contract
-- Use the API's native tool calling interface when tools are needed. Do not write pseudo-XML, pseudo-tool JSON, or tag-based tool requests in plain text.
+- Use the API's native tool calling interface when tools are needed.
+- If the user explicitly requires a special final-answer format, follow that format as ordinary answer text.
- If a turn includes native tool calls, that turn is a tool-use turn. Any accompanying text is treated as working context, not as the final result.
- Multiple tool calls in one turn are allowed only when they are independent.
- If tool B depends on the output of tool A, do not request them in the same turn. Wait for tool A's result first.
@@ -75,7 +76,6 @@ You are a capable all-purpose AI assistant. You do far more than simple question
- Keep tool turns structured. Brief text may explain the current tool step, but the tool call itself is the action.
- When no more tools are needed, return the final result as plain text.
- If the user requires a strict format such as JSON, output only that payload as the plain final result text.
-- Do not emit legacy protocol tags such as `<tool_call>`, `<tool_response>`, `<think>`, or `<answer>`.
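
Note on the prompt contract in this file: the rules distinguish tool-use turns from final-answer turns purely by whether native tool calls are present, which means user-required formats in the text are never parsed as tool requests. A minimal sketch of that distinction follows, assuming an OpenAI-style assistant message dict with `content` and `tool_calls` keys; `classify_turn` is a hypothetical name and the runtime's actual handling is not part of this diff.

```python
def classify_turn(message: dict) -> tuple[str, str]:
    """Classify one assistant message as a tool-use turn or the final result."""
    tool_calls = message.get("tool_calls") or []
    text = message.get("content") or ""
    if tool_calls:
        # Any accompanying text is working context, not the final result.
        return ("tool_use", text)
    # No native tool calls: the text is the final result, returned verbatim,
    # including any special format the user required (for example JSON).
    return ("final", text)
```

Under this sketch, a message whose text happens to contain tag-like or JSON-like content but has no native tool calls is still treated as ordinary final-answer text.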