This module contains a variety of tools that can be provided to astabench agents.
The submission module provides a mechanism for keeping track of tool-based answer submissions. It does not directly update the state.output.completion variable that the scorer uses, but allows tools or sub-agents to track a planned submission through a shared store. All Astabench tools that provide a "submit" option use this mechanism to store submissions. The get_submission_manager() returns a SubmissionManager instance for the current Sample that provides the following methods:
write_submissionsets the current candidate submission.has_submissionchecks if a submission has been made viawrite_submission.get_submissionretrieves the current submission.clear_submissionclears the current submission.
Several tools use this submission mechanism:
submit_toolprovided by thesubmissionmodulereport_editor(ifenable_submission=Trueis passed to the constructor)table_editor(ifenable_submission=Trueis passed to the constructor)
So, for example, if an agent calls report_editor(action="submit"), the content of the report can then be retrieved via get_submission_manager().get_submission(), and this value can be assigned to state.output.completion for scoring.
Example agent using submit_tool:
from inspect_ai.model import ChatMessageUser, call_tools, get_model
from inspect_ai.solver import Generate, TaskState, solver
from astabench.tools.submission import get_submission_manager, submit_tool
@solver
def my_agent():
async def step_agent(state: TaskState, generate: Generate) -> TaskState:
# Just call one step of the llm and run tools
logger.info("Running agent step with messages: %s", state.messages)
model_output = await get_model().generate(
input=state.messages,
tools=state.tools,
)
state.messages.append(model_output.message)
if model_output.message.tool_calls:
tool_results = await call_tools(
model_output.message, state.tools, max_output=None
)
state.messages.extend(tool_results)
return state
def check_submission(answer: str) -> bool:
# example placeholder
return True
async def solve(state: TaskState, generate: Generate) -> TaskState:
state.tools.append(submit_tool())
while True:
await step_agent(state, generate)
# This becomes True when the agent calls the `submit_tool`
manager = get_submission_manager()
if manager.has_submission():
answer = manager.get_submission()
# Optionally, do some extra checking here
if not manager.check_submission(answer):
manager.clear_submission()
# Store the submission in the state for scoring
state.output.completion = answer
return state
else:
state.messages.append(
ChatMessageUser(
content="Continue working on the task and submit when you are done."
)
)
return solve