Astabench Tools

This module contains a variety of tools that can be provided to astabench agents.

Submission tools

The submission module provides a mechanism for keeping track of tool-based answer submissions. It does not directly update the state.output.completion variable that the scorer uses, but allows tools or sub-agents to track a planned submission through a shared store. All Astabench tools that provide a "submit" option use this mechanism to store submissions. The get_submission_manager() returns a SubmissionManager instance for the current Sample that provides the following methods:

write_submission sets the current candidate submission.
has_submission checks if a submission has been made via write_submission.
get_submission retrieves the current submission.
clear_submission clears the current submission.

Several tools use this submission mechanism:

submit_tool provided by the submission module
report_editor (if enable_submission=True is passed to the constructor)
table_editor (if enable_submission=True is passed to the constructor)

So, for example, if an agent calls report_editor(action="submit"), the content of the report can then be retrieved via get_submission_manager().get_submission(), and this value can be assigned to state.output.completion for scoring.

Example agent using submit_tool:

from inspect_ai.model import ChatMessageUser, call_tools, get_model
from inspect_ai.solver import Generate, TaskState, solver

from astabench.tools.submission import get_submission_manager, submit_tool


@solver
def my_agent():
    async def step_agent(state: TaskState, generate: Generate) -> TaskState:
        # Just call one step of the llm and run tools
        logger.info("Running agent step with messages: %s", state.messages)
        model_output = await get_model().generate(
            input=state.messages,
            tools=state.tools,
        )
        state.messages.append(model_output.message)

        if model_output.message.tool_calls:
            tool_results = await call_tools(
                model_output.message, state.tools, max_output=None
            )
            state.messages.extend(tool_results)

        return state

    def check_submission(answer: str) -> bool:
        # example placeholder
        return True

    async def solve(state: TaskState, generate: Generate) -> TaskState:
        state.tools.append(submit_tool())
        while True:
            await step_agent(state, generate)

            # This becomes True when the agent calls the `submit_tool`
            manager = get_submission_manager()
            if manager.has_submission():
                answer = manager.get_submission()

                # Optionally, do some extra checking here
                if not manager.check_submission(answer):
                    manager.clear_submission()

                # Store the submission in the state for scoring
                state.output.completion = answer
                return state
            else:
                state.messages.append(
                    ChatMessageUser(
                        content="Continue working on the task and submit when you are done."
                    )
                )

    return solve

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Astabench Tools

Submission tools

FilesExpand file tree

README.md

Latest commit

History

README.md

File metadata and controls

Astabench Tools

Submission tools