Skip to content

Latest commit

 

History

History
77 lines (59 loc) · 3.22 KB

File metadata and controls

77 lines (59 loc) · 3.22 KB

Astabench Tools

This module contains a variety of tools that can be provided to astabench agents.

Submission tools

The submission module provides a mechanism for keeping track of tool-based answer submissions. It does not directly update the state.output.completion variable that the scorer uses, but allows tools or sub-agents to track a planned submission through a shared store. All Astabench tools that provide a "submit" option use this mechanism to store submissions. The get_submission_manager() returns a SubmissionManager instance for the current Sample that provides the following methods:

  • write_submission sets the current candidate submission.
  • has_submission checks if a submission has been made via write_submission.
  • get_submission retrieves the current submission.
  • clear_submission clears the current submission.

Several tools use this submission mechanism:

  • submit_tool provided by the submission module
  • report_editor (if enable_submission=True is passed to the constructor)
  • table_editor (if enable_submission=True is passed to the constructor)

So, for example, if an agent calls report_editor(action="submit"), the content of the report can then be retrieved via get_submission_manager().get_submission(), and this value can be assigned to state.output.completion for scoring.

Example agent using submit_tool:

from inspect_ai.model import ChatMessageUser, call_tools, get_model
from inspect_ai.solver import Generate, TaskState, solver

from astabench.tools.submission import get_submission_manager, submit_tool


@solver
def my_agent():
    async def step_agent(state: TaskState, generate: Generate) -> TaskState:
        # Just call one step of the llm and run tools
        logger.info("Running agent step with messages: %s", state.messages)
        model_output = await get_model().generate(
            input=state.messages,
            tools=state.tools,
        )
        state.messages.append(model_output.message)

        if model_output.message.tool_calls:
            tool_results = await call_tools(
                model_output.message, state.tools, max_output=None
            )
            state.messages.extend(tool_results)

        return state

    def check_submission(answer: str) -> bool:
        # example placeholder
        return True

    async def solve(state: TaskState, generate: Generate) -> TaskState:
        state.tools.append(submit_tool())
        while True:
            await step_agent(state, generate)

            # This becomes True when the agent calls the `submit_tool`
            manager = get_submission_manager()
            if manager.has_submission():
                answer = manager.get_submission()

                # Optionally, do some extra checking here
                if not manager.check_submission(answer):
                    manager.clear_submission()

                # Store the submission in the state for scoring
                state.output.completion = answer
                return state
            else:
                state.messages.append(
                    ChatMessageUser(
                        content="Continue working on the task and submit when you are done."
                    )
                )

    return solve