
Python code execution tool (System/Venv/Docker) #1371


Draft
wants to merge 9 commits into base: main
Conversation

marklysze
Collaborator

@marklysze marklysze commented Mar 17, 2025

Why are these changes needed?

Tool for code execution that utilises an environment object to determine where execution takes place: the local Python environment (aka System), a virtual environment (aka Venv), or a Docker container.

This will replace the need for the code execution capabilities currently in ConversableAgent. It will be useful for building agents for tasks such as code development and code testing.

Related issue number

N/A

Checks

@marklysze marklysze added the enhancement New feature or request label Mar 17, 2025
@marklysze marklysze added this to ag2 Mar 17, 2025
@marklysze
Collaborator Author

marklysze commented Mar 17, 2025

Sample code:

from autogen import ConversableAgent, register_function
from autogen.tools.experimental import PythonLocalExecutionTool

# Initialize the tool
python_executor = PythonLocalExecutionTool(
    use_venv=True,
    venv_path="/app/ag2/ms_code/.venv_ms_testing",  # Use an existing virtual environment; otherwise one is created per call and destroyed on completion
    timeout=60,
)

llm_config = {"model": "gpt-4o-mini", "api_type": "openai"}

# Create an agent that can use the tool
code_runner = ConversableAgent(
    name="code_runner",
    system_message="You are a code executor agent, when you don't execute code write the message 'TERMINATE' by itself.",
    llm_config=llm_config,
)

question_agent = ConversableAgent(
    name="question_agent",
    system_message="You are a developer AI agent. Send all your code suggestions to the python_executor tool where it will be executed and result returned to you. Keep refining the code until it works.",
    llm_config=llm_config,
)

register_function(
    python_executor,
    caller=question_agent,
    executor=code_runner,
    description="Run Python code",
)

result = code_runner.initiate_chat(
    recipient=question_agent,
    message="""
    Write a Python code with incorrect syntax. Always install numpy and pandas.
    """,
    max_turns=5,
)

@davorrunje davorrunje self-requested a review March 17, 2025 08:17
@davorrunje
Collaborator

davorrunje commented Mar 17, 2025

We can use asyncer to support async file handling and remove the optional dependency to aiofiles. E.g.

from asyncer import asyncify

def read_file(name: str) -> str:
    with open(name) as f:
        return f.read()


async def a_read_file(name: str):
    content = await asyncify(read_file)(name=name)
    print(content)

@marklysze
Collaborator Author

We can use asyncer to support async file handling and remove the optional dependency to aiofiles. E.g.

from asyncer import asyncify

def read_file(name: str) -> str:
    with open(name) as f:
        return f.read()


async def a_read_file(name: str):
    content = await asyncify(read_file)(name=name)
    print(content)

Great, thanks!

@davorrunje
Collaborator

davorrunje commented Mar 17, 2025

I think we should use context managers to define things like the Python environment and working directories. They need lifecycle management, and that should be made explicit rather than hidden in the implementation details inside ConversableAgent.

This is how it could look:

from autogen import ConversableAgent, LLMConfig
from autogen.environments import PythonEnvironment, WorkingDirectory
from autogen.tools.experimental import PythonLocalExecutionTool

# create virtual env and manage its lifecycle
with PythonEnvironment(python_version="3.11", dependencies=["numpy>2.1,<3", "pandas"]):

    # create working directories
    with WorkingDirectory.create_tmp() as wd1, WorkingDirectory.create_tmp() as wd2:
        # Initialize the tool
        python_executor = PythonLocalExecutionTool(
            use_venv=True,
            timeout=60,
            # the default working directory is from the outer scope (wd2), but you can change it explicitly
            working_directory=wd1,
            # it is using the python environment from the outer scope, but we could change it
            # python_environment=...
        )

        with LLMConfig(model="gpt-4o-mini", api_type="openai"):

            # Create an agent that can use the tool
            code_runner = ConversableAgent(
                name="code_runner",
                system_message="You are a code executor agent, when you don't execute code write the message 'TERMINATE' by itself.",
                tools=python_executor,
            )

            question_agent = ConversableAgent(
                name="question_agent",
                system_message="You are a developer AI agent. Send all your code suggestions to the python_executor tool where it will be executed and result returned to you. Keep refining the code until it works.",
            )

        # this will be done automatically in the global run function
        python_executor.register_for_execution(question_agent)
       
        result = code_runner.initiate_chat(
            recipient=question_agent,
            message="""
            Write a Python code with incorrect syntax. Always install numpy and pandas.
            """,
            max_turns=5,
        )
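The explicit-lifecycle idea above can be sketched as a plain context manager. The class below is a hypothetical minimal illustration (not the actual AG2 `WorkingDirectory` API): setup happens in `__enter__`, teardown in `__exit__`, so cleanup is deterministic rather than hidden in agent internals.

```python
import os
import shutil
import tempfile

class TempWorkingDirectory:
    """Hypothetical minimal sketch of lifecycle management via a context manager."""

    def __enter__(self) -> str:
        self._prev = os.getcwd()
        self.path = tempfile.mkdtemp(prefix="ag2_wd_")
        os.chdir(self.path)  # make the temp dir the current working directory
        return self.path

    def __exit__(self, *exc) -> bool:
        os.chdir(self._prev)  # restore the previous working directory
        shutil.rmtree(self.path, ignore_errors=True)  # deterministic cleanup
        return False

with TempWorkingDirectory() as wd:
    saved = wd
    with open(os.path.join(wd, "script.py"), "w") as f:
        f.write("print('hi')")

print(os.path.isdir(saved))  # False: directory removed on exit
```

The same `__enter__`/`__exit__` shape applies to a Python environment object, where setup would create a venv or start a container and teardown would destroy it.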

@marklysze
Collaborator Author

Updated: there are now VenvPythonEnvironment and SystemPythonEnvironment for Python environments, and WorkingDirectory for the working folder.

SystemPythonEnvironment example with context managers:

from autogen import ConversableAgent, LLMConfig, register_function

# Import the environment, working directory, and code execution tool
from autogen.environments import SystemPythonEnvironment, WorkingDirectory
from autogen.tools.experimental import PythonLocalExecutionTool

with SystemPythonEnvironment(executable="/usr/local/bin/python") as sys_py_env:
    with WorkingDirectory(path="/tmp/ag2_working_dir/") as wd:
        # Create our code execution tool, using the environment and working directory from the above context managers
        python_executor = PythonLocalExecutionTool(
            timeout=60,
            # If not using the context managers above, you can set the working directory and python environment here
            # working_directory=wd,
            # python_environment=sys_py_env,
        )

with LLMConfig(model="gpt-4o", api_type="openai"):

    # code_runner has the code execution tool available to execute
    code_runner = ConversableAgent(
        name="code_runner",
        system_message="You are a code executor agent, when you don't execute code write the message 'TERMINATE' by itself.",
        human_input_mode="NEVER",
    )

    # question_agent has the code execution tool available to its LLM
    question_agent = ConversableAgent(
        name="question_agent",
        system_message=("You are a developer AI agent. "
            "Send all your code suggestions to the python_executor tool where it will be executed and result returned to you. "
            "Keep refining the code until it works."
        ),
    )

# Register the python execution tool with the agents
register_function(
    python_executor,
    caller=question_agent,
    executor=code_runner,
    description="Run Python code",
)

result = code_runner.initiate_chat(
    recipient=question_agent,
    message=("Write Python code to print the current Python version followed by the numbers 1 to 11. "
             "Make a syntax error in the first version and fix it in the second version."
    ),
    max_turns=5,
)

print(f"Result: {result.summary}")
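For intuition, the local execution that a tool like this performs can be sketched roughly as follows. This is assumed mechanics under the hood, not the tool's actual implementation; the result dictionary simply mirrors the `success`/`stdout`/`stderr`/`returncode` shape seen in the tool responses in this thread.

```python
import os
import subprocess
import sys
import tempfile

def run_python(code: str, timeout: int = 60, python: str = sys.executable) -> dict:
    """Write the code to a temporary script and execute it, capturing the result (sketch only)."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
        f.write(code)
        script = f.name
    try:
        proc = subprocess.run(
            [python, script], capture_output=True, text=True, timeout=timeout
        )
        return {
            "success": proc.returncode == 0,
            "stdout": proc.stdout,
            "stderr": proc.stderr,
            "returncode": proc.returncode,
        }
    except subprocess.TimeoutExpired:
        return {"success": False, "stdout": "", "stderr": "timeout", "returncode": -1}
    finally:
        os.unlink(script)  # remove the temporary script regardless of outcome

print(run_python("print(1 + 1)"))
```

Swapping the `python` argument is what distinguishes a System environment (current interpreter) from a Venv environment (the venv's interpreter).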

Venv environment example:

from autogen import ConversableAgent, LLMConfig, register_function

# Import the environment, working directory, and code execution tool
from autogen.environments import VenvPythonEnvironment, WorkingDirectory
from autogen.tools.experimental import PythonLocalExecutionTool

# Create a new virtual environment using a Python version
# Change this to match a version you have installed
venv = VenvPythonEnvironment(python_version="3.11")

# Create a temporary directory
working_dir = WorkingDirectory.create_tmp()

# Create our code execution tool
python_executor = PythonLocalExecutionTool(
    working_directory=working_dir,
    python_environment=venv,
)

with LLMConfig(model="gpt-4o", api_type="openai"):

    # code_runner has the code execution tool available to execute
    code_runner = ConversableAgent(
        name="code_runner",
        system_message="You are a code executor agent, when you don't execute code write the message 'TERMINATE' by itself.",
        human_input_mode="NEVER",
    )

    # question_agent has the code execution tool available to its LLM
    question_agent = ConversableAgent(
        name="question_agent",
        system_message=("You are a developer AI agent. "
            "Send all your code suggestions to the python_executor tool where it will be executed and result returned to you. "
            "Keep refining the code until it works."
        ),
    )

# Register the python execution tool with the agents
register_function(
    python_executor,
    caller=question_agent,
    executor=code_runner,
    description="Run Python code",
)

result = code_runner.initiate_chat(
    recipient=question_agent,
    message=("Write a Python program to write a poem to a file. "
             "Follow up with another program to read the poem from the file and print it."
    ),
    max_turns=5,
)

print(f"Result: {result.summary}")
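Creating a throwaway venv and resolving its interpreter can be done with the standard library alone; the snippet below is a rough sketch of what `VenvPythonEnvironment` is assumed to do internally, not its actual code.

```python
import subprocess
import sys
import tempfile
import venv
from pathlib import Path

# build a throwaway virtual environment in a temp directory
venv_dir = Path(tempfile.mkdtemp(prefix="ag2_venv_"))
venv.create(venv_dir, with_pip=False)

# resolve the venv's interpreter path (layout differs on Windows vs POSIX)
python_bin = venv_dir / ("Scripts/python.exe" if sys.platform == "win32" else "bin/python")

# run code with the venv's interpreter rather than the current one
out = subprocess.run(
    [str(python_bin), "-c", "import sys; print(sys.version_info[:2])"],
    capture_output=True,
    text=True,
)
print(out.stdout)
```

Selecting a specific `python_version` (as in the example above) would additionally require locating an installed interpreter of that version before creating the venv.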

@marklysze
Collaborator Author

@davorrunje I've updated and incorporated environments. Please have a look over it. Tests are still missing; I'd like some guidance on how to test this.

@marklysze marklysze changed the title Local Python code execution tool Python code execution tool (System/Venv/Docker) Mar 27, 2025
@marklysze
Collaborator Author

marklysze commented Mar 27, 2025

DockerPythonEnvironment example:

from autogen import ConversableAgent, LLMConfig, register_function

# Import the environment, working directory, and code execution tool
from autogen.environments import DockerPythonEnvironment, WorkingDirectory
from autogen.tools.experimental import PythonCodeExecutionTool

with DockerPythonEnvironment(image="python:3.11-slim", pip_packages=["numpy", "pandas", "matplotlib"]) as docker_env:
    with WorkingDirectory(path="/tmp/ag2_working_dir/") as wd:
        # Create our code execution tool, using the environment and working directory from the above context managers
        python_executor = PythonCodeExecutionTool(
            timeout=60,
            # If not using the context managers above, you can set the working directory and python environment here
            # working_directory=wd,
            # python_environment=docker_env,
        )

    with LLMConfig(model="gpt-4o", api_type="openai"):
        # code_runner has the code execution tool available to execute
        code_runner = ConversableAgent(
            name="code_runner",
            system_message="You are a code executor agent, when you don't execute code write the message 'TERMINATE' by itself.",
            human_input_mode="NEVER",
        )

        # question_agent has the code execution tool available to its LLM
        question_agent = ConversableAgent(
            name="question_agent",
            system_message=(
                "You are a developer AI agent. "
                "Send all your code suggestions to the python_executor tool where it will be executed and result returned to you. "
                "Keep refining the code until it works."
            ),
        )

    # Register the python execution tool with the agents
    register_function(
        python_executor,
        caller=question_agent,
        executor=code_runner,
        description="Run Python code",
    )

    result = code_runner.initiate_chat(
        recipient=question_agent,
        message=(
            "Write Python code to print the current Python version followed by the numbers 1 to 11. "
            "Make a syntax error in the first version and fix it in the second version."
        ),
        max_turns=5,
    )

    print(f"Result: {result.summary}")

Output

2025-03-28 06:13:11,416 - INFO - Docker version: Docker version 27.5.1, build 9f9e405
2025-03-28 06:13:13,980 - INFO - Pulled Docker image: python:3.11-slim
2025-03-28 06:13:13,980 - INFO - Starting Docker container: ag2_docker_env_8e729dc8
2025-03-28 06:13:14,217 - INFO - Started Docker container: ag2_docker_env_8e729dc8 (e7691423b8cf1857f5b07adea2a4b479d9377b9400d350fa324037c3a03fc8dd)
2025-03-28 06:13:14,218 - INFO - Installing pip packages: numpy pandas matplotlib
2025-03-28 06:13:25,157 - INFO - Successfully installed pip packages
code_runner (to question_agent):

Write Python code to print the current Python version followed by the numbers 1 to 11. Make a syntax error in the first version and fix it in the second version.

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
question_agent (to code_runner):

***** Suggested tool call (call_GvddnvooTw8sONm9lsDUTiBJ): python_execute_code *****
Arguments: 
{"code_execution_request":{"code":"import platform\nprint('Python version:', platform.python_version())\nfor i in range(1, 12):\n  printi(i)","libraries":[]}}
************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION python_execute_code...
Call ID: call_GvddnvooTw8sONm9lsDUTiBJ
Input arguments: {'code_execution_request': {'code': "import platform\nprint('Python version:', platform.python_version())\nfor i in range(1, 12):\n  printi(i)", 'libraries': []}}
code_runner (to question_agent):

***** Response from calling tool (call_GvddnvooTw8sONm9lsDUTiBJ) *****
{'success': False, 'stdout': 'Python version: 3.11.11\n', 'stderr': 'Traceback (most recent call last):\n  File "/workspace/script.py", line 4, in <module>\n    printi(i)\n    ^^^^^^\nNameError: name \'printi\' is not defined. Did you mean: \'print\'?\n', 'returncode': 1}
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
2025-03-28 06:13:29,210 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
question_agent (to code_runner):

There was a NameError due to using 'printi' instead of 'print'. Let's correct that and execute the code again.
***** Suggested tool call (call_VvDlLL6tIvnQB1FHZ7FroUaL): python_execute_code *****
Arguments: 
{"code_execution_request":{"code":"import platform\nprint('Python version:', platform.python_version())\nfor i in range(1, 12):\n  print(i)","libraries":[]}}
************************************************************************************

--------------------------------------------------------------------------------

>>>>>>>> EXECUTING FUNCTION python_execute_code...
Call ID: call_VvDlLL6tIvnQB1FHZ7FroUaL
Input arguments: {'code_execution_request': {'code': "import platform\nprint('Python version:', platform.python_version())\nfor i in range(1, 12):\n  print(i)", 'libraries': []}}
code_runner (to question_agent):

***** Response from calling tool (call_VvDlLL6tIvnQB1FHZ7FroUaL) *****
{'success': True, 'stdout': 'Python version: 3.11.11\n1\n2\n3\n4\n5\n6\n7\n8\n9\n10\n11\n', 'stderr': '', 'returncode': 0}
**********************************************************************

--------------------------------------------------------------------------------

>>>>>>>> USING AUTO REPLY...
2025-03-28 06:13:30,605 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
question_agent (to code_runner):

The corrected code executed successfully and printed the current Python version followed by the numbers 1 to 11:

'''
Python version: 3.11.11
1
2
3
4
5
6
7
8
9
10
11
'''

--------------------------------------------------------------------------------
2025-03-28 06:13:31,356 - INFO - HTTP Request: POST https://api.openai.com/v1/chat/completions "HTTP/1.1 200 OK"
code_runner (to question_agent):

TERMINATE

--------------------------------------------------------------------------------
Please give feedback to code_runner. Press enter or type 'exit' to stop the conversation: exit

>>>>>>>> TERMINATING RUN (34bd4162-7dfd-47d1-98ad-3f4072b51ba0): Termination message condition on agent 'question_agent' met and no human input provided

>>>>>>>> TERMINATING RUN (178595eb-15b8-417f-a0e0-f3e91765d766): Termination message condition on agent 'code_runner' met
Result: 
2025-03-28 06:13:35,669 - INFO - Stopping Docker container: ag2_docker_env_8e729dc8
2025-03-28 06:13:45,883 - INFO - Removing Docker container: ag2_docker_env_8e729dc8
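The container lifecycle visible in the log above (start container, install pip packages, execute, stop, remove) maps onto standard Docker CLI commands. The sketch below only constructs the command lists rather than running them, so it is illustrative of the assumed mechanics, not the tool's implementation.

```python
import uuid

def docker_lifecycle(image: str, pip_packages: list) -> list:
    """Build the Docker CLI commands for the lifecycle seen in the log (sketch only)."""
    name = f"ag2_docker_env_{uuid.uuid4().hex[:8]}"
    return [
        # start a long-lived container to exec code into
        ["docker", "run", "-d", "--name", name, image, "sleep", "infinity"],
        # install requested packages inside the container
        ["docker", "exec", name, "pip", "install", *pip_packages],
        # teardown on context-manager exit
        ["docker", "stop", name],
        ["docker", "rm", name],
    ]

cmds = docker_lifecycle("python:3.11-slim", ["numpy", "pandas", "matplotlib"])
for c in cmds:
    print(" ".join(c))
```

Executing a script would then be one more `docker exec` (or `docker cp` plus `exec`) between the install and stop steps.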


codecov bot commented Mar 27, 2025

Codecov Report

Attention: Patch coverage is 26.07656% with 309 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
autogen/environments/docker_python_environment.py 12.57% 139 Missing ⚠️
autogen/environments/venv_python_environment.py 17.02% 78 Missing ⚠️
autogen/environments/working_directory.py 36.36% 28 Missing ⚠️
autogen/environments/system_python_environment.py 37.83% 23 Missing ⚠️
autogen/environments/python_environment.py 51.16% 21 Missing ⚠️
...perimental/code_execution/python_code_execution.py 39.39% 20 Missing ⚠️

Files with missing lines Coverage Δ
autogen/environments/__init__.py 100.00% <100.00%> (ø)
autogen/tools/experimental/__init__.py 100.00% <100.00%> (ø)
...ogen/tools/experimental/code_execution/__init__.py 100.00% <100.00%> (ø)
...perimental/code_execution/python_code_execution.py 39.39% <39.39%> (ø)
autogen/environments/python_environment.py 51.16% <51.16%> (ø)
autogen/environments/system_python_environment.py 37.83% <37.83%> (ø)
autogen/environments/working_directory.py 36.36% <36.36%> (ø)
autogen/environments/venv_python_environment.py 17.02% <17.02%> (ø)
autogen/environments/docker_python_environment.py 12.57% <12.57%> (ø)

... and 70 files with indirect coverage changes


@marklysze
Collaborator Author

Note: I want to refactor the Docker configuration parameters into two new classes, DockerExistingContainerConfig and DockerNewContainerConfig, as the DockerPythonEnvironment parameter list is long.
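The proposed split could look roughly like the dataclasses below. The class names come from the comment above; the fields are illustrative guesses only, not the final API.

```python
from dataclasses import dataclass, field

@dataclass
class DockerNewContainerConfig:
    """Hypothetical: parameters for creating and managing a fresh container."""
    image: str = "python:3.11-slim"
    pip_packages: list = field(default_factory=list)
    container_name_prefix: str = "ag2_docker_env_"

@dataclass
class DockerExistingContainerConfig:
    """Hypothetical: parameters for attaching to an already-running container."""
    container_name: str

new_cfg = DockerNewContainerConfig(pip_packages=["numpy"])
existing_cfg = DockerExistingContainerConfig(container_name="my_container")
```

Accepting one config object or the other would also make the create-vs-attach choice explicit at the type level instead of via a long flat parameter list.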

@davorrunje davorrunje self-assigned this Apr 7, 2025