
[Security] PandasTools: RCE via model-controlled pandas function selection (read_pickle) #8699

Description

@bogdancherniy11-sudo

Summary

PandasTools selects the pandas callable to run from a model-controlled string and then calls it with model-controlled keyword arguments:

getattr(pd, create_using_function)(**function_parameters)

Because create_using_function and function_parameters come from the agent/LLM tool call, a prompt-injection-controlled model can choose read_pickle, which deserializes attacker-supplied data with pickle → remote code execution on the host running the agent. The same dispatch also allows arbitrary local file read (e.g. read_csv/read_json on any path), with the contents returned to the model.

Affected code

libs/agno/agno/tools/pandas.py (v2.6.20, main @ 07fe6b2)

  • create_pandas_dataframe():
    # create_using_function / function_parameters are model-controlled tool args
    dataframe = getattr(pd, create_using_function)(**function_parameters)
    if dataframe is None: ...
    if not isinstance(dataframe, pd.DataFrame): ...   # type check runs AFTER the call
  • run_dataframe_operation() has the same pattern (getattr(dataframe, operation)(**operation_parameters)) — a secondary sink that also exposes e.g. query/eval and to_pickle.

Both tools are registered by default (enable_create_pandas_dataframe / enable_run_dataframe_operation default to True).

Root cause

getattr(pd, <model string>)(**<model kwargs>) is an unrestricted dispatch to any top-level pandas callable. pandas.read_pickle() unpickles its input with pickle.load, and unpickling invokes whatever callable and arguments the payload recorded through __reduce__ → code execution. The isinstance(..., pd.DataFrame) guard runs only after the call has returned, so it cannot prevent execution.
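The ordering flaw does not depend on pickle internals and can be shown with stdlib stand-ins, without any deserialization payload. In this sketch `fake_pd`, `FakeFrame`, `make_frame` and `loader_with_side_effect` are hypothetical names, not pandas or agno APIs; the side-effecting loader plays the role of read_pickle, and the type check mirrors the post-call isinstance guard.

```python
from types import SimpleNamespace

# Records side effects so we can see what ran.
events: list[str] = []


class FakeFrame:
    """Hypothetical stand-in for pd.DataFrame."""


def make_frame(**kwargs):
    """Hypothetical stand-in for a safe constructor such as pd.DataFrame."""
    return FakeFrame()


def loader_with_side_effect(**kwargs):
    """Hypothetical stand-in for pd.read_pickle: the call itself is the damage."""
    events.append(f"side effect ran with {kwargs}")
    return {"not": "a frame"}


# Hypothetical stand-in for the `pd` module namespace.
fake_pd = SimpleNamespace(DataFrame=make_frame, read_pickle=loader_with_side_effect)


def create_frame_check_after(name: str, params: dict):
    """Mirrors the vulnerable ordering: dispatch first, type-check second."""
    result = getattr(fake_pd, name)(**params)
    if not isinstance(result, FakeFrame):
        return "rejected: not a DataFrame"
    return result


print(create_frame_check_after("read_pickle", {"filepath_or_buffer": "x.pkl"}))
print(events)  # the side effect is already recorded despite the rejection
```

The call is rejected, yet `events` already holds an entry: any check placed after the dispatch can only report on code that has already run.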

Preconditions / attacker model

  • An agent is configured with PandasTools (a standard toolkit).
  • Tool arguments are influenced by untrusted input — a user message or indirect prompt injection (e.g. a document or web page the agent ingests).
  • Default configuration; no unsafe/non-default option required. read_pickle accepts local paths and URLs, so the payload can be remote.

Impact

  • Critical: remote code execution on the agent host via read_pickle of an attacker-controlled payload.
  • Additional: arbitrary local file read (credential/secret files) returned to the model; outbound fetch to model-chosen URLs.
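The file-read and outbound-fetch impact comes from the readers themselves accepting any path or URL, so it also applies to otherwise "safe" readers such as read_csv. One possible mitigation, sketched here under stated assumptions, is to confine each reader's source argument to a base directory. `_SOURCE_KWARG`, `check_reader_source` and `base_dir` are hypothetical names; the keyword names in the table match current pandas reader signatures but should be checked against the installed version. Names missing from the table pass through, so this complements a constructor allowlist rather than replacing it.

```python
from pathlib import Path
from typing import Optional

# Hypothetical mapping from pandas reader name to the keyword argument naming its source.
_SOURCE_KWARG = {
    "read_csv": "filepath_or_buffer",
    "read_table": "filepath_or_buffer",
    "read_json": "path_or_buf",
    "read_excel": "io",
    "read_parquet": "path",
    "read_html": "io",
    "read_feather": "path",
    "read_orc": "path",
    "read_xml": "path_or_buffer",
}


def check_reader_source(func_name: str, params: dict, base_dir: str) -> Optional[str]:
    """Return an error message if the reader would open something outside base_dir."""
    key = _SOURCE_KWARG.get(func_name)
    if key is None:
        return None  # not a file reader in this table (e.g. "DataFrame")
    source = params.get(key)
    if not isinstance(source, str):
        return f"'{key}' must be a file path string"
    if "://" in source:
        return "URLs are not allowed"  # blocks http(s)://, file://, s3:// and similar
    base = Path(base_dir).resolve()
    # An absolute `source` replaces `base`; ".." segments are collapsed by resolve().
    target = (base / source).resolve()
    if not target.is_relative_to(base):
        return f"Path outside allowed directory: {source}"
    return None
```

Rejecting every non-path string also rules out passing literal JSON/HTML/XML text to those readers; that trade-off seems acceptable when the input is model-controlled.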

Steps to reproduce (mechanism)

  1. Instantiate an agent/toolkit with PandasTools.
  2. Cause a tool call equivalent to:
    create_pandas_dataframe(dataframe_name="x", create_using_function="read_pickle", function_parameters={"filepath_or_buffer": <attacker-controlled path or URL>}).
  3. pandas.read_pickle deserializes the target; a pickle carrying a __reduce__ payload executes on load — before the DataFrame type check.

I have a minimal, local-only PoC (temporary files, fake data, no external hosts) that demonstrates both code execution and the arbitrary-file-read variant, plus a regression test. To avoid publishing a working exploit I have not inlined the payload here — I can share the full PoC privately. (Private Vulnerability Reporting appears to be disabled for this repo and I could not find a SECURITY.md; please enable PVR or provide a security contact.)

Expected vs actual

  • Actual: the model can invoke any pandas reader, including read_pickle (RCE) and readers that open arbitrary paths/URLs.
  • Expected: only a fixed set of safe DataFrame constructors should be reachable; read_pickle (untrusted deserialization) must never be reachable from model-controlled input.
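The expected behaviour lends itself to a regression test that needs no pickle payload: replace the dangerous callable with a tripwire stub and assert it is never invoked. `create_dataframe` below is a hypothetical, trimmed copy of an allowlist-before-lookup guard (allowlist reduced to two names), and the namespace is a stub rather than pandas, so this sketches the testing approach and is not the reporter's regression test.

```python
import unittest
from types import SimpleNamespace

ALLOWED_CREATE_FUNCS = {"DataFrame", "read_csv"}  # trimmed for the sketch


def create_dataframe(namespace, name: str, params: dict):
    """Hypothetical guard: allowlist check before any attribute lookup or call."""
    if name not in ALLOWED_CREATE_FUNCS:
        return f"Unsupported function '{name}'."
    func = getattr(namespace, name, None)
    if not callable(func):
        return f"Unsupported function '{name}'."
    return func(**params)


class CreateDataframeGuardTest(unittest.TestCase):
    def setUp(self):
        self.calls = []

        def tripwire(**kwargs):
            # Must never run; records the call so the test can detect it.
            self.calls.append(kwargs)
            return "frame"

        self.ns = SimpleNamespace(read_pickle=tripwire, read_csv=lambda **kw: "frame")

    def test_read_pickle_is_rejected_before_call(self):
        result = create_dataframe(self.ns, "read_pickle", {"filepath_or_buffer": "x.pkl"})
        self.assertTrue(result.startswith("Unsupported function"))
        self.assertEqual(self.calls, [])

    def test_allowed_reader_still_works(self):
        self.assertEqual(create_dataframe(self.ns, "read_csv", {}), "frame")

    def test_allowlisted_but_missing_name_is_rejected(self):
        # "DataFrame" is allowlisted but absent from the stub namespace.
        self.assertTrue(create_dataframe(self.ns, "DataFrame", {}).startswith("Unsupported"))
```

Saved as a file, this runs under `python -m unittest <file>`. Against the real toolkit, the same tripwire idea would mean patching `pandas.read_pickle` before invoking the tool.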

Suggested fix

Restrict create_using_function to an allowlist of safe constructors and refuse everything else; harden run_dataframe_operation against private attributes and code-executing methods. Verified locally: after the change read_pickle is rejected (no deserialization, no code execution) while read_csv and normal operations still work.

--- a/libs/agno/agno/tools/pandas.py
+++ b/libs/agno/agno/tools/pandas.py
@@ -9,6 +9,26 @@
     raise ImportError("`pandas` not installed. Please install using `pip install pandas`.")
 
 
+# Safe pandas top-level constructors PandasTools may dispatch to from model-controlled
+# input. read_pickle / read_hdf are intentionally excluded: they deserialize untrusted
+# data (pickle) and lead to remote code execution.
+_ALLOWED_CREATE_FUNCS = {
+    "DataFrame",
+    "read_csv",
+    "read_json",
+    "read_excel",
+    "read_parquet",
+    "read_table",
+    "read_html",
+    "read_feather",
+    "read_orc",
+    "read_xml",
+}
+
+# DataFrame methods that execute code or (de)serialize to arbitrary locations.
+_BLOCKED_DF_OPERATIONS = {"query", "eval", "to_pickle", "to_hdf"}
+
+
 class PandasTools(Toolkit):
     def __init__(
         self,
@@ -50,8 +70,20 @@
             if dataframe_name in self.dataframes:
                 return f"Dataframe already exists: {dataframe_name}"
 
+            # Only allow a fixed set of safe constructors. This blocks reaching
+            # `pd.read_pickle` (untrusted deserialization -> RCE) and other
+            # non-constructor callables via model-controlled input.
+            if create_using_function not in _ALLOWED_CREATE_FUNCS:
+                return (
+                    f"Unsupported function '{create_using_function}'. "
+                    f"Allowed functions: {sorted(_ALLOWED_CREATE_FUNCS)}"
+                )
+            func = getattr(pd, create_using_function, None)
+            if not callable(func):
+                return f"Unsupported function '{create_using_function}'."
+
             # Create the dataframe
-            dataframe = getattr(pd, create_using_function)(**function_parameters)
+            dataframe = func(**function_parameters)
             if dataframe is None:
                 return f"Error creating dataframe: {dataframe_name}"
             if not isinstance(dataframe, pd.DataFrame):
@@ -85,9 +117,18 @@
 
             # Get the dataframe
             dataframe = self.dataframes.get(dataframe_name)
+            if dataframe is None:
+                return f"Dataframe not found: {dataframe_name}"
+
+            # Reject private/dunder attributes and code-executing / arbitrary-write methods.
+            if operation.startswith("_") or operation in _BLOCKED_DF_OPERATIONS:
+                return f"Unsupported operation: {operation}"
+            op = getattr(dataframe, operation, None)
+            if not callable(op):
+                return f"Unsupported operation: {operation}"
 
             # Run the operation
-            result = getattr(dataframe, operation)(**operation_parameters)
+            result = op(**operation_parameters)
 
             log_debug(f"Ran operation: {operation}")
             try:
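The operation denylist in this patch covers query, eval, to_pickle and to_hdf, but DataFrame has other methods that write to a caller-chosen destination, such as to_csv and to_json (both take a path_or_buf argument). A stricter alternative, sketched here as an assumption rather than part of the proposed patch, is an allowlist of read-only operations. `_ALLOWED_DF_OPERATIONS` and `run_operation` are hypothetical names, and the set's contents are only an example.

```python
from typing import Any

# Hypothetical allowlist of read-only, callable DataFrame methods; extend as needed.
_ALLOWED_DF_OPERATIONS = {"head", "tail", "describe", "count", "mean", "sum", "sort_values"}


def run_operation(dataframe: Any, operation: str, params: dict) -> Any:
    """Hypothetical allowlist variant of run_dataframe_operation's dispatch."""
    # Private/dunder names and writers like to_csv are rejected because they are not listed.
    if operation not in _ALLOWED_DF_OPERATIONS:
        return f"Unsupported operation: {operation}"
    op = getattr(dataframe, operation, None)
    if not callable(op):
        return f"Unsupported operation: {operation}"
    return op(**params)
```

With a real DataFrame, `run_operation(df, "head", {"n": 3})` behaves like `df.head(n=3)`, while `run_operation(df, "to_csv", {"path_or_buf": "out.csv"})` returns the rejection string without touching the filesystem. An allowlist fails closed: methods added in future pandas releases stay unreachable until reviewed, whereas a denylist lets them through by default. The cost is that the agent loses any operation that is not listed.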

Environment tested

  • agno 2.6.20, latest main (commit 07fe6b2), Python 3.12. Local-only reproduction; fake data only.

Duplicate / publicness check

Searched issues/PRs (read_pickle, PandasTools, pandas) and published advisories — no existing report for this deserialization sink; the only PandasTools hit is an unrelated non-security bug. Distinct from the published eval()/field_type RCE advisory (different sink).
