[Security] PandasTools: RCE via model-controlled pandas function selection (read_pickle)

## Summary
`PandasTools` selects the pandas callable to run from a **model-controlled string** and then calls it with model-controlled keyword arguments:

```python
getattr(pd, create_using_function)(**function_parameters)
```

Because `create_using_function` and `function_parameters` come from the agent/LLM tool call, a model steered by prompt injection can choose `read_pickle`, which deserializes attacker-supplied data with `pickle` and results in **remote code execution** on the host running the agent. The same dispatch also allows arbitrary local file reads (e.g. `read_csv`/`read_json` on any path), with the contents returned to the model.

## Affected code
`libs/agno/agno/tools/pandas.py` (v2.6.20, `main` @ `07fe6b2`)

- `create_pandas_dataframe()`:
  ```python
  # create_using_function / function_parameters are model-controlled tool args
  dataframe = getattr(pd, create_using_function)(**function_parameters)
  if dataframe is None: ...
  if not isinstance(dataframe, pd.DataFrame): ...   # type check runs AFTER the call
  ```
- `run_dataframe_operation()` has the same pattern (`getattr(dataframe, operation)(**operation_parameters)`) — a secondary sink that also exposes e.g. `query`/`eval` and `to_pickle`.

Both tools are registered by default (`enable_create_pandas_dataframe` / `enable_run_dataframe_operation` default to `True`).

## Root cause
`getattr(pd, <model string>)(**<model kwargs>)` is an unrestricted dispatch to any top-level `pandas` callable. `pandas.read_pickle()` hands the data to `pickle.load`, which calls whatever callable the pickle stream names (typically planted through an object's `__reduce__`), so loading an attacker-supplied pickle runs attacker-chosen code. The `isinstance(..., pd.DataFrame)` guard runs only *after* the call, so it cannot prevent execution.
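
The ordering problem can be shown without pandas or pickle. The sketch below is a harmless stand-in, not agno code: `fake_pd`, `loud_reader`, `FakeDataFrame`, and `create_dataframe_unsafe` are hypothetical names, and the "side effect" is only an append to a list.

```python
from types import SimpleNamespace

side_effects = []  # records anything the dispatched callable did


def loud_reader(**kwargs):
    """Stand-in for a dangerous callable: acts first, returns a non-DataFrame."""
    side_effects.append(("ran", kwargs))
    return "not a dataframe"


class FakeDataFrame:
    """Stand-in for pd.DataFrame so the sketch needs no third-party imports."""


# Hypothetical namespace playing the role of the `pd` module.
fake_pd = SimpleNamespace(DataFrame=FakeDataFrame, loud_reader=loud_reader)


def create_dataframe_unsafe(create_using_function, function_parameters):
    # Same shape as the vulnerable dispatch: call first, validate afterwards.
    result = getattr(fake_pd, create_using_function)(**function_parameters)
    if not isinstance(result, fake_pd.DataFrame):
        return "Error: not a dataframe"
    return "ok"


message = create_dataframe_unsafe("loud_reader", {"path": "anything"})
print(message)       # the type guard does report an error ...
print(side_effects)  # ... but the callable has already run by then
```

The type check only decides what string is returned to the model; it has no influence on whether the selected callable executes.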

## Preconditions / attacker model
- An agent is configured with `PandasTools` (a standard toolkit).
- Tool arguments are influenced by untrusted input — a user message or **indirect prompt injection** (e.g. a document or web page the agent ingests).
- Default configuration; no unsafe/non-default option required. `read_pickle` accepts local paths **and URLs**, so the payload can be remote.

## Impact
- **Critical:** remote code execution on the agent host via `read_pickle` of an attacker-controlled payload.
- Additional: arbitrary local file read (credential/secret files) returned to the model; outbound fetch to model-chosen URLs.
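
The file-read and outbound-fetch impact comes from passing a model-chosen location straight to a reader. One possible containment is sketched below using only the standard library. It is an assumption-laden illustration, not part of agno: `confine_path` and `base_dir` are hypothetical names, and the sketch covers string paths only (not file-like objects or `storage_options`).

```python
from pathlib import Path


def confine_path(user_path: str, base_dir: str) -> Path:
    """Resolve user_path inside base_dir, or raise ValueError.

    Hypothetical helper: rejects URL-like strings and any path that
    escapes base_dir after symlinks and ".." segments are resolved.
    """
    # Anything with a scheme separator (http://, s3://, file://, ...) is refused.
    if "://" in user_path:
        raise ValueError(f"URLs are not allowed: {user_path!r}")

    base = Path(base_dir).resolve()
    # Joining with an absolute path discards `base`, so absolute paths
    # fall through to the containment check below and are rejected there.
    candidate = (base / user_path).resolve()

    # Path.is_relative_to requires Python 3.9+.
    if not candidate.is_relative_to(base):
        raise ValueError(f"Path escapes the data directory: {user_path!r}")
    return candidate
```

A tool wrapper could call `confine_path(function_parameters["filepath_or_buffer"], data_dir)` before dispatching to a reader, so that `../` traversal, absolute paths, and URLs are refused before pandas ever sees them.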

## Steps to reproduce (mechanism)
1. Instantiate an agent/toolkit with `PandasTools`.
2. Cause a tool call equivalent to:
   `create_pandas_dataframe(dataframe_name="x", create_using_function="read_pickle", function_parameters={"filepath_or_buffer": <attacker-controlled path or URL>})`.
3. `pandas.read_pickle` deserializes the target; a pickle built from an object with a malicious `__reduce__` runs its payload during loading, before the DataFrame type check is reached.

I have a **minimal, local-only PoC** (temporary files, fake data, no external hosts) that demonstrates both code execution and the arbitrary-file-read variant, plus a regression test. To avoid publishing a working exploit I have not inlined the payload here — I can share the full PoC privately. (Private Vulnerability Reporting appears to be disabled for this repo and I could not find a `SECURITY.md`; please enable PVR or provide a security contact.)

## Expected vs actual
- **Actual:** the model can invoke any pandas reader, including `read_pickle` (RCE) and readers that open arbitrary paths/URLs.
- **Expected:** only a fixed set of safe DataFrame constructors should be reachable; `read_pickle` (untrusted deserialization) must never be reachable from model-controlled input.

## Suggested fix
Restrict `create_using_function` to an allowlist of safe constructors and refuse everything else; harden `run_dataframe_operation` against private attributes and code-executing methods. Verified locally: after the change `read_pickle` is rejected (no deserialization, no code execution) while `read_csv` and normal operations still work. Note that the allowlisted readers still accept arbitrary paths and URLs, so the file-read and outbound-fetch variants need a separate path/URL restriction.

```diff
--- a/libs/agno/agno/tools/pandas.py
+++ b/libs/agno/agno/tools/pandas.py
@@ -9,6 +9,26 @@
     raise ImportError("`pandas` not installed. Please install using `pip install pandas`.")
 
 
+# Safe pandas top-level constructors PandasTools may dispatch to from model-controlled
+# input. read_pickle / read_hdf are intentionally excluded: they deserialize untrusted
+# data (pickle) and lead to remote code execution.
+_ALLOWED_CREATE_FUNCS = {
+    "DataFrame",
+    "read_csv",
+    "read_json",
+    "read_excel",
+    "read_parquet",
+    "read_table",
+    "read_html",
+    "read_feather",
+    "read_orc",
+    "read_xml",
+}
+
+# DataFrame methods that execute code or (de)serialize to arbitrary locations.
+_BLOCKED_DF_OPERATIONS = {"query", "eval", "to_pickle", "to_hdf"}
+
+
 class PandasTools(Toolkit):
     def __init__(
         self,
@@ -50,8 +70,20 @@
             if dataframe_name in self.dataframes:
                 return f"Dataframe already exists: {dataframe_name}"
 
+            # Only allow a fixed set of safe constructors. This blocks reaching
+            # `pd.read_pickle` (untrusted deserialization -> RCE) and other
+            # non-constructor callables via model-controlled input.
+            if create_using_function not in _ALLOWED_CREATE_FUNCS:
+                return (
+                    f"Unsupported function '{create_using_function}'. "
+                    f"Allowed functions: {sorted(_ALLOWED_CREATE_FUNCS)}"
+                )
+            func = getattr(pd, create_using_function, None)
+            if not callable(func):
+                return f"Unsupported function '{create_using_function}'."
+
             # Create the dataframe
-            dataframe = getattr(pd, create_using_function)(**function_parameters)
+            dataframe = func(**function_parameters)
             if dataframe is None:
                 return f"Error creating dataframe: {dataframe_name}"
             if not isinstance(dataframe, pd.DataFrame):
@@ -85,9 +117,18 @@
 
             # Get the dataframe
             dataframe = self.dataframes.get(dataframe_name)
+            if dataframe is None:
+                return f"Dataframe not found: {dataframe_name}"
+
+            # Reject private/dunder attributes and code-executing / arbitrary-write methods.
+            if operation.startswith("_") or operation in _BLOCKED_DF_OPERATIONS:
+                return f"Unsupported operation: {operation}"
+            op = getattr(dataframe, operation, None)
+            if not callable(op):
+                return f"Unsupported operation: {operation}"
 
             # Run the operation
-            result = getattr(dataframe, operation)(**operation_parameters)
+            result = op(**operation_parameters)
 
             log_debug(f"Ran operation: {operation}")
             try:
```
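
The operation guard from the second hunk can be exercised in isolation. The sketch below mirrors its logic against a stand-in object so that it runs without pandas; `resolve_operation` and `FakeFrame` are hypothetical names, not agno or pandas APIs.

```python
BLOCKED_OPERATIONS = frozenset({"query", "eval", "to_pickle", "to_hdf"})


def resolve_operation(obj, operation, blocked=BLOCKED_OPERATIONS):
    """Return the bound method for `operation`, or None if it must be refused."""
    if not isinstance(operation, str):
        return None
    # Refuse private/dunder names and explicitly blocked methods.
    if operation.startswith("_") or operation in blocked:
        return None
    op = getattr(obj, operation, None)
    # Non-callable attributes (and missing names) are refused as well.
    return op if callable(op) else None


class FakeFrame:
    """Hypothetical stand-in for a DataFrame."""

    shape = (2, 3)  # a plain attribute, not a method

    def head(self, n=5):
        return f"first {n} rows"

    def eval(self, expr):
        raise AssertionError("blocked method must never be reached")


frame = FakeFrame()
head = resolve_operation(frame, "head")
print(head(n=2))
print(resolve_operation(frame, "eval"))       # blocked by name
print(resolve_operation(frame, "__class__"))  # private/dunder
print(resolve_operation(frame, "shape"))      # not callable
```

A denylist like this is weaker than the allowlist used for constructors: writers such as `to_csv` still accept a model-chosen path, so an allowlist of read-only operations may be the safer long-term design.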

## Environment tested
- agno `2.6.20`, latest `main` (commit `07fe6b2`), Python 3.12. Local-only reproduction; fake data only.

## Duplicate / publicness check
Searched issues/PRs (`read_pickle`, `PandasTools`, `pandas`) and published advisories — no existing report for this deserialization sink; the only `PandasTools` hit is an unrelated non-security bug. Distinct from the published `eval()`/`field_type` RCE advisory (different sink).


[Security] PandasTools: RCE via model-controlled pandas function selection (read_pickle) #8699
