Fix RCE vulnerability in Pixeltable MCP tools (eval/exec injection)#231
Fix RCE vulnerability in Pixeltable MCP tools (eval/exec injection)#231akuligowski9 wants to merge 4 commits intopatchy631:mainfrom
Conversation
Replace unvalidated eval() and exec() calls with a safe AST-based expression evaluator that only allows table column references, basic arithmetic/comparison operators, and literals. Restructure create_query to use Pixeltable's API directly instead of exec() with arbitrary code. Closes patchy631#229 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
📝 WalkthroughWalkthroughReplaces unsafe eval/exec with an AST-based safe evaluator for user expressions, adds identifier/column validation and descriptive errors, and updates Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (3)
pixeltable-mcp/base-sdk/tools.py (3)
59-66: Dead code:True/False/Noneinast.Namehandler.Since Python 3.8+,
True,False, andNoneare parsed asast.Constantnodes rather thanast.Name. This branch will never be reached by the parser. Not harmful, but could be removed for clarity if you only target Python ≥ 3.8.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 59 - 66, Remove the unreachable True/False/None handling inside the isinstance(node, ast.Name) branch: keep only the table lookup (node.id == "table") and the ValueError for other names, and ensure boolean/None literal handling is implemented in the ast.Constant branch instead; update the block that checks isinstance(node, ast.Name) (the one returning namespace["table"]) and verify ast.Constant handling covers True/False/None so there is no dead code left in that Name handler.
354-359: Use explicit| Nonefor optional parameters (PEP 484).Ruff flags
list[str] = Noneas implicitOptional. Modern style prefers explicit union syntax.Proposed fix
def create_query( query_name: str, table_name: str, - select_columns: list[str] = None, - where_expr: str = None, + select_columns: list[str] | None = None, + where_expr: str | None = None, ) -> str:The same applies to
execute_queryparameters on lines 279-283 (not changed in this PR but worth noting for consistency).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 354 - 359, The function signatures use implicit Optional by setting defaults to None (e.g., select_columns: list[str] = None); update these annotations to explicit union types per PEP 484/604 (e.g., select_columns: list[str] | None and where_expr: str | None) and apply the same change to the execute_query function parameters so all optional params use the explicit "X | None" form.
329-335: Column name validation is inconsistent withcreate_query.
create_queryvalidates column names against_IDENTIFIER_RE(line 393) before callinghasattr/getattr, butexecute_querydoes not perform this check onselect_columnsororder_by_column. For defense-in-depth consistency, apply the same identifier validation here as well.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 329 - 335, In execute_query, validate each identifier in select_columns and order_by_column against the module _IDENTIFIER_RE (the same check used in create_query) before calling hasattr/getattr on data_source; if a name fails the regex, return the same error format used elsewhere (e.g., "Error: Column 'X' not found in 'Y'") rather than attempting getattr, and only then use hasattr/getattr to build select_args and set order_by; reference the select_columns, order_by_column, data_source, table_or_view_name variables and the _IDENTIFIER_RE pattern to ensure consistent defense-in-depth validation with create_query.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 88-96: The Compare handling in _eval_node incorrectly threads the
previous comparison result as the next left operand; change it so you evaluate
the left operand and each comparator into values (evaluate node.left and each
node in node.comparators), then for each (op, rhs_value) call op_func =
_SAFE_OPERATORS[type(op)] and compute pair_result = op_func(current_left_value,
rhs_value); accumulate the overall result by ANDing pair_result with a running
accumulator (e.g., total_result = total_result & pair_result) and then set
current_left_value = rhs_value for the next iteration; ensure you still raise on
unsupported ops using _SAFE_OPERATORS and return the accumulated total_result
from the Compare branch.
- Around line 31-32: The current _IDENTIFIER_RE allows names beginning with "_"
(enabling dunder access); update the regex _IDENTIFIER_RE to only allow
identifiers that start with an ASCII letter (no leading underscore) so names
like "__class__" are rejected, and add a defensive runtime check in _eval_node
to explicitly reject any attribute/name token that starts with "_" before
resolving it (e.g., check the string value and raise/return an error) so both
static and runtime paths prevent access to private/dunder members; reference
_IDENTIFIER_RE and _eval_node when making these changes.
- Around line 383-401: create_query currently builds a Pixeltable query object
but never persists it (query_name is only used in the success string); modify
create_query to persist the built query by calling pxt.create_view(query_name,
query) (mirroring create_view) after the query is constructed, handle and log
any errors from pxt.create_view, and only return the success message if
persistence succeeds; reference function/create symbols: create_query, query,
query_name, pxt.create_view, and create_view.
---
Nitpick comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 59-66: Remove the unreachable True/False/None handling inside the
isinstance(node, ast.Name) branch: keep only the table lookup (node.id ==
"table") and the ValueError for other names, and ensure boolean/None literal
handling is implemented in the ast.Constant branch instead; update the block
that checks isinstance(node, ast.Name) (the one returning namespace["table"])
and verify ast.Constant handling covers True/False/None so there is no dead code
left in that Name handler.
- Around line 354-359: The function signatures use implicit Optional by setting
defaults to None (e.g., select_columns: list[str] = None); update these
annotations to explicit union types per PEP 484/604 (e.g., select_columns:
list[str] | None and where_expr: str | None) and apply the same change to the
execute_query function parameters so all optional params use the explicit "X |
None" form.
- Around line 329-335: In execute_query, validate each identifier in
select_columns and order_by_column against the module _IDENTIFIER_RE (the same
check used in create_query) before calling hasattr/getattr on data_source; if a
name fails the regex, return the same error format used elsewhere (e.g., "Error:
Column 'X' not found in 'Y'") rather than attempting getattr, and only then use
hasattr/getattr to build select_args and set order_by; reference the
select_columns, order_by_column, data_source, table_or_view_name variables and
the _IDENTIFIER_RE pattern to ensure consistent defense-in-depth validation with
create_query.
pixeltable-mcp/base-sdk/tools.py
Outdated
| # Validate identifiers to prevent injection through crafted attribute names | ||
| _IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$") |
There was a problem hiding this comment.
_IDENTIFIER_RE allows dunder attributes, enabling potential sandbox escape or info leakage.
The regex ^[A-Za-z_][A-Za-z0-9_]*$ permits names like __class__, __init__, __globals__, etc. An attacker could chain attribute accesses such as table.__class__.__name__ to leak internal information, or potentially reach sensitive objects via table.__init__.__globals__. Even though function calls and subscripts are blocked, the returned values are sent back to the MCP client, which is an information-disclosure vector.
Reject attributes starting with _ to restrict access to public members only:
Proposed fix
-_IDENTIFIER_RE = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")
+_IDENTIFIER_RE = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")And add a leading-underscore check in _eval_node for defense-in-depth:
if isinstance(node, ast.Attribute):
value = _eval_node(node.value, namespace)
attr = node.attr
if not _IDENTIFIER_RE.match(attr):
raise ValueError(f"Invalid attribute name: '{attr}'")
+ if attr.startswith("_"):
+ raise ValueError(f"Access to private attribute '{attr}' is not allowed.")
if not hasattr(value, attr):
raise ValueError(f"Attribute '{attr}' not found.")
return getattr(value, attr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pixeltable-mcp/base-sdk/tools.py` around lines 31 - 32, The current
_IDENTIFIER_RE allows names beginning with "_" (enabling dunder access); update
the regex _IDENTIFIER_RE to only allow identifiers that start with an ASCII
letter (no leading underscore) so names like "__class__" are rejected, and add a
defensive runtime check in _eval_node to explicitly reject any attribute/name
token that starts with "_" before resolving it (e.g., check the string value and
raise/return an error) so both static and runtime paths prevent access to
private/dunder members; reference _IDENTIFIER_RE and _eval_node when making
these changes.
pixeltable-mcp/base-sdk/tools.py
Outdated
| if isinstance(node, ast.Compare): | ||
| left = _eval_node(node.left, namespace) | ||
| for op, comparator in zip(node.ops, node.comparators): | ||
| op_func = _SAFE_OPERATORS.get(type(op)) | ||
| if op_func is None: | ||
| raise ValueError(f"Unsupported comparison: {type(op).__name__}") | ||
| right = _eval_node(comparator, namespace) | ||
| left = op_func(left, right) | ||
| return left |
There was a problem hiding this comment.
Chained comparisons (a < b < c) are evaluated incorrectly.
For a chained comparison like 1 < x < 10, Python's AST represents it as left=1, ops=[Lt, Lt], comparators=[x, 10]. The current loop reassigns left to the result of each pairwise comparison, so the second comparison becomes (1 < x) < 10 instead of the correct (1 < x) AND (x < 10).
With Pixeltable column expressions, the result of op_func(left, right) is a Pixeltable expression object, so feeding it as the left operand of the next comparison produces a semantically incorrect filter.
Proposed fix — AND each pairwise result together
if isinstance(node, ast.Compare):
left = _eval_node(node.left, namespace)
+ result = None
for op, comparator in zip(node.ops, node.comparators):
op_func = _SAFE_OPERATORS.get(type(op))
if op_func is None:
raise ValueError(f"Unsupported comparison: {type(op).__name__}")
right = _eval_node(comparator, namespace)
- left = op_func(left, right)
- return left
+ cmp = op_func(left, right)
+ result = cmp if result is None else (result & cmp)
+ left = right
+ return result🧰 Tools
🪛 Ruff (0.15.1)
[warning] 90-90: zip() without an explicit strict= parameter
Add explicit value for parameter strict=
(B905)
[warning] 93-93: Avoid specifying long messages outside the exception class
(TRY003)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pixeltable-mcp/base-sdk/tools.py` around lines 88 - 96, The Compare handling
in _eval_node incorrectly threads the previous comparison result as the next
left operand; change it so you evaluate the left operand and each comparator
into values (evaluate node.left and each node in node.comparators), then for
each (op, rhs_value) call op_func = _SAFE_OPERATORS[type(op)] and compute
pair_result = op_func(current_left_value, rhs_value); accumulate the overall
result by ANDing pair_result with a running accumulator (e.g., total_result =
total_result & pair_result) and then set current_left_value = rhs_value for the
next iteration; ensure you still raise on unsupported ops using _SAFE_OPERATORS
and return the accumulated total_result from the Compare branch.
| # Build the query using Pixeltable's API directly instead of exec() | ||
| query = table | ||
|
|
||
| if where_expr: | ||
| filter_condition = _safe_eval(where_expr, table) | ||
| query = query.where(filter_condition) | ||
|
|
||
| # Execute the function definition | ||
| exec(func_def) | ||
| if select_columns: | ||
| select_args = [] | ||
| for col_name in select_columns: | ||
| if not _IDENTIFIER_RE.match(col_name): | ||
| return f"Error: Invalid column name: '{col_name}'" | ||
| if hasattr(table, col_name): | ||
| select_args.append(getattr(table, col_name)) | ||
| else: | ||
| return f"Error: Column '{col_name}' not found in table '{table_name}'." | ||
| query = query.select(*select_args) | ||
|
|
||
| return f"Query '{query_name}' created successfully for table '{table_name}'." |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# First, let's locate the file and examine the full create_query function
git ls-files | grep -E 'tools\.py$' | head -5Repository: patchy631/ai-engineering-hub
Length of output: 225
🏁 Script executed:
#!/bin/bash
# Read the tools.py file to see the full create_query function and context
wc -l pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 108
🏁 Script executed:
#!/bin/bash
# Search for the create_query function definition and the surrounding context
rg -n 'def create_query' -A 60 pixeltable-mcp/base-sdk/tools.py | head -100Repository: patchy631/ai-engineering-hub
Length of output: 2217
🏁 Script executed:
#!/bin/bash
# Search for any module-level storage, registries, or persistence patterns
rg -n 'QUERY|query.*dict|register.*query|save.*query|named_queries|_queries' pixeltable-mcp/base-sdk/tools.py | head -20Repository: patchy631/ai-engineering-hub
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Check if there are any class definitions or global state management
rg -n 'class |^[A-Z_]+\s*=\s*{|}' pixeltable-mcp/base-sdk/tools.py | head -20Repository: patchy631/ai-engineering-hub
Length of output: 202
🏁 Script executed:
#!/bin/bash
# Look for Pixeltable query persistence API calls - check for create_named_query, save, etc.
rg -n 'pxt\.|table\.|query\.' pixeltable-mcp/base-sdk/tools.py | grep -i 'save\|create\|persist\|register'Repository: patchy631/ai-engineering-hub
Length of output: 594
🏁 Script executed:
#!/bin/bash
# Check git history for the old implementation
git log --oneline -p pixeltable-mcp/base-sdk/tools.py 2>/dev/null | grep -A 30 'def create_query' | head -50Repository: patchy631/ai-engineering-hub
Length of output: 1146
🏁 Script executed:
#!/bin/bash
# Look at create_view to understand the pattern - does it persist?
rg -n 'def create_view' -A 20 pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 962
🏁 Script executed:
#!/bin/bash
# Search for execute_query function to see what it expects
rg -n 'def execute_query' -A 30 pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 1446
🏁 Script executed:
#!/bin/bash
# Look for any global query storage or registry
rg -n 'queries\s*=' pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Search for where query_name is referenced elsewhere in the file
rg -n 'query_name' pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 245
🏁 Script executed:
#!/bin/bash
# Check if Pixeltable has a create_named_query or similar API
rg -n 'pxt\.' pixeltable-mcp/base-sdk/tools.py | grep -i queryRepository: patchy631/ai-engineering-hub
Length of output: 54
🏁 Script executed:
#!/bin/bash
# Look at the full file structure and imports
head -50 pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 1507
🏁 Script executed:
#!/bin/bash
# Check the full create_view implementation to see how it persists
rg -n 'def create_view' -A 30 pixeltable-mcp/base-sdk/tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 1418
🏁 Script executed:
#!/bin/bash
# Search for any reference to "query" in the entire tools.py that might reveal storage patterns
rg -n 'query' pixeltable-mcp/base-sdk/tools.py | head -30Repository: patchy631/ai-engineering-hub
Length of output: 1439
🏁 Script executed:
#!/bin/bash
# Check if there's any other file that might store queries
git ls-files | grep -E 'tools|query|config' | xargs grep -l 'create_query\|named.*query' 2>/dev/null | head -5Repository: patchy631/ai-engineering-hub
Length of output: 144
🌐 Web query:
Pixeltable create_named_query API documentation
💡 Result:
Pixeltable does not have a create_named_query API in its Python SDK docs; that method name is commonly associated with AWS Athena / boto3 (unrelated to Pixeltable).
In Pixeltable, the closest equivalents to a “named/saved query” are:
- Create a named view from a query with
pxt.create_view(path, base=Query|Table, ...)(this persists the query as a view you canpxt.get_table()later). [1] - Define a reusable query function using the
@pxt.querydecorator (commonly used for retrieval helpers in RAG-style pipelines). [2] - Build queries using the
pixeltable.QueryAPI (eg,where(),select(),order_by(),collect()). [3]
Sources:
[1] Pixeltable SDK create_view() reference https://docs.pixeltable.com/sdk/latest/pixeltable
[2] Example of @pxt.query usage in Pixeltable docs (VoyageAI integration page) https://docs.pixeltable.com/sdk/v0.5.3/voyageai
[3] Pixeltable Query class reference https://docs.pixeltable.com/sdk/v0.5.1/query
The built query is never persisted — query_name is unused beyond the success message.
The function constructs a query object but never persists it via Pixeltable's API. Unlike the parallel create_view() function (line 264), which explicitly calls pxt.create_view(), create_query() builds a query without storing it. The query_name parameter is used only in the return string (line 401) and has no other effect. This means the function claims to "create" a named query but does not actually register it anywhere, making it impossible to retrieve later.
To fix this, either:
- Call
pxt.create_view(query_name, query)if the intent is to persist as a view, or - Store the query in a module-level registry if raw queries need to be cached, or
- Refactor the function to clarify that it validates and constructs a query without persisting it.
🧰 Tools
🪛 Ruff (0.15.1)
[warning] 401-401: Consider moving this statement to an else block
(TRY300)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pixeltable-mcp/base-sdk/tools.py` around lines 383 - 401, create_query
currently builds a Pixeltable query object but never persists it (query_name is
only used in the success string); modify create_query to persist the built query
by calling pxt.create_view(query_name, query) (mirroring create_view) after the
query is constructed, handle and log any errors from pxt.create_view, and only
return the success message if persistence succeeds; reference function/create
symbols: create_query, query, query_name, pxt.create_view, and create_view.
- Block dunder/private attribute access (e.g. __class__, __globals__) by rejecting identifiers starting with underscore in both the regex and an explicit runtime check - Fix chained comparison evaluation (a < b < c) to correctly AND each pairwise result instead of threading the comparison result as the next left operand - Persist created queries as Pixeltable views via pxt.create_view() so query_name is actually stored and retrievable - Remove dead code for True/False/None in ast.Name handler (these are ast.Constant on Python 3.8+) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
pixeltable-mcp/base-sdk/tools.py (1)
319-338:⚠️ Potential issue | 🟠 Major
order_by_columnandselect_columnsbypass the safe evaluator — missing identifier validation.In
execute_query,order_by_column(Line 321-322) andselect_columns(Lines 335-336) pass user-supplied strings directly tohasattr/getattrwithout_IDENTIFIER_REvalidation or underscore checks. An attacker could passorder_by_column="__class__"orselect_columns=["__dict__"]to leak internal object information, bypassing the protections this PR adds for expression evaluation.Compare with
create_query(Line 396) which correctly validates column names with_IDENTIFIER_RE.🛡️ Proposed fix — validate identifiers before getattr
# Apply order by if provided if order_by_column: + if not _IDENTIFIER_RE.match(order_by_column): + return f"Error: Invalid column name: '{order_by_column}'" # Handle ordering on a specific column if hasattr(data_source, order_by_column):if select_columns: select_args = [] for col_name in select_columns: + if not _IDENTIFIER_RE.match(col_name): + return f"Error: Invalid column name: '{col_name}'" if hasattr(data_source, col_name):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 319 - 338, execute_query currently calls hasattr/getattr with user-supplied order_by_column and select_columns which can be "__dunder__" names; validate each identifier against the existing _IDENTIFIER_RE and ensure it does not start with '_' before using hasattr/getattr (same approach as create_query). For order_by_column: check _IDENTIFIER_RE.match(order_by_column) and order_by_column[0] != '_' and return an error if invalid before calling getattr(data_source, order_by_column). For each col_name in select_columns: apply the same validation and only then use getattr(data_source, col_name), returning an error for invalid identifiers.
🧹 Nitpick comments (3)
pixeltable-mcp/base-sdk/tools.py (3)
35-48: Consider adding an expression length or AST depth limit to mitigate DoS.A deeply nested expression (e.g.,
(((((...))))thousands of levels deep) could exhaust Python's recursion limit and crash the process. A simple guard like rejecting expressions over a reasonable character length (e.g., 1000–2000 chars) before parsing would be a low-cost defense.🛡️ Proposed guard
def _safe_eval(expression: str, table): + if len(expression) > 2000: + raise ValueError("Expression too long (max 2000 characters).") try: tree = ast.parse(expression, mode="eval")🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 35 - 48, Add a cheap DoS guard to _safe_eval by rejecting overly large or deeply nested expressions before parsing: check expression length (e.g., limit to 1000–2000 chars) and return/raise ValueError if exceeded, and after ast.parse use a quick AST depth check (walk tree.body with a max depth counter) to reject expressions that exceed a safe depth; update error messages in _safe_eval to indicate which limit was hit and reference the helper _eval_node for further safety assumptions.
357-362: Use explicitOptional/ union syntax for parameters defaulting toNone.PEP 484 prohibits implicit
Optional—list[str] = Nonedoesn't declare thatNoneis an accepted type.Proposed fix
def create_query( query_name: str, table_name: str, - select_columns: list[str] = None, - where_expr: str = None, + select_columns: list[str] | None = None, + where_expr: str | None = None, ) -> str:The same applies to
create_view(Line 245:filter_expr: str = None) andexecute_query(Lines 282-286).🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 357 - 362, The function signatures use defaults of None without explicit Optional/union types; update create_query, create_view, and execute_query signatures so parameters that default to None are typed explicitly (e.g., select_columns: Optional[list[str]] or select_columns: list[str] | None and filter_expr: Optional[str] / str | None), and add the required import (from typing import Optional) if using Optional; ensure all occurrences (create_query, create_view, execute_query) are updated consistently.
54-56:ast.Constantpasses throughbytes,frozenset,tuple, andEllipsisliterals unchecked.
ast.Constantcan represent more than just strings, numbers, booleans, andNone. For example,b"bytes"produces anast.Constantwith abytesvalue, and(1, 2, 3)produces atupleconstant. While these are unlikely to cause a security issue, they could lead to confusing downstream errors. Consider adding a type check onnode.value.🛡️ Proposed type guard
if isinstance(node, ast.Constant): + if not isinstance(node.value, (int, float, str, bool, type(None))): + raise ValueError( + f"Unsupported literal type: {type(node.value).__name__}" + ) return node.value🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 54 - 56, The ast.Constant handler in tools.py currently returns node.value unchecked; change it to validate node.value's type (e.g., allow only str, int, float, bool, and NoneType) before returning, and reject other types (bytes, tuple, frozenset, Ellipsis, etc.) by raising a clear ValueError or returning an explicit error value; update the ast.Constant branch to use isinstance(node.value, (str, int, float, bool, type(None))) and include a descriptive error mentioning the unsupported constant type so downstream code gets a clear failure.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Outside diff comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 319-338: execute_query currently calls hasattr/getattr with
user-supplied order_by_column and select_columns which can be "__dunder__"
names; validate each identifier against the existing _IDENTIFIER_RE and ensure
it does not start with '_' before using hasattr/getattr (same approach as
create_query). For order_by_column: check _IDENTIFIER_RE.match(order_by_column)
and order_by_column[0] != '_' and return an error if invalid before calling
getattr(data_source, order_by_column). For each col_name in select_columns:
apply the same validation and only then use getattr(data_source, col_name),
returning an error for invalid identifiers.
---
Duplicate comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 386-405: The code correctly persists the built Pixeltable query as
a view using pxt.create_view(query_name, query), so no code changes are
required; ensure that query_name, query, table, and select_columns remain
unchanged and that pxt.create_view is invoked after building the final query so
the view can be retrieved later.
- Around line 87-99: Chained comparison handling in the ast.Compare branch is
correct; keep the implementation in the block that uses _eval_node,
_SAFE_OPERATORS, and the left/right/result accumulation exactly as written
(pairwise op_func(left, right) with result = cmp if result is None else (result
& cmp) and left = right) — no change needed.
- Around line 66-76: The attribute-access guard for AST Attribute nodes is fine
as written: _IDENTIFIER_RE, the explicit attr.startswith("_") check, and the
hasattr/getattr flow in the ast.Attribute handling correctly block
dunder/private access and validate names; no code change required—leave the
block around the ast.Attribute case (the variables node.attr, _IDENTIFIER_RE,
and the startswith("_") check) as is.
---
Nitpick comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 35-48: Add a cheap DoS guard to _safe_eval by rejecting overly
large or deeply nested expressions before parsing: check expression length
(e.g., limit to 1000–2000 chars) and return/raise ValueError if exceeded, and
after ast.parse use a quick AST depth check (walk tree.body with a max depth
counter) to reject expressions that exceed a safe depth; update error messages
in _safe_eval to indicate which limit was hit and reference the helper
_eval_node for further safety assumptions.
- Around line 357-362: The function signatures use defaults of None without
explicit Optional/union types; update create_query, create_view, and
execute_query signatures so parameters that default to None are typed explicitly
(e.g., select_columns: Optional[list[str]] or select_columns: list[str] | None
and filter_expr: Optional[str] / str | None), and add the required import (from
typing import Optional) if using Optional; ensure all occurrences (create_query,
create_view, execute_query) are updated consistently.
- Around line 54-56: The ast.Constant handler in tools.py currently returns
node.value unchecked; change it to validate node.value's type (e.g., allow only
str, int, float, bool, and NoneType) before returning, and reject other types
(bytes, tuple, frozenset, Ellipsis, etc.) by raising a clear ValueError or
returning an explicit error value; update the ast.Constant branch to use
isinstance(node.value, (str, int, float, bool, type(None))) and include a
descriptive error mentioning the unsupported constant type so downstream code
gets a clear failure.
- Validate order_by_column and select_columns in execute_query against _IDENTIFIER_RE before passing to hasattr/getattr (prevents dunder access like __class__ or __dict__) - Add expression length limit (2000 chars) to _safe_eval to mitigate DoS via deeply nested expressions - Restrict ast.Constant to only allow int, float, str, bool, and None (reject bytes, tuple, frozenset, Ellipsis) Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
|
@coderabbitai review |
✅ Actions performedReview triggered.
|
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
pixeltable-mcp/base-sdk/tools.py (2)
370-371: ImplicitOptionalannotations (Ruff RUF013).
list[str] = Noneandstr = Noneuse implicitOptional, which PEP 484 prohibits.♻️ Proposed fix
- select_columns: list[str] = None, - where_expr: str = None, + select_columns: list[str] | None = None, + where_expr: str | None = None,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 370 - 371, Parameters select_columns and where_expr currently use implicit Optional by assigning None to bare types (list[str] = None, str = None); update their type annotations to explicit Optional types (e.g., Optional[list[str]] and Optional[str]) and import Optional from typing (or from typing import Optional at top) so the function signature uses select_columns: Optional[list[str]] = None and where_expr: Optional[str] = None; keep the default None values unchanged and update any related type checks if needed.
93-105: Chained comparison fix is correct; considerstrict=Trueonzip.The AND-accumulation pattern correctly handles
a < b < cas(a < b) & (b < c). The past issue is resolved.
zip(node.ops, node.comparators)withoutstrict=Trueis safe because the Python AST guarantees equal-length lists, but addingstrict=Truemakes the assumption explicit and would surface any future invariant violation as an earlyValueError.♻️ Suggested improvement
- for op, comparator in zip(node.ops, node.comparators): + for op, comparator in zip(node.ops, node.comparators, strict=True):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 93 - 105, In the ast.Compare handling block that uses zip(node.ops, node.comparators) (inside the comparison evaluation using _eval_node and _SAFE_OPERATORS), change the zip call to zip(node.ops, node.comparators, strict=True) so any mismatch between ops and comparators surfaces immediately as an error; leave the rest of the loop logic (op_func lookup, calling _eval_node, combining results) unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Line 28: The current AST operator mapping maps ast.Not to operator.not_, which
causes UnaryOp handling in the function that evaluates AST (see the ast.UnaryOp
handler) to produce plain Python bools instead of Pixeltable expressions; change
the mapping so ast.Not maps to operator.invert (so the UnaryOp path uses
__invert__/~) to ensure negation yields a Pixeltable expression rather than
operator.not_. Include a note to update the mapping dictionary entry for ast.Not
and verify the ast.UnaryOp handler still looks up operators from that mapping
(no other handler changes required).
---
Nitpick comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 370-371: Parameters select_columns and where_expr currently use
implicit Optional by assigning None to bare types (list[str] = None, str =
None); update their type annotations to explicit Optional types (e.g.,
Optional[list[str]] and Optional[str]) and import Optional from typing (or from
typing import Optional at top) so the function signature uses select_columns:
Optional[list[str]] = None and where_expr: Optional[str] = None; keep the
default None values unchanged and update any related type checks if needed.
- Around line 93-105: In the ast.Compare handling block that uses zip(node.ops,
node.comparators) (inside the comparison evaluation using _eval_node and
_SAFE_OPERATORS), change the zip call to zip(node.ops, node.comparators,
strict=True) so any mismatch between ops and comparators surfaces immediately as
an error; leave the rest of the loop logic (op_func lookup, calling _eval_node,
combining results) unchanged.
pixeltable-mcp/base-sdk/tools.py
Outdated
| ast.Gt: operator.gt, | ||
| ast.GtE: operator.ge, | ||
| ast.USub: operator.neg, | ||
| ast.Not: operator.not_, |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
cd pixeltable-mcp/base-sdk && head -40 tools.py | tail -15Repository: patchy631/ai-engineering-hub
Length of output: 560
🏁 Script executed:
cd pixeltable-mcp/base-sdk && sed -n '108,112p' tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 340
🏁 Script executed:
cd pixeltable-mcp/base-sdk && sed -n '115,125p' tools.pyRepository: patchy631/ai-engineering-hub
Length of output: 563
🌐 Web query:
Pixeltable NOT operator column filter ~ operator.invert
💡 Result:
In Pixeltable, boolean NOT is done with the bitwise-invert operator ~ (because Python’s not can’t be overloaded). (docs.pixeltable.com)
Example:
# NOT (t.label == '5')
t.where(~(t.label == '5')).select(t.image).show()Equivalent using operator.invert:
import operator
t.where(operator.invert(t.label == '5')).select(t.image)Use parentheses around comparisons before applying ~ (as shown). (docs.pixeltable.com)
Citations:
- 1: https://docs.pixeltable.com/tutorials/queries-and-expressions?utm_source=openai
- 2: https://docs.pixeltable.com/tutorials/queries-and-expressions?utm_source=openai
ast.Not maps to operator.not_, which breaks Pixeltable filter expressions.
Pixeltable's Boolean operators mirror Pandas: & for AND, | for OR, and ~ for NOT. The ast.BoolOp handler already correctly uses &/| (bitwise). However, operator.not_ is Python's not operator, which calls __bool__ and returns a plain Python bool — not a Pixeltable expression. Per Pixeltable docs, this is intentional: "Python's not can't be overloaded, only ~ can be." This means any expression using not table.column will either raise TypeError (if Pixeltable forbids implicit bool conversion) or return a Python True/False instead of the correct negation expression, breaking the downstream .where() call.
The downstream effect is in the ast.UnaryOp handler (lines 108–112), which evaluates not via this mapping.
Fix: map ast.Not to operator.invert (which calls __invert__, the ~ operator):
Proposed fix
- ast.Not: operator.not_,
+ ast.Not: operator.invert,🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@pixeltable-mcp/base-sdk/tools.py` at line 28, The current AST operator
mapping maps ast.Not to operator.not_, which causes UnaryOp handling in the
function that evaluates AST (see the ast.UnaryOp handler) to produce plain
Python bools instead of Pixeltable expressions; change the mapping so ast.Not
maps to operator.invert (so the UnaryOp path uses __invert__/~) to ensure
negation yields a Pixeltable expression rather than operator.not_. Include a
note to update the mapping dictionary entry for ast.Not and verify the
ast.UnaryOp handler still looks up operators from that mapping (no other handler
changes required).
operator.not_ calls __bool__ and returns a plain Python bool, which breaks Pixeltable filter expressions. Pixeltable uses ~ (__invert__) for boolean negation, matching its & and | convention for AND/OR. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
pixeltable-mcp/base-sdk/tools.py (1)
366-421: Past issue resolved: query is now persisted viapxt.create_view.The restructured
create_querycorrectly builds the query from structured parameters and persists it. The docstring accurately describes the behavior.One minor type annotation nit:
Optional type hints (PEP 484)
def create_query( query_name: str, table_name: str, - select_columns: list[str] = None, - where_expr: str = None, + select_columns: list[str] | None = None, + where_expr: str | None = None, ) -> str:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 366 - 421, Update the optional parameter type annotations in create_query to use typing.Optional for PEP 484 compliance: change select_columns: list[str] = None to select_columns: Optional[list[str]] = None and where_expr: str = None to where_expr: Optional[str] = None, and add "from typing import Optional" to the imports; keep the function name create_query and its behavior unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Line 28: The mapping for AST unary not is wrong: replace the entry that maps
ast.Not -> operator.not_ so Pixeltable expressions use bitwise invert instead;
update the mapping in tools.py where the operator table/dict contains "ast.Not:
operator.not_" to use operator.invert (the ~ operator) so expressions like "not
table.is_active" behave as Pixeltable/pandas-style negation rather than calling
__bool__ and producing a plain bool.
---
Nitpick comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 366-421: Update the optional parameter type annotations in
create_query to use typing.Optional for PEP 484 compliance: change
select_columns: list[str] = None to select_columns: Optional[list[str]] = None
and where_expr: str = None to where_expr: Optional[str] = None, and add "from
typing import Optional" to the imports; keep the function name create_query and
its behavior unchanged.
There was a problem hiding this comment.
🧹 Nitpick comments (3)
pixeltable-mcp/base-sdk/tools.py (3)
366-372: PEP 484 implicitOptionalincreate_querysignature.Both
select_columns: list[str] = Noneandwhere_expr: str = Noneuse an implicitOptional, which is disallowed by PEP 484 (Ruff RUF013).♻️ Proposed fix
def create_query( query_name: str, table_name: str, - select_columns: list[str] = None, - where_expr: str = None, + select_columns: list[str] | None = None, + where_expr: str | None = None, ) -> str:🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 366 - 372, The function create_query currently annotates default None values with implicit Optional types; update the signature to use explicit Optional annotations by importing Optional from typing and changing the param annotations to select_columns: Optional[list[str]] = None and where_expr: Optional[str] = None (leave defaults as None) so the types comply with PEP 484/Ruff RUF013; ensure the import of Optional is added to the module if not already present and adjust any type-use occurrences accordingly.
96-97:zip()withoutstrict=True.Ruff B905 flags this. The
len(ops) == len(comparators)invariant is guaranteed by the AST, so there's no runtime risk, but addingstrict=Truemakes the contract explicit.♻️ Proposed fix
- for op, comparator in zip(node.ops, node.comparators): + for op, comparator in zip(node.ops, node.comparators, strict=True):🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 96 - 97, The loop using zip(node.ops, node.comparators) in tools.py should be made explicit about the expected length invariant to satisfy Ruff B905; update the for-loop inside the comparison handling (where node.ops and node.comparators are iterated) to use zip(..., strict=True) so mismatched lengths would raise immediately and the contract is explicit (i.e., replace zip(node.ops, node.comparators) with zip(node.ops, node.comparators, strict=True)).
73-82:startswith("_")check at line 78 is dead code.
_IDENTIFIER_REalready requires a leading[A-Za-z], so any attribute starting with_fails there.matchat line 76 and raises before reaching line 78. Thestartswith("_")guard is unreachable.♻️ Proposed fix
if isinstance(node, ast.Attribute): value = _eval_node(node.value, namespace) attr = node.attr if not _IDENTIFIER_RE.match(attr): raise ValueError(f"Invalid attribute name: '{attr}'") - if attr.startswith("_"): - raise ValueError(f"Access to private attribute '{attr}' is not allowed.") if not hasattr(value, attr): raise ValueError(f"Attribute '{attr}' not found.") return getattr(value, attr)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pixeltable-mcp/base-sdk/tools.py` around lines 73 - 82, The startswith("_") guard in the ast.Attribute handling of _eval_node is unreachable because _IDENTIFIER_RE already rejects leading underscores; remove the redundant "if attr.startswith('_')" branch (and its ValueError) from the Attribute branch in _eval_node, and if you intended to explicitly forbid private attributes instead of relying on the regex, either relax _IDENTIFIER_RE to allow leading underscores and keep a single explicit check in _eval_node, or keep the regex as-is but delete the dead startswith("_") check to avoid unreachable code.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@pixeltable-mcp/base-sdk/tools.py`:
- Around line 366-372: The function create_query currently annotates default
None values with implicit Optional types; update the signature to use explicit
Optional annotations by importing Optional from typing and changing the param
annotations to select_columns: Optional[list[str]] = None and where_expr:
Optional[str] = None (leave defaults as None) so the types comply with PEP
484/Ruff RUF013; ensure the import of Optional is added to the module if not
already present and adjust any type-use occurrences accordingly.
- Around line 96-97: The loop using zip(node.ops, node.comparators) in tools.py
should be made explicit about the expected length invariant to satisfy Ruff
B905; update the for-loop inside the comparison handling (where node.ops and
node.comparators are iterated) to use zip(..., strict=True) so mismatched
lengths would raise immediately and the contract is explicit (i.e., replace
zip(node.ops, node.comparators) with zip(node.ops, node.comparators,
strict=True)).
- Around line 73-82: The startswith("_") guard in the ast.Attribute handling of
_eval_node is unreachable because _IDENTIFIER_RE already rejects leading
underscores; remove the redundant "if attr.startswith('_')" branch (and its
ValueError) from the Attribute branch in _eval_node, and if you intended to
explicitly forbid private attributes instead of relying on the regex, either
relax _IDENTIFIER_RE to allow leading underscores and keep a single explicit
check in _eval_node, or keep the regex as-is but delete the dead startswith("_")
check to avoid unreachable code.
Code ReviewOverall: Approve with minor suggestions ✅ Good work on addressing the RCE vulnerability. The AST-based What works well
Remaining minor cleanupPer CodeRabbit's final nitpicks:
QuestionThe test plan mentions manual verification. Is there a plan to add unit tests for # Example test cases
"table.col_a + table.col_b" # ✅ should pass
"table.age >= 18" # ✅ should pass
"__import__('os').system('ls')" # ❌ should reject
"table.__class__.__bases__" # ❌ should reject |
Summary
eval()/exec()calls inpixeltable-mcp/base-sdk/tools.pywith a safe AST-based expression evaluatortable.col), arithmetic/comparison operators, string/number/boolean literals, and parenthesized sub-expressionscreate_queryto accept structured parameters (select_columns,where_expr) instead of arbitrary code strings passed toexec()"table.first_name + ' ' + table.last_name","table.age >= 18")Closes #229
What was the vulnerability?
Functions
add_computed_column,create_view,execute_query, andcreate_querypassed LLM-provided strings directly to Python'seval()andexec(). In an agentic context, prompt injection could manipulate an agent into invoking these tools with payloads like__import__('os').system('...').How is it fixed?
A
_safe_eval()function parses the expression into an AST and walks it, only allowing:ast.Constant(string, number, boolean, None literals)ast.Name(onlytable,True,False,None)ast.Attribute(column access liketable.name)ast.BinOp,ast.Compare,ast.UnaryOp,ast.BoolOp(with an explicit allowlist of operators)Any other AST node type (function calls, imports, subscripts, etc.) raises a
ValueError.Test plan
table.col_a + table.col_bstill evaluate correctlytable.age >= 18work increate_viewandexecute_query__import__('os').system('ls')are rejected with a clear errorcreate_queryworks with the new structured parameters🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Bug Fixes & Improvements
API Change