fix: prevent SQL injection in update_row by using parameterized queries#467
Conversation
| run_key: str(run_value)} | ||
|
|
||
| set_dict = {"tags": QuotedString(json.dumps(run_tags)).getquoted().decode()} | ||
| set_dict = {"tags": json.dumps(run_tags)} |
There was a problem hiding this comment.
QuotedString is an unused import now.
There was a problem hiding this comment.
I’ll remove it in the next commit, sorry I missed that
| update_sql = """ | ||
| UPDATE {0} SET {1} WHERE {2}; | ||
| """.format(self.table_name, set_clause, where_clause) | ||
| values = tuple(set_values + filter_values) |
There was a problem hiding this comment.
this is a bit sensitive to ordering now. It might be a good idea to see if we can combine the construction of the two sets of values into one instead to keep the ordering easier to reason about.
There was a problem hiding this comment.
You're right the ordering is sensitive. I see two approaches to solve this :
1-Build a single list from the start, appending values in the same order they appear in the sql set values first, then where values. The code order mirrors the sql order.
2- Store each placeholder alongside its value as a pair, then unzip them at the end. This makes it impossible to accidentally misorder them.
What do you prefer?
| set_seperator = ", " | ||
| set_clause = "" | ||
| if bool(filter_dict): | ||
| if bool(update_dict): |
There was a problem hiding this comment.
is the bool check actually necessary? wouldn't the join just produce an empty string with an empty set anyway?
There was a problem hiding this comment.
You're right it's not necessary. I will remove it
|
|
||
| seperator = " and " | ||
| where_clause = "" | ||
| if bool(filter_dict): |
There was a problem hiding this comment.
same here, though this is historical cleanup at this point. The conditional shouldn't actually be necessary, as the join will produce an empty string with an empty list.
| seperator = " and " | ||
| where_clause = "" | ||
| if bool(filter_dict): | ||
| where_clause = seperator.join(filters) |
There was a problem hiding this comment.
🟠 Empty update_dict now produces invalid SQL with empty SET clause
Category: null_safety | Severity: high | Confidence: 92%
The code now correctly checks if bool(update_dict): before building set_clause, but it still proceeds to format and execute UPDATE {table} SET {1} WHERE {2}; even when update_dict is empty. In that case, set_clause is an empty string and the generated SQL becomes UPDATE ... SET WHERE ..., which will fail at runtime. This is a defensive-programming issue caused by missing validation of required input before use.
Suggestion: Validate update_dict before building SQL and return an error or raise an exception if it is empty. Example: if not update_dict: raise ValueError('update_dict must not be empty').
PR Guardrail Agent — automated review
| @@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun): | |||
| async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None): | |||
| # generate where clause | |||
| filters = [] | |||
There was a problem hiding this comment.
🟠 Dynamic column/operator interpolation still allows SQL injection via identifiers
Category: db_query | Severity: high | Confidence: 91%
The change correctly parameterizes values passed to the UPDATE statement, which reduces value-based SQL injection risk. However, SQL identifiers and operators are still interpolated directly into SQL from filter_dict and update_dict keys. col_name comes from dictionary keys and is inserted into both the WHERE and SET clauses without validation or quoting. Additionally, an operator may be extracted from col_name via operator_match and interpolated directly. If any part of these inputs can be influenced by untrusted data, an attacker could inject arbitrary SQL through column names or crafted operator suffixes. Parameter binding does not protect SQL identifiers or operators.
Suggestion: Restrict filter and update keys to a fixed allowlist of known column names, and restrict operators to a small explicit allowlist (for example =, !=, <, >, <=, >=). Build SQL using validated identifiers only, ideally with psycopg2.sql.Identifier / SQL composition rather than string formatting.
PR Guardrail Agent — automated review
| find_operator = operator_match.match(col_name) | ||
| if find_operator: | ||
| col_name = find_operator.group(1) | ||
| operator = find_operator.group(2) |
There was a problem hiding this comment.
🟠 Empty filter_dict can generate malformed or unintended UPDATE statement
Category: null_safety | Severity: high | Confidence: 90%
If filter_dict is empty, where_clause remains an empty string, but the SQL still includes WHERE {2}. This produces malformed SQL (... WHERE ;) or, if later refactored to omit WHERE, risks updating all rows. The method does not defensively validate that a filter is present before executing an update. Given historical incidents involving unsafe concurrent updates, missing safeguards around update scope are especially risky.
Suggestion: Require a non-empty filter_dict for updates unless an explicit full-table update mode is intended. Add a guard such as if not filter_dict: raise ValueError('filter_dict must not be empty for update_row').
PR Guardrail Agent — automated review
| @@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun): | |||
| async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None): | |||
There was a problem hiding this comment.
🟡 Mutable default arguments for filter_dict/update_dict can leak state across calls
Category: null_safety | Severity: medium | Confidence: 95%
The function signature uses {} as default values for filter_dict and update_dict. In Python, default mutable objects are created once at function definition time and then shared across invocations. While this method does not currently mutate these dictionaries directly in the shown diff, this remains a defensive-programming gap and can become a source of unexpected behavior if future changes mutate them or if callers rely on isolation between calls. It also weakens null-safety because callers may omit arguments and the function will operate on shared mutable state instead of a fresh default.
Suggestion: Change the signature to async def update_row(self, filter_dict=None, update_dict=None, cur: aiopg.Cursor = None): and initialize with filter_dict = filter_dict or {} / update_dict = update_dict or {} inside the function.
PR Guardrail Agent — automated review
| @@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun): | |||
| async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None): | |||
There was a problem hiding this comment.
🟡 Mutable default arguments in data-access API introduce shared state risk
Category: architecture | Severity: medium | Confidence: 94%
The method signature uses {} as defaults for filter_dict and update_dict. In Python, mutable defaults are shared across invocations. Even if this function does not currently mutate them, this is a known footgun and an architectural smell in a shared DB utility because later changes can accidentally create cross-request state leakage. In async services, hidden shared mutable state is especially risky.
Suggestion: Change defaults to None and initialize inside the function: filter_dict = filter_dict or {} and update_dict = update_dict or {}. Apply the same rule consistently across shared service-layer helpers.
PR Guardrail Agent — automated review
| if bool(filter_dict): | ||
| where_clause = seperator.join(filters) | ||
|
|
||
| sets = [] |
There was a problem hiding this comment.
🟡 UPDATE can become invalid or unsafe when update_dict is empty
Category: db_query | Severity: medium | Confidence: 93%
The patch fixes a previous bug by checking update_dict before building the SET clause, but the function still proceeds even if update_dict is empty. In that case set_clause remains empty and SQL becomes UPDATE table SET WHERE ..., which will fail at runtime. This is a correctness issue and can also trigger unnecessary transaction retries or connection churn under load.
Suggestion: Validate that update_dict is non-empty before generating SQL, and return a clear error if no columns are provided to update.
PR Guardrail Agent — automated review
Refactor AsyncPostgresTable.update_row() to use psycopg2 parameterized queries instead of string concatenation for both SET and WHERE clauses. - Replace direct value interpolation with %s placeholders and separate value lists (set_values, filter_values) passed to cur.execute() - Remove QuotedString wrapper in update_run_tags() to prevent double-escaping now that values are parameterized by the driver - Column names and operators remain as Python string formatting since they originate from hardcoded code, not user input
d33de8e to
a9e75bf
Compare
|
Hi @saikonen I’ve implemented the first approach . |
Summary
Fix critical SQL injection vulnerability in
AsyncPostgresTable.update_row()by replacing string concatenation with psycopg2 parameterized queries.Context / Motivation
The update_row method constructed SQL WHERE and SET clauses using direct string concatenation, allowing potential SQL injection through user-controlled inputs like
flow_id,run_number,step_name, and task_id (which come from URL path parameters). Fixes the SQL injection vulnerability in update_row().Changes Made
%splaceholders and separate value lists (set_values,filter_values) passed tocur.execute()QuotedStringwrapper — it produced pre-quoted SQL literals that caused double-escaping with the now-parameterized update_row(). Values are now passed as plainjson.dumps()and escaped by psycopg2 automaticallyTesting
docker-compose -f docker-compose.test.ymlAsyncRunTablePostgres.update_heartbeat— passes server-generated integers ✅AsyncRunTablePostgres.update_run_tags— now passesjson.dumps()directly ✅AsyncTaskTablePostgres.update_heartbeat— passes server-generated integers ✅