Skip to content

fix: prevent SQL injection in update_row by using parameterized queries#467

Open
ahmedelgyar1 wants to merge 1 commit intoNetflix:masterfrom
ahmedelgyar1:fix/sql-injection-update-row-parameterized-queries
Open

fix: prevent SQL injection in update_row by using parameterized queries#467
ahmedelgyar1 wants to merge 1 commit intoNetflix:masterfrom
ahmedelgyar1:fix/sql-injection-update-row-parameterized-queries

Conversation

@ahmedelgyar1
Copy link
Copy Markdown

Summary

Fix critical SQL injection vulnerability in AsyncPostgresTable.update_row() by replacing string concatenation with psycopg2 parameterized queries.

Context / Motivation

The update_row method constructed SQL WHERE and SET clauses using direct string concatenation, allowing potential SQL injection through user-controlled inputs like flow_id, run_number, step_name, and task_id (which come from URL path parameters). Fixes the SQL injection vulnerability in update_row().

Changes Made

  • update_row(): Replaced direct value interpolation with %s placeholders and separate value lists (set_values, filter_values) passed to cur.execute()
  • update_run_tags(): Removed QuotedString wrapper — it produced pre-quoted SQL literals that caused double-escaping with the now-parameterized update_row(). Values are now passed as plain json.dumps() and escaped by psycopg2 automatically
  • Column names and operators remain as Python string formatting since they originate from hardcoded server code, not user input

Testing

  • All 207 integration tests pass via docker-compose -f docker-compose.test.yml
  • Verified all 3 callers of update_row():
    • AsyncRunTablePostgres.update_heartbeat — passes server-generated integers ✅
    • AsyncRunTablePostgres.update_run_tags — now passes json.dumps() directly ✅
    • AsyncTaskTablePostgres.update_heartbeat — passes server-generated integers ✅

run_key: str(run_value)}

set_dict = {"tags": QuotedString(json.dumps(run_tags)).getquoted().decode()}
set_dict = {"tags": json.dumps(run_tags)}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

QuotedString is an unused import now.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’ll remove it in the next commit, sorry I missed that

Comment thread services/data/postgres_async_db.py Outdated
update_sql = """
UPDATE {0} SET {1} WHERE {2};
""".format(self.table_name, set_clause, where_clause)
values = tuple(set_values + filter_values)
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit sensitive to ordering now. It might be a good idea to see if we can combine the construction of the two sets of values into one instead to keep the ordering easier to reason about.

Copy link
Copy Markdown
Author

@ahmedelgyar1 ahmedelgyar1 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right the ordering is sensitive. I see two approaches to solve this :
1-Build a single list from the start, appending values in the same order they appear in the sql set values first, then where values. The code order mirrors the sql order.
2- Store each placeholder alongside its value as a pair, then unzip them at the end. This makes it impossible to accidentally misorder them.
What do you prefer?

Comment thread services/data/postgres_async_db.py Outdated
set_seperator = ", "
set_clause = ""
if bool(filter_dict):
if bool(update_dict):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is the bool check actually necessary? wouldn't the join just produce an empty string with an empty set anyway?

Copy link
Copy Markdown
Author

@ahmedelgyar1 ahmedelgyar1 Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right it's not necessary. I will remove it

Comment thread services/data/postgres_async_db.py Outdated

seperator = " and "
where_clause = ""
if bool(filter_dict):
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, though this is historical cleanup at this point. The conditional shouldn't actually be necessary, as the join will produce an empty string with an empty list.

Comment thread services/data/postgres_async_db.py Outdated
seperator = " and "
where_clause = ""
if bool(filter_dict):
where_clause = seperator.join(filters)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Empty update_dict now produces invalid SQL with empty SET clause

Category: null_safety | Severity: high | Confidence: 92%

The code now correctly checks if bool(update_dict): before building set_clause, but it still proceeds to format and execute UPDATE {table} SET {1} WHERE {2}; even when update_dict is empty. In that case, set_clause is an empty string and the generated SQL becomes UPDATE ... SET WHERE ..., which will fail at runtime. This is a defensive-programming issue caused by missing validation of required input before use.

Suggestion: Validate update_dict before building SQL and return an error or raise an exception if it is empty. Example: if not update_dict: raise ValueError('update_dict must not be empty').


PR Guardrail Agent — automated review

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):
# generate where clause
filters = []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Dynamic column/operator interpolation still allows SQL injection via identifiers

Category: db_query | Severity: high | Confidence: 91%

The change correctly parameterizes values passed to the UPDATE statement, which reduces value-based SQL injection risk. However, SQL identifiers and operators are still interpolated directly into SQL from filter_dict and update_dict keys. col_name comes from dictionary keys and is inserted into both the WHERE and SET clauses without validation or quoting. Additionally, an operator may be extracted from col_name via operator_match and interpolated directly. If any part of these inputs can be influenced by untrusted data, an attacker could inject arbitrary SQL through column names or crafted operator suffixes. Parameter binding does not protect SQL identifiers or operators.

Suggestion: Restrict filter and update keys to a fixed allowlist of known column names, and restrict operators to a small explicit allowlist (for example =, !=, <, >, <=, >=). Build SQL using validated identifiers only, ideally with psycopg2.sql.Identifier / SQL composition rather than string formatting.


PR Guardrail Agent — automated review

find_operator = operator_match.match(col_name)
if find_operator:
col_name = find_operator.group(1)
operator = find_operator.group(2)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟠 Empty filter_dict can generate malformed or unintended UPDATE statement

Category: null_safety | Severity: high | Confidence: 90%

If filter_dict is empty, where_clause remains an empty string, but the SQL still includes WHERE {2}. This produces malformed SQL (... WHERE ;) or, if later refactored to omit WHERE, risks updating all rows. The method does not defensively validate that a filter is present before executing an update. Given historical incidents involving unsafe concurrent updates, missing safeguards around update scope are especially risky.

Suggestion: Require a non-empty filter_dict for updates unless an explicit full-table update mode is intended. Add a guard such as if not filter_dict: raise ValueError('filter_dict must not be empty for update_row').


PR Guardrail Agent — automated review

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Mutable default arguments for filter_dict/update_dict can leak state across calls

Category: null_safety | Severity: medium | Confidence: 95%

The function signature uses {} as default values for filter_dict and update_dict. In Python, default mutable objects are created once at function definition time and then shared across invocations. While this method does not currently mutate these dictionaries directly in the shown diff, this remains a defensive-programming gap and can become a source of unexpected behavior if future changes mutate them or if callers rely on isolation between calls. It also weakens null-safety because callers may omit arguments and the function will operate on shared mutable state instead of a fresh default.

Suggestion: Change the signature to async def update_row(self, filter_dict=None, update_dict=None, cur: aiopg.Cursor = None): and initialize with filter_dict = filter_dict or {} / update_dict = update_dict or {} inside the function.


PR Guardrail Agent — automated review

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 Mutable default arguments in data-access API introduce shared state risk

Category: architecture | Severity: medium | Confidence: 94%

The method signature uses {} as defaults for filter_dict and update_dict. In Python, mutable defaults are shared across invocations. Even if this function does not currently mutate them, this is a known footgun and an architectural smell in a shared DB utility because later changes can accidentally create cross-request state leakage. In async services, hidden shared mutable state is especially risky.

Suggestion: Change defaults to None and initialize inside the function: filter_dict = filter_dict or {} and update_dict = update_dict or {}. Apply the same rule consistently across shared service-layer helpers.


PR Guardrail Agent — automated review

Comment thread services/data/postgres_async_db.py Outdated
if bool(filter_dict):
where_clause = seperator.join(filters)

sets = []
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🟡 UPDATE can become invalid or unsafe when update_dict is empty

Category: db_query | Severity: medium | Confidence: 93%

The patch fixes a previous bug by checking update_dict before building the SET clause, but the function still proceeds even if update_dict is empty. In that case set_clause remains empty and SQL becomes UPDATE table SET WHERE ..., which will fail at runtime. This is a correctness issue and can also trigger unnecessary transaction retries or connection churn under load.

Suggestion: Validate that update_dict is non-empty before generating SQL, and return a clear error if no columns are provided to update.


PR Guardrail Agent — automated review

Refactor AsyncPostgresTable.update_row() to use psycopg2 parameterized
queries instead of string concatenation for both SET and WHERE clauses.

- Replace direct value interpolation with %s placeholders and separate
  value lists (set_values, filter_values) passed to cur.execute()
- Remove QuotedString wrapper in update_run_tags() to prevent
  double-escaping now that values are parameterized by the driver
- Column names and operators remain as Python string formatting since
  they originate from hardcoded code, not user input
@ahmedelgyar1 ahmedelgyar1 force-pushed the fix/sql-injection-update-row-parameterized-queries branch from d33de8e to a9e75bf Compare April 3, 2026 02:02
@ahmedelgyar1
Copy link
Copy Markdown
Author

Hi @saikonen I’ve implemented the first approach .
Could you please take a look and let me know your feedback?
Also, if you think the second approach would be better, I can switch to it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants