fix: prevent SQL injection in update_row by using parameterized queries by ahmedelgyar1 · Pull Request #467 · Netflix/metaflow-service

ahmedelgyar1 · 2026-03-18T23:49:43Z

Summary

Fix critical SQL injection vulnerability in AsyncPostgresTable.update_row() by replacing string concatenation with psycopg2 parameterized queries.

Context / Motivation

The update_row() method built its SQL WHERE and SET clauses by direct string concatenation, allowing potential SQL injection through user-controlled inputs such as flow_id, run_number, step_name, and task_id, which come from URL path parameters.
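To make the risk concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, so it runs without a server (sqlite3 uses ? placeholders where psycopg2 uses %s). The runs table, its columns, and both update functions are hypothetical. They only mirror the two patterns described above and are not the service's code.

```python
import sqlite3


def make_db():
    """In-memory stand-in for a runs table (schema is illustrative only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE runs (flow_id TEXT, run_number INTEGER, last_heartbeat_ts INTEGER)"
    )
    conn.executemany(
        "INSERT INTO runs VALUES (?, ?, ?)",
        [("HelloFlow", 1, 0), ("OtherFlow", 2, 0)],
    )
    return conn


def update_concatenated(conn, flow_id, ts):
    # Vulnerable pattern: the path parameter is spliced into the SQL text.
    conn.execute(
        "UPDATE runs SET last_heartbeat_ts = " + str(ts)
        + " WHERE flow_id = '" + flow_id + "'"
    )


def update_parameterized(conn, flow_id, ts):
    # Fixed pattern: placeholders in the SQL, values passed to execute() separately.
    conn.execute(
        "UPDATE runs SET last_heartbeat_ts = ? WHERE flow_id = ?",
        (ts, flow_id),
    )


def rows_with_heartbeat(conn, ts):
    """Count rows whose heartbeat equals ts."""
    return conn.execute(
        "SELECT COUNT(*) FROM runs WHERE last_heartbeat_ts = ?", (ts,)
    ).fetchone()[0]


# A crafted flow_id, as it might arrive in a URL path segment.
malicious = "HelloFlow' OR '1'='1"
```

Run against the in-memory table, the parameterized update with the crafted flow_id matches no rows, while the concatenated version turns the WHERE clause into a tautology and rewrites the heartbeat of every run.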

Changes Made

- update_row(): Replaced direct value interpolation with %s placeholders and separate value lists (set_values, filter_values) passed to cur.execute().
- update_run_tags(): Removed the QuotedString wrapper. It produced pre-quoted SQL literals, which caused double-escaping once update_row() was parameterized. Values are now passed as plain json.dumps() output and escaped by psycopg2 automatically.
- Column names and operators are still inserted with Python string formatting, since they originate from hardcoded server code rather than user input.
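The double-escaping problem with QuotedString can be illustrated without a database. The sketch below assumes a simplified quoting rule (wrap in single quotes, double any embedded single quote) as a stand-in for driver-side adaptation. psycopg2's real implementation also handles encodings and backslashes, and the helper names here are hypothetical.

```python
import json


def sql_literal(text):
    """Simplified stand-in for driver-side quoting of a string parameter."""
    return "'" + text.replace("'", "''") + "'"


def parse_sql_literal(literal):
    """What the server stores after parsing a literal produced by sql_literal."""
    assert literal.startswith("'") and literal.endswith("'")
    return literal[1:-1].replace("''", "'")


tags = ["env:prod", "user:alice"]
payload = json.dumps(tags)

# Before the fix: QuotedString pre-quoted the value, and a parameterized query
# quotes it again, so the outer quotes end up inside the stored data.
stored_before = parse_sql_literal(sql_literal(sql_literal(payload)))

# After the fix: pass the plain JSON string and let the driver quote it once.
stored_after = parse_sql_literal(sql_literal(payload))
```

With the wrapper left in place, the stored value is the JSON text wrapped in literal single quotes, which no longer parses as JSON; without it, the stored value round-trips to the original tag list.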

Testing

All 207 integration tests pass via docker-compose -f docker-compose.test.yml
Verified all 3 callers of update_row():
- AsyncRunTablePostgres.update_heartbeat — passes server-generated integers ✅
- AsyncRunTablePostgres.update_run_tags — now passes json.dumps() directly ✅
- AsyncTaskTablePostgres.update_heartbeat — passes server-generated integers ✅

saikonen · 2026-03-30T16:01:24Z

                       run_key: str(run_value)}

-        set_dict = {"tags": QuotedString(json.dumps(run_tags)).getquoted().decode()}
+        set_dict = {"tags": json.dumps(run_tags)}


QuotedString is an unused import now.

I'll remove it in the next commit. Sorry I missed that.

saikonen · 2026-03-30T16:11:42Z

        update_sql = """
                UPDATE {0} SET {1} WHERE {2};
        """.format(self.table_name, set_clause, where_clause)
+        values = tuple(set_values + filter_values)


this is a bit sensitive to ordering now. It might be a good idea to see if we can combine the construction of the two sets of values into one instead to keep the ordering easier to reason about.

You're right, the ordering is sensitive. I see two approaches to solve this:
1. Build a single list from the start, appending values in the same order they appear in the SQL: SET values first, then WHERE values. The code order mirrors the SQL order.
2. Store each placeholder alongside its value as a pair, then unzip them at the end. This makes it impossible to accidentally misorder them.
What do you prefer?
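A minimal sketch of the two options, assuming plain equality filters (the real update_row also parses an operator out of each filter key, which is omitted here). The function names and the runs_v3 table name are illustrative only.

```python
def build_update_single_list(table, filter_dict, update_dict):
    """Approach 1: append values to one list in the order the SQL is written."""
    values = []
    sets = []
    for col, val in update_dict.items():  # SET clause comes first in the SQL...
        sets.append("{} = %s".format(col))
        values.append(val)
    filters = []
    for col, val in filter_dict.items():  # ...then the WHERE clause.
        filters.append("{} = %s".format(col))
        values.append(val)
    sql = "UPDATE {0} SET {1} WHERE {2};".format(
        table, ", ".join(sets), " and ".join(filters)
    )
    return sql, tuple(values)


def build_update_pairs(table, filter_dict, update_dict):
    """Approach 2: keep each placeholder fragment paired with its value."""
    set_pairs = [("{} = %s".format(col), val) for col, val in update_dict.items()]
    where_pairs = [("{} = %s".format(col), val) for col, val in filter_dict.items()]
    set_sql = ", ".join(frag for frag, _ in set_pairs)
    where_sql = " and ".join(frag for frag, _ in where_pairs)
    # "Unzip": the values come out in the same sequence as their fragments.
    values = tuple(val for _, val in set_pairs + where_pairs)
    sql = "UPDATE {0} SET {1} WHERE {2};".format(table, set_sql, where_sql)
    return sql, values
```

Both builders return the same statement and value tuple. The pairing keeps each value attached to its own placeholder fragment, though the SET pairs must still be concatenated ahead of the WHERE pairs to match the statement.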

saikonen · 2026-03-30T16:17:07Z

        set_seperator = ", "
        set_clause = ""
-        if bool(filter_dict):
+        if bool(update_dict):


is the bool check actually necessary? wouldn't the join just produce an empty string with an empty set anyway?

You're right, it's not necessary. I'll remove it.

saikonen · 2026-03-30T16:18:11Z


        seperator = " and "
        where_clause = ""
        if bool(filter_dict):


same here, though this is historical cleanup at this point. The conditional shouldn't actually be necessary, as the join will produce an empty string with an empty list.
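The reviewer's point about the bool() guards can be checked directly: str.join over an empty iterable returns an empty string, so a guarded and an unguarded builder agree on every input. This sketch assumes equality-only filters and uses hypothetical function names.

```python
def where_clause_with_guard(filter_dict):
    # Shape of the current code: start from "" and only join when non-empty.
    where_clause = ""
    if bool(filter_dict):
        where_clause = " and ".join("{} = %s".format(col) for col in filter_dict)
    return where_clause


def where_clause_without_guard(filter_dict):
    # Suggested cleanup: joining an empty iterable already yields "".
    return " and ".join("{} = %s".format(col) for col in filter_dict)
```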

snarnawareEntrata · 2026-04-01T09:12:23Z

        seperator = " and "
        where_clause = ""
        if bool(filter_dict):
            where_clause = seperator.join(filters)


🟠 Empty update_dict now produces invalid SQL with empty SET clause

Category: null_safety | Severity: high | Confidence: 92%

The code now correctly checks if bool(update_dict): before building set_clause, but it still proceeds to format and execute UPDATE {0} SET {1} WHERE {2}; even when update_dict is empty. In that case, set_clause is an empty string and the generated SQL becomes UPDATE ... SET WHERE ..., which will fail at runtime. This is a defensive-programming issue caused by missing validation of required input before use.

Suggestion: Validate update_dict before building SQL and return an error or raise an exception if it is empty. Example: if not update_dict: raise ValueError('update_dict must not be empty').

PR Guardrail Agent — automated review
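A minimal sketch of the suggested update_dict guard, assuming the UPDATE {0} SET {1} WHERE {2}; template quoted in the diff and equality-only columns. render_update is a hypothetical helper, not part of metaflow-service.

```python
UPDATE_TEMPLATE = "UPDATE {0} SET {1} WHERE {2};"


def render_update(table, update_dict, filter_dict):
    """Hypothetical builder with the suggested guard on update_dict."""
    if not update_dict:
        # Without this check set_clause would be "" and the statement would
        # render as "UPDATE <table> SET  WHERE ...;", which PostgreSQL rejects.
        raise ValueError("update_dict must not be empty")
    set_clause = ", ".join("{} = %s".format(col) for col in update_dict)
    where_clause = " and ".join("{} = %s".format(col) for col in filter_dict)
    values = tuple(update_dict.values()) + tuple(filter_dict.values())
    return UPDATE_TEMPLATE.format(table, set_clause, where_clause), values
```

Failing fast with ValueError surfaces the caller's mistake at the call site instead of as a database syntax error inside a transaction.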

snarnawareEntrata · 2026-04-01T09:12:23Z

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
    async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):
        # generate where clause
        filters = []


🟠 Dynamic column/operator interpolation still allows SQL injection via identifiers

Category: db_query | Severity: high | Confidence: 91%

The change correctly parameterizes values passed to the UPDATE statement, which reduces value-based SQL injection risk. However, SQL identifiers and operators are still interpolated directly into SQL from filter_dict and update_dict keys. col_name comes from dictionary keys and is inserted into both the WHERE and SET clauses without validation or quoting. Additionally, an operator may be extracted from col_name via operator_match and interpolated directly. If any part of these inputs can be influenced by untrusted data, an attacker could inject arbitrary SQL through column names or crafted operator suffixes. Parameter binding does not protect SQL identifiers or operators.

Suggestion: Restrict filter and update keys to a fixed allowlist of known column names, and restrict operators to a small explicit allowlist (for example =, !=, <, >, <=, >=). Build SQL using validated identifiers only, ideally with psycopg2.sql.Identifier / SQL composition rather than string formatting.

PR Guardrail Agent — automated review
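The allowlist half of the suggestion above can be sketched in plain Python. The column and operator sets below are assumptions (the PR does not list the tables' columns), and operators are passed as a separate argument because the service's operator_match regex is not shown. The comment also recommends psycopg2.sql.Identifier; rendering psycopg2.sql objects to text with as_string() takes a connection or cursor as context, so this sketch sticks to a stdlib allowlist to stay runnable without a database.

```python
# Hypothetical allowlists; the real column set would come from the table
# definitions in metaflow-service.
ALLOWED_COLUMNS = frozenset({
    "flow_id", "run_number", "step_name", "task_id", "last_heartbeat_ts", "tags",
})
ALLOWED_OPERATORS = frozenset({"=", "!=", "<", ">", "<=", ">="})


def checked_condition(col_name, operator="="):
    """Return '<col> <op> %s' only if both pieces are on the allowlists."""
    if col_name not in ALLOWED_COLUMNS:
        raise ValueError("unknown column: {!r}".format(col_name))
    if operator not in ALLOWED_OPERATORS:
        raise ValueError("unsupported operator: {!r}".format(operator))
    # Safe to format: both pieces are now known constants, and the value
    # itself still travels as a bound parameter.
    return "{} {} %s".format(col_name, operator)


def build_where(conditions):
    """conditions: iterable of (column, operator, value) triples."""
    fragments, values = [], []
    for col_name, operator, value in conditions:
        fragments.append(checked_condition(col_name, operator))
        values.append(value)
    return " and ".join(fragments), values
```

Because every identifier and operator must match a known constant, a crafted key or operator suffix is rejected before any SQL is built, while values continue to go through parameter binding.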

snarnawareEntrata · 2026-04-01T09:12:23Z

            find_operator = operator_match.match(col_name)
            if find_operator:
                col_name = find_operator.group(1)
                operator = find_operator.group(2)


🟠 Empty filter_dict can generate malformed or unintended UPDATE statement

Category: null_safety | Severity: high | Confidence: 90%

If filter_dict is empty, where_clause remains an empty string, but the SQL still includes WHERE {2}. This produces malformed SQL (... WHERE ;) or, if later refactored to omit WHERE, risks updating all rows. The method does not defensively validate that a filter is present before executing an update. Given historical incidents involving unsafe concurrent updates, missing safeguards around update scope are especially risky.

Suggestion: Require a non-empty filter_dict for updates unless an explicit full-table update mode is intended. Add a guard such as if not filter_dict: raise ValueError('filter_dict must not be empty for update_row').

PR Guardrail Agent — automated review
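One way to act on the filter_dict suggestion above is to make an unscoped update an explicit opt-in and to drop the WHERE keyword when there is nothing to filter on. In this sketch, build_update_sql and its allow_full_table flag are hypothetical; update_row has no such parameter in the PR.

```python
def build_update_sql(table, filter_dict, update_dict, allow_full_table=False):
    """Hypothetical builder that refuses an unscoped UPDATE unless asked for one."""
    if not update_dict:
        raise ValueError("update_dict must not be empty")
    if not filter_dict and not allow_full_table:
        raise ValueError("filter_dict must not be empty for update_row")

    set_clause = ", ".join("{} = %s".format(col) for col in update_dict)
    values = list(update_dict.values())
    sql = "UPDATE {0} SET {1}".format(table, set_clause)

    if filter_dict:
        # Only emit WHERE when there is something to filter on, so the
        # statement is never rendered as "... WHERE ;".
        sql += " WHERE " + " and ".join("{} = %s".format(col) for col in filter_dict)
        values.extend(filter_dict.values())
    return sql + ";", tuple(values)
```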

snarnawareEntrata · 2026-04-01T09:12:23Z

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
    async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):


🟡 Mutable default arguments for filter_dict/update_dict can leak state across calls

Category: null_safety | Severity: medium | Confidence: 95%

The function signature uses {} as default values for filter_dict and update_dict. In Python, default mutable objects are created once at function definition time and then shared across invocations. While this method does not currently mutate these dictionaries directly in the shown diff, this remains a defensive-programming gap and can become a source of unexpected behavior if future changes mutate them or if callers rely on isolation between calls. It also weakens null-safety because callers may omit arguments and the function will operate on shared mutable state instead of a fresh default.

Suggestion: Change the signature to async def update_row(self, filter_dict=None, update_dict=None, cur: aiopg.Cursor = None): and initialize with filter_dict = filter_dict or {} / update_dict = update_dict or {} inside the function.

PR Guardrail Agent — automated review
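The mutable-default pitfall and the suggested None default can be seen in a small standalone example; count_calls_shared and count_calls_fresh are hypothetical functions, not part of the service.

```python
def count_calls_shared(state={}):
    # The {} default is created once, when the def statement runs,
    # so every call without an argument mutates the same dict.
    state["calls"] = state.get("calls", 0) + 1
    return state["calls"]


def count_calls_fresh(state=None):
    # Pattern suggested in the review: default to None, then build a new dict.
    state = state or {}
    state["calls"] = state.get("calls", 0) + 1
    return state["calls"]
```

Note that state = state or {} also swaps a caller's empty dict for a new one. For update_row, which the review notes does not mutate its arguments, that difference is harmless; state = {} if state is None else state is the stricter alternative.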

snarnawareEntrata · 2026-04-01T09:12:23Z

@@ -363,39 +363,41 @@ async def run_in_transaction_with_serializable_isolation_level(self, fun):
    async def update_row(self, filter_dict={}, update_dict={}, cur: aiopg.Cursor = None):


🟡 Mutable default arguments in data-access API introduce shared state risk

Category: architecture | Severity: medium | Confidence: 94%

The method signature uses {} as defaults for filter_dict and update_dict. In Python, mutable defaults are shared across invocations. Even if this function does not currently mutate them, this is a known footgun and an architectural smell in a shared DB utility because later changes can accidentally create cross-request state leakage. In async services, hidden shared mutable state is especially risky.

Suggestion: Change defaults to None and initialize inside the function: filter_dict = filter_dict or {} and update_dict = update_dict or {}. Apply the same rule consistently across shared service-layer helpers.

PR Guardrail Agent — automated review

snarnawareEntrata · 2026-04-01T09:12:23Z

        if bool(filter_dict):
            where_clause = seperator.join(filters)

        sets = []


🟡 UPDATE can become invalid or unsafe when update_dict is empty

Category: db_query | Severity: medium | Confidence: 93%

The patch fixes a previous bug by checking update_dict before building the SET clause, but the function still proceeds even if update_dict is empty. In that case set_clause remains empty and SQL becomes UPDATE table SET WHERE ..., which will fail at runtime. This is a correctness issue and can also trigger unnecessary transaction retries or connection churn under load.

Suggestion: Validate that update_dict is non-empty before generating SQL, and return a clear error if no columns are provided to update.

PR Guardrail Agent — automated review

Refactor AsyncPostgresTable.update_row() to use psycopg2 parameterized queries instead of string concatenation for both SET and WHERE clauses.
- Replace direct value interpolation with %s placeholders and separate value lists (set_values, filter_values) passed to cur.execute()
- Remove QuotedString wrapper in update_run_tags() to prevent double-escaping now that values are parameterized by the driver
- Column names and operators remain as Python string formatting since they originate from hardcoded code, not user input

ahmedelgyar1 · 2026-04-03T02:13:28Z

Hi @saikonen, I've implemented the first approach.
Could you please take a look and let me know your feedback?
Also, if you think the second approach would be better, I can switch to it.

saikonen reviewed Mar 30, 2026

snarnawareEntrata reviewed Apr 1, 2026

ahmedelgyar1 force-pushed the fix/sql-injection-update-row-parameterized-queries branch from d33de8e to a9e75bf on April 3, 2026 at 02:02
