Adding python schema check by daticalamy · Pull Request #50 · liquibase/custom_policychecks

daticalamy · 2026-05-07T14:50:44Z

Updated schema_check.py because it was not capturing statements such as `UPDATE DirContact SET DirCompanyId = @PodId…`

coderabbitai · 2026-05-07T14:51:13Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0b84191-a984-4751-a0ec-0601e9df0dab

📥 Commits

Reviewing files that changed from the base of the PR and between 2eeceeb and 4f95dbf.

📒 Files selected for processing (1)

Python/Scripts/Any/schema_check.py

🚧 Files skipped from review as they are similar to previous changes (1)

Python/Scripts/Any/schema_check.py

📝 Walkthrough

Adds SchemaCheck: a Liquibase custom check implemented in Python that parses generated SQL to detect unqualified object references, integrates with Liquibase to evaluate changesets and fail on violations, and documents the check in the repository README.

Changes

SchemaCheck Custom Check Implementation

| Layer / File(s) | Summary |
|---|---|
| Core validation functions `Python/Scripts/Any/schema_check.py` | Module imports and implementation of `extract_object_identifiers()` and `check_schema_qualification()` to parse SQL tokens, pre-collect aliases/cursors, identify object candidates after keywords (FROM, JOIN, INTO, EXEC, TABLE, VIEW, CREATE/ALTER, DML), and record missing-schema violations while ignoring temp/variable/sys objects. |
| Liquibase integration & execution `Python/Scripts/Any/schema_check.py` | Top-level flow retrieves current Liquibase changes, skips changes with `loaddatachange` class, generates raw SQL for each change, runs schema qualification checks, logs and updates Liquibase status on the first violation, and exits with code 1; otherwise returns False. |
| Configuration documentation `Python/Scripts/Any/README.md` | Adds README entry for `SchemaCheck` including configuration command, target (Relational), severity (`0-4`), description enforcing schema-qualified referenced objects, and metadata (`Type=python`, `Path=Scripts/schema_check.py`, `Snapshot=false`). |

sequenceDiagram
  participant Liquibase
  participant SchemaCheckScript
  participant ValidationLogic
  Liquibase->>SchemaCheckScript: invoke custom check
  SchemaCheckScript->>SchemaCheckScript: fetch current changeset
  loop per change
    SchemaCheckScript->>SchemaCheckScript: skip if loaddatachange
    SchemaCheckScript->>SchemaCheckScript: generate raw SQL
    SchemaCheckScript->>ValidationLogic: check_schema_qualification(sql)
    ValidationLogic-->>SchemaCheckScript: violations list
    alt violation found
      SchemaCheckScript->>Liquibase: update status / log warning
      SchemaCheckScript->>SchemaCheckScript: exit(1)
    else no violation
      SchemaCheckScript->>SchemaCheckScript: continue
    end
  end
  SchemaCheckScript->>Liquibase: return False

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title 'Adding python schema check' directly describes the main change: a new Python script (schema_check.py) that validates schema-qualified SQL object references. |
| Description check | ✅ Passed | The description explains that schema_check.py was updated to capture previously missed patterns like UPDATE statements, which aligns with the changeset's focus on improving schema validation. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

Python/Scripts/Any/schema_check.py (1)

59-62: ⚡ Quick win

IdentifierList not handled — comma-join syntax produces silent false negatives.

`FROM t1, t2` yields an `IdentifierList` as the next token. Because it is not an `Identifier`, the `isinstance` branch misses it; because its `ttype` is `None`, the `elif` branch wraps the whole list in a synthetic `Identifier([IdentifierList])` whose `get_real_name()` is unreliable. As a result, both table references are silently skipped.

♻️ Proposed fix — handle `IdentifierList`

+from sqlparse.sql import Identifier, IdentifierList, Parenthesis
 ...
         if isinstance(next_token, Identifier):
+            first = next_token.token_first(skip_ws=True, skip_cm=True)
+            if isinstance(first, Parenthesis):
+                continue
             identifiers.append(next_token)
+        elif isinstance(next_token, IdentifierList):
+            for ident in next_token.get_identifiers():
+                if isinstance(ident, Identifier):
+                    identifiers.append(ident)
         elif next_token.ttype in (Name, None):

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Python/Scripts/Any/schema_check.py` around lines 59 - 62, The code currently
treats tokens with ttype None as Identifiers and wraps IdentifierList objects
into a synthetic Identifier, causing multi-table comma syntax to be missed;
update the token handling in schema_check.py around the next_token logic to
explicitly check for isinstance(next_token, IdentifierList) and, when true,
iterate its .get_identifiers() (or children) to extend the identifiers list with
each real Identifier (so their get_real_name() works), rather than appending
Identifier([next_token]); keep the existing Identifier branch for single
Identifiers and preserve the identifiers list usage.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Python/Scripts/Any/schema_check.py`:
- Around line 76-77: The current check skips CTEs because the stmt_str regex
only accepts statements starting with SELECT/INSERT/...; update that regex to
include WITH so CTE-led statements are inspected, and implement CTE alias
tracking: when processing stmt_str, parse leading CTE definitions (e.g., match
"WITH\s+([^(\s,]+)\s+AS" repeatedly) to collect alias names, then when your
unqualified-table detection logic (the routine that scans FROM/JOIN table
identifiers in the same scope as stmt_str) runs, exclude any table references
that match the collected CTE aliases to avoid false positives; refer to the
existing re.match check on stmt_str and the variable holding the statement text
(stmt_str) when adding the CTE parsing and exclusion logic.
- Around line 111-116: The code only reports violations[0] then exits; change it
to report all violations returned by check_schema_qualification by joining or
iterating over the violations list and including every unqualified object in the
log and the liquibase_status.message before exiting; update the block that
references violations, liquibase_logger, and liquibase_status (the same scope
where violations is set by check_schema_qualification) so the warning message
contains all items (e.g., comma-separated or multiline) and still sets
liquibase_status.fired = True and sys.exit(1).
- Around line 57-58: The unqualified-alias checker is flagging subquery-derived
aliases because sqlparse wraps "FROM (SELECT ...) AS t" as an Identifier whose
first meaningful child is a Parenthesis; modify the handling where next_token is
an Identifier to detect the identifier's first meaningful child (e.g., inspect
identifier.tokens or use identifier.get_real_name()/first_token) and skip
further qualification checks if that child is a Parenthesis (i.e., it's a
subquery-derived table), so the existing get_parent_name() / alias dot checks
don't run for subquery aliases like "t".



ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 567e64cd-e72c-4f74-a783-2750e473d2d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7e018d1 and 2eeceeb.

📒 Files selected for processing (2)

Python/Scripts/Any/README.md
Python/Scripts/Any/schema_check.py

coderabbitai · 2026-05-07T14:59:29Z

+        if not re.match(r"^\s*(SELECT|INSERT|UPDATE|DELETE|EXEC|EXECUTE|CREATE|ALTER)\b", stmt_str, re.IGNORECASE):
+            continue


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

CTE queries (WITH ... AS (...)) are silently skipped — no schema check applied.

The regex requires statements to start with SELECT/INSERT/etc., so any WITH-prefixed CTE entirely bypasses the check. Unqualified table references inside the CTE body (e.g., WITH cte AS (SELECT * FROM UnqualifiedTable)) go unreported.

Adding WITH to the regex is trivial; the harder part is tracking the CTE alias names (cte, etc.) and excluding them from violation detection in the outer query's FROM cte reference to avoid introducing false positives.
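A minimal sketch of the two steps described above, using only the standard library: the statement filter extended with `WITH`, and a regex-based collection of CTE names so that `FROM cte` in the outer query is not reported. The names `STATEMENT_START`, `CTE_NAME`, `collect_cte_names`, and `drop_cte_references` are hypothetical and do not come from `schema_check.py`; the pattern is a heuristic that assumes the `WITH name [(columns)] AS (...)` form and may need adjusting for other dialects.

```python
import re

# The statement filter from the reviewed code, with WITH added so that
# CTE-led statements are no longer skipped.
STATEMENT_START = re.compile(
    r"^\s*(WITH|SELECT|INSERT|UPDATE|DELETE|EXEC|EXECUTE|CREATE|ALTER)\b",
    re.IGNORECASE,
)

# Heuristic: a CTE name follows either the leading WITH or the "),"
# that closes the previous CTE body. An optional column list such as
# "b (x, y)" may sit between the name and AS.
CTE_NAME = re.compile(
    r"(?:^\s*WITH\s+|\)\s*,\s*)([^\s(),]+)\s*(?:\([^()]*\)\s*)?AS\s*\(",
    re.IGNORECASE,
)


def _normalize(name):
    """Strip common identifier quoting and lower-case the name for comparison."""
    return name.strip('[]"`').lower()


def collect_cte_names(stmt_str):
    """Return the set of CTE names declared at the start of a statement."""
    return {_normalize(m.group(1)) for m in CTE_NAME.finditer(stmt_str)}


def drop_cte_references(candidates, stmt_str):
    """Remove candidate object names that are really CTE aliases."""
    cte_names = collect_cte_names(stmt_str)
    return [c for c in candidates if _normalize(c) not in cte_names]


sql = "WITH cte AS (SELECT * FROM UnqualifiedTable) SELECT * FROM cte"
if STATEMENT_START.match(sql):
    # Candidates would normally come from the existing token scan;
    # they are hard-coded here for illustration.
    print(drop_cte_references(["UnqualifiedTable", "cte"], sql))
```

With this filter in place, `UnqualifiedTable` inside the CTE body would still be reported, while the outer reference to `cte` would not.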

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Python/Scripts/Any/schema_check.py` around lines 76 - 77, The current check skips CTEs because the stmt_str regex only accepts statements starting with SELECT/INSERT/...; update that regex to include WITH so CTE-led statements are inspected, and implement CTE alias tracking: when processing stmt_str, parse leading CTE definitions (e.g., match "WITH\s+([^(\s,]+)\s+AS" repeatedly) to collect alias names, then when your unqualified-table detection logic (the routine that scans FROM/JOIN table identifiers in the same scope as stmt_str) runs, exclude any table references that match the collected CTE aliases to avoid false positives; refer to the existing re.match check on stmt_str and the variable holding the statement text (stmt_str) when adding the CTE parsing and exclusion logic.

coderabbitai · 2026-05-07T14:59:29Z

+    if violations:
+        msg = f"Missing schema for object '{violations[0]}' in SQL statement."
+        liquibase_logger.warning(msg)
+        liquibase_status.fired = True
+        liquibase_status.message = msg
+        sys.exit(1)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Only the first violation per changeset is reported; remaining violations are silently dropped.

Even though check_schema_qualification returns all unqualified names, the outer loop breaks on the first violating change using violations[0]. A developer with three unqualified tables must fix-and-re-run three times to discover all of them.

💡 Proposed improvement — report all violations before exiting

-    if violations:
-        msg = f"Missing schema for object '{violations[0]}' in SQL statement."
-        liquibase_logger.warning(msg)
-        liquibase_status.fired = True
-        liquibase_status.message = msg
-        sys.exit(1)
+    for v in violations:
+        liquibase_logger.warning(f"Missing schema for object '{v}' in SQL statement.")
+    if violations:
+        msg = f"Missing schema qualification: {', '.join(violations)}"
+        liquibase_status.fired = True
+        liquibase_status.message = msg
+        sys.exit(1)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Python/Scripts/Any/schema_check.py` around lines 111 - 116, The code only reports violations[0] then exits; change it to report all violations returned by check_schema_qualification by joining or iterating over the violations list and including every unqualified object in the log and the liquibase_status.message before exiting; update the block that references violations, liquibase_logger, and liquibase_status (the same scope where violations is set by check_schema_qualification) so the warning message contains all items (e.g., comma-separated or multiline) and still sets liquibase_status.fired = True and sys.exit(1).
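The report-all behaviour can be tried outside Liquibase with stand-in objects. The sketch below is an illustration under assumptions, not the script's actual code: `report_violations` is a hypothetical helper, the logger and status objects are plain-Python substitutes for `liquibase_logger` and `liquibase_status`, and the function returns a boolean where the real check would call `sys.exit(1)`. The object names `Orders` and `Customers` are made up for the example.

```python
import logging
from types import SimpleNamespace

# Stand-ins (assumptions) for the objects the Liquibase check environment
# would provide, so the snippet runs on its own.
liquibase_logger = logging.getLogger("schema_check_sketch")
liquibase_status = SimpleNamespace(fired=False, message="")


def report_violations(violations, status, logger):
    """Log every unqualified object and put all of them in the status message.

    Returns True when the check fired. The real script would call
    sys.exit(1) at that point instead of returning.
    """
    for name in violations:
        logger.warning("Missing schema for object '%s' in SQL statement.", name)
    if not violations:
        return False
    status.fired = True
    status.message = f"Missing schema qualification: {', '.join(violations)}"
    return True


fired = report_violations(
    ["DirContact", "Orders", "Customers"], liquibase_status, liquibase_logger
)
print(fired, liquibase_status.message)
```

Returning a value rather than exiting keeps the helper easy to unit test; the caller can still decide to exit with code 1 when it returns True.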

Adding python schema check

2eeceeb

coderabbitai Bot reviewed May 7, 2026

Adding corrections for aliases, cursors, and sub-queries.

4f95dbf

daticalamy wants to merge 2 commits into main from schema_check
