Skip to content

Adding python schema check#50

Open
daticalamy wants to merge 2 commits into
mainfrom
schema_check
Open

Adding python schema check#50
daticalamy wants to merge 2 commits into
mainfrom
schema_check

Conversation

@daticalamy

@daticalamy daticalamy commented May 7, 2026

Copy link
Copy Markdown
Contributor

Updated schema_check.py as it was not capturing eg. UPDATE DirContact SET DirCompanyId = @PodId…

@coderabbitai

coderabbitai Bot commented May 7, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: b0b84191-a984-4751-a0ec-0601e9df0dab

📥 Commits

Reviewing files that changed from the base of the PR and between 2eeceeb and 4f95dbf.

📒 Files selected for processing (1)
  • Python/Scripts/Any/schema_check.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • Python/Scripts/Any/schema_check.py

📝 Walkthrough

Walkthrough

Adds SchemaCheck: a Liquibase custom check implemented in Python that parses generated SQL to detect unqualified object references, integrates with Liquibase to evaluate changesets and fail on violations, and documents the check in the repository README.

Changes

SchemaCheck Custom Check Implementation

Layer / File(s) Summary
Core validation functions
Python/Scripts/Any/schema_check.py
Module imports and implementation of extract_object_identifiers() and check_schema_qualification() to parse SQL tokens, pre-collect aliases/cursors, identify object candidates after keywords (FROM, JOIN, INTO, EXEC, TABLE, VIEW, CREATE/ALTER, DML), and record missing-schema violations while ignoring temp/variable/sys objects.
Liquibase integration & execution
Python/Scripts/Any/schema_check.py
Top-level flow retrieves current Liquibase changes, skips changes with loaddatachange class, generates raw SQL for each change, runs schema qualification checks, logs and updates Liquibase status on the first violation, and exits with code 1; otherwise returns False.
Configuration documentation
Python/Scripts/Any/README.md
Adds README entry for SchemaCheck including configuration command, target (Relational), severity (0-4), description enforcing schema-qualified referenced objects, and metadata (Type=python, Path=Scripts/schema_check.py, Snapshot=false).
sequenceDiagram
  participant Liquibase
  participant SchemaCheckScript
  participant ValidationLogic
  Liquibase->>SchemaCheckScript: invoke custom check
  SchemaCheckScript->>SchemaCheckScript: fetch current changeset
  loop per change
    SchemaCheckScript->>SchemaCheckScript: skip if loaddatachange
    SchemaCheckScript->>SchemaCheckScript: generate raw SQL
    SchemaCheckScript->>ValidationLogic: check_schema_qualification(sql)
    ValidationLogic-->>SchemaCheckScript: violations list
    alt violation found
      SchemaCheckScript->>Liquibase: update status / log warning
      SchemaCheckScript->>SchemaCheckScript: exit(1)
    else no violation
      SchemaCheckScript->>SchemaCheckScript: continue
    end
  end
  SchemaCheckScript->>Liquibase: return False
Loading

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Adding python schema check' directly describes the main change: a new Python script (schema_check.py) that validates schema-qualified SQL object references.
Description check ✅ Passed The description explains that schema_check.py was updated to capture previously missed patterns like UPDATE statements, which aligns with the changeset's focus on improving schema validation.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch schema_check

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
Python/Scripts/Any/schema_check.py (1)

59-62: ⚡ Quick win

IdentifierList not handled — comma-join syntax produces silent false negatives.

FROM t1, t2 gives an IdentifierList as the next token. It is not an Identifier, so the isinstance branch misses it; its ttype is None, so the elif wraps the whole list into a synthetic Identifier([IdentifierList]) whose get_real_name() is unreliable. Both table references are silently skipped.

♻️ Proposed fix — handle `IdentifierList`
+from sqlparse.sql import Identifier, IdentifierList, Parenthesis
 ...
         if isinstance(next_token, Identifier):
+            first = next_token.token_first(skip_ws=True, skip_cm=True)
+            if isinstance(first, Parenthesis):
+                continue
             identifiers.append(next_token)
+        elif isinstance(next_token, IdentifierList):
+            for ident in next_token.get_identifiers():
+                if isinstance(ident, Identifier):
+                    identifiers.append(ident)
         elif next_token.ttype in (Name, None):
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Python/Scripts/Any/schema_check.py` around lines 59 - 62, The code currently
treats tokens with ttype None as Identifiers and wraps IdentifierList objects
into a synthetic Identifier, causing multi-table comma syntax to be missed;
update the token handling in schema_check.py around the next_token logic to
explicitly check for isinstance(next_token, IdentifierList) and, when true,
iterate its .get_identifiers() (or children) to extend the identifiers list with
each real Identifier (so their get_real_name() works), rather than appending
Identifier([next_token]); keep the existing Identifier branch for single
Identifiers and preserve the identifiers list usage.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Python/Scripts/Any/schema_check.py`:
- Around line 76-77: The current check skips CTEs because the stmt_str regex
only accepts statements starting with SELECT/INSERT/...; update that regex to
include WITH so CTE-led statements are inspected, and implement CTE alias
tracking: when processing stmt_str, parse leading CTE definitions (e.g., match
"WITH\s+([^(\s,]+)\s+AS" repeatedly) to collect alias names, then when your
unqualified-table detection logic (the routine that scans FROM/JOIN table
identifiers in the same scope as stmt_str) runs, exclude any table references
that match the collected CTE aliases to avoid false positives; refer to the
existing re.match check on stmt_str and the variable holding the statement text
(stmt_str) when adding the CTE parsing and exclusion logic.
- Around line 111-116: The code only reports violations[0] then exits; change it
to report all violations returned by check_schema_qualification by joining or
iterating over the violations list and including every unqualified object in the
log and the liquibase_status.message before exiting; update the block that
references violations, liquibase_logger, and liquibase_status (the same scope
where violations is set by check_schema_qualification) so the warning message
contains all items (e.g., comma-separated or multiline) and still sets
liquibase_status.fired = True and sys.exit(1).
- Around line 57-58: The unqualified-alias checker is flagging subquery-derived
aliases because sqlparse wraps "FROM (SELECT ...) AS t" as an Identifier whose
first meaningful child is a Parenthesis; modify the handling where next_token is
an Identifier to detect the identifier's first meaningful child (e.g., inspect
identifier.tokens or use identifier.get_real_name()/first_token) and skip
further qualification checks if that child is a Parenthesis (i.e., it's a
subquery-derived table), so the existing get_parent_name() / alias dot checks
don't run for subquery aliases like "t".

---

Nitpick comments:
In `@Python/Scripts/Any/schema_check.py`:
- Around line 59-62: The code currently treats tokens with ttype None as
Identifiers and wraps IdentifierList objects into a synthetic Identifier,
causing multi-table comma syntax to be missed; update the token handling in
schema_check.py around the next_token logic to explicitly check for
isinstance(next_token, IdentifierList) and, when true, iterate its
.get_identifiers() (or children) to extend the identifiers list with each real
Identifier (so their get_real_name() works), rather than appending
Identifier([next_token]); keep the existing Identifier branch for single
Identifiers and preserve the identifiers list usage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 567e64cd-e72c-4f74-a783-2750e473d2d7

📥 Commits

Reviewing files that changed from the base of the PR and between 7e018d1 and 2eeceeb.

📒 Files selected for processing (2)
  • Python/Scripts/Any/README.md
  • Python/Scripts/Any/schema_check.py

Comment thread Python/Scripts/Any/schema_check.py
Comment on lines +76 to +77
if not re.match(r"^\s*(SELECT|INSERT|UPDATE|DELETE|EXEC|EXECUTE|CREATE|ALTER)\b", stmt_str, re.IGNORECASE):
continue

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

CTE queries (WITH ... AS (...)) are silently skipped — no schema check applied.

The regex requires statements to start with SELECT/INSERT/etc., so any WITH-prefixed CTE entirely bypasses the check. Unqualified table references inside the CTE body (e.g., WITH cte AS (SELECT * FROM UnqualifiedTable)) go unreported.

Adding WITH to the regex is trivial; the harder part is tracking the CTE alias names (cte, etc.) and excluding them from violation detection in the outer query's FROM cte reference to avoid introducing false positives.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Python/Scripts/Any/schema_check.py` around lines 76 - 77, The current check
skips CTEs because the stmt_str regex only accepts statements starting with
SELECT/INSERT/...; update that regex to include WITH so CTE-led statements are
inspected, and implement CTE alias tracking: when processing stmt_str, parse
leading CTE definitions (e.g., match "WITH\s+([^(\s,]+)\s+AS" repeatedly) to
collect alias names, then when your unqualified-table detection logic (the
routine that scans FROM/JOIN table identifiers in the same scope as stmt_str)
runs, exclude any table references that match the collected CTE aliases to avoid
false positives; refer to the existing re.match check on stmt_str and the
variable holding the statement text (stmt_str) when adding the CTE parsing and
exclusion logic.

Comment on lines +111 to +116
if violations:
msg = f"Missing schema for object '{violations[0]}' in SQL statement."
liquibase_logger.warning(msg)
liquibase_status.fired = True
liquibase_status.message = msg
sys.exit(1)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Only the first violation per changeset is reported; remaining violations are silently dropped.

Even though check_schema_qualification returns all unqualified names, the outer loop breaks on the first violating change using violations[0]. A developer with three unqualified tables must fix-and-re-run three times to discover all of them.

💡 Proposed improvement — report all violations before exiting
-    if violations:
-        msg = f"Missing schema for object '{violations[0]}' in SQL statement."
-        liquibase_logger.warning(msg)
-        liquibase_status.fired = True
-        liquibase_status.message = msg
-        sys.exit(1)
+    for v in violations:
+        liquibase_logger.warning(f"Missing schema for object '{v}' in SQL statement.")
+    if violations:
+        msg = f"Missing schema qualification: {', '.join(violations)}"
+        liquibase_status.fired = True
+        liquibase_status.message = msg
+        sys.exit(1)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Python/Scripts/Any/schema_check.py` around lines 111 - 116, The code only
reports violations[0] then exits; change it to report all violations returned by
check_schema_qualification by joining or iterating over the violations list and
including every unqualified object in the log and the liquibase_status.message
before exiting; update the block that references violations, liquibase_logger,
and liquibase_status (the same scope where violations is set by
check_schema_qualification) so the warning message contains all items (e.g.,
comma-separated or multiline) and still sets liquibase_status.fired = True and
sys.exit(1).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant