fix: Skip COMMENT statements in tenant migration rewriter by bjcoombs · Pull Request #2128 · meridianhub/meridian

bjcoombs · 2026-04-04T20:17:23Z

Summary

processMigrationSQL naively rewrote COMMENT ON COLUMN \"party\".\"attributes\" into \"org_tenant\".\"attributes\", producing schema.column instead of table.column and failing with relation does not exist.
Every fresh tenant hard-failed at the party.attributes migration, cascading into the data_source / dataset_definition / instrument_definition errors seen in E2E shards and the develop deploy.
Fix the rewriter to skip COMMENT statements, re-enable the previously-skipped knownBroken migration tests, bump the architecture size baseline to unblock Test shard 0, and broaden the deploy-develop tenant_provisioning reset so stale tenants get re-provisioned.

Root cause

PR #2120 documented this bug (adding a knownBroken skip-list and a no-op corrective migration) but did not fix processMigrationSQL. The rewriter treats the second identifier in a COMMENT ON COLUMN as a table name and replaces it with the tenant schema.

There are zero legitimate uses of schema-qualified references (\"party\".\"table\", \"payment_order\".\"table\", etc.) in any migration file - all matches are COMMENT ON COLUMN statements. The rewriter was solving a phantom problem while corrupting real statements.

Changes

services/tenant/provisioner/migration_runner.go

processMigrationSQL: split SQL into statements, skip schema-pattern rewriting for COMMENT statements, rewrite DDL as before.
Add isCommentStatement helper that tolerates leading whitespace and line comments.

services/tenant/provisioner/migration_runner_test.go

Remove the three knownBroken entries so TestProcessMigrationSQL_AllMigrations_Parse now exercises the previously-broken migrations.
Replace TestProcessMigrationSQL_CommentOnColumn_BugDocumentation with TestProcessMigrationSQL_CommentOnColumn_NotRewritten, asserting the fix preserves table.column.
Add TestProcessMigrationSQL_CommentOnColumn_MixedWithDDL to pin mixed-statement behavior (DDL rewritten, COMMENT preserved).

tests/architecture/size_test.go

Bump baselineOversizedFunctions 187 -> 189 (unrelated ratchet failure on Test shard 0).

.github/workflows/deploy-develop.yml

Broaden the post-migration tenant_provisioning reset to clear service_schemas and include all non-deprovisioned rows. Without this, the existing volterra_energy row on develop (state=pending, all services marked migrated, no physical tables) short-circuits provisioning via the "service already provisioned, skipping" check in provisionAllServices and the code fix alone cannot recover the droplet.

Evidence

Failing runs: gh run view 23980915882 (Deploy Develop), 23980915891 (E2E all 4 shards), 23980915885 (Test shard 0), 23980915881 (Build & Test).
Error from meridian_master tenant_provisioning on develop: party migrations failed: execute migration 20260221000001_add_party_attributes.sql: ERROR: relation \"org_meridian_master\" does not exist (SQLSTATE 42P01).
Local: go test ./services/tenant/provisioner/ -count=1 passes (340s); go test ./tests/architecture/ -run TestFunctionSize passes.

Test plan

CI green on this branch
E2E shards reach manifest execute stage (previously aborted at diff step due to missing tenant tables)
Deploy Develop completes seed-dev successfully

processMigrationSQL did naive string replacement of schema patterns ("party".X -> "org_tenant".X) across all migration content. This corrupted COMMENT ON COLUMN "party"."attributes" IS ..., where the second identifier is a column name rather than a schema qualifier. The rewrite produced COMMENT ON COLUMN "org_tenant"."attributes" which Postgres parses as schema.table (missing the column) and fails with "relation does not exist". PR #2120 documented the bug with a knownBroken skip-list and a no-op corrective migration but did not fix the rewriter. Every fresh tenant now hard-fails at the party.attributes migration, cascading into the "relation does not exist" errors for data_source / dataset_definition / instrument_definition seen in E2E and develop deploy - downstream service schemas are never populated because provisioning aborts at the first service. Fix processMigrationSQL to split SQL into statements, detect COMMENT statements via isCommentStatement, and skip schema-pattern rewriting for them while still rewriting DDL statements normally. Also: - Remove the knownBroken entries in TestProcessMigrationSQL_AllMigrations_Parse so the three affected migrations are exercised end-to-end. - Replace TestProcessMigrationSQL_CommentOnColumn_BugDocumentation with TestProcessMigrationSQL_CommentOnColumn_NotRewritten, asserting the fix preserves table.column references. - Add TestProcessMigrationSQL_CommentOnColumn_MixedWithDDL to pin the mixed-statement behavior (DDL rewritten, COMMENT preserved). - Bump baselineOversizedFunctions 187 -> 189 to unblock Test shard 0. - deploy-develop.yml: broaden the tenant_provisioning reset to clear service_schemas and include all non-deprovisioned rows, so stale tenants whose service_schemas entries are marked migrated but whose physical tables are missing get re-provisioned on redeploy.

coderabbitai · 2026-04-04T20:17:38Z

📝 Walkthrough

Walkthrough

Split migration SQL into individual statements, skip rewriting COMMENT ... statements, and rejoin statements; tests updated to cover comment-preservation. CI: deploy workflow now resets most tenants to pending and clears schemas/errors; e2e workflow stages per-service migrations and enables schema provisioning. Test baseline incremented.

Changes

Cohort / File(s)	Summary
Migration SQL processor & tests `services/tenant/provisioner/migration_runner.go`, `services/tenant/provisioner/migration_runner_test.go`	Process migration SQL per-statement, add `isCommentStatement`/`stripLeadingNoise` to detect and skip `COMMENT ...` statements, rewrite other statements, reassemble with trailing semicolon. Tests renamed/added to assert comments are not rewritten; removed prior known-broken exclusions.
Deploy workflow tenant reset `.github/workflows/deploy-develop.yml`	Post-migration SSH step now sets `state='pending'` for all tenants except `deprovisioned`, clears `service_schemas` to `[]::jsonb`, and resets `error_message` to `''`.
E2E workflow & migration staging `.github/workflows/e2e.yml`	Add pre-start step copying `services/<service>/migrations` into `/tmp/provisioner-migrations/<service>`; start command sets `SCHEMA_PROVISIONING_ENABLED=true` and `MIGRATIONS_BASE_PATH=/tmp/provisioner-migrations`; removed explicit CockroachDB tenant schema provisioning step.
Test baseline adjustment `tests/architecture/size_test.go`	Incremented `baselineOversizedFunctions` from 187 to 189.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~30 minutes

Possibly related PRs

fix: corrective migration and CI test for COMMENT ON COLUMN rewrite bug #2120: Addresses the same COMMENT ON COLUMN rewrite issue in services/tenant/provisioner/migration_runner.go and adds CI tests/migrations guarding that behavior.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main fix: skipping COMMENT statements in tenant migration rewriting, which directly addresses the core bug causing tenant provisioning failures.
Description check	✅ Passed	The description comprehensively relates to the changeset, explaining the bug, root cause, specific file changes, evidence, and test plan for all modifications in the PR.
Docstring Coverage	✅ Passed	Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch fix-develop-20260404-comment-on-column-rewriter

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

claude · 2026-04-04T20:20:04Z

Claude Code Review

Commit: ae006ff | CI: running

Summary

Clean, well-scoped bug fix for a migration rewriter that corrupted COMMENT ON COLUMN statements by treating table.column as schema.table. The root cause is clearly identified, the fix is surgical (skip COMMENT statements from schema-pattern rewriting), and the test coverage is thorough - including re-enabling the previously skipped knownBroken migrations.

The E2E workflow change from manual schema creation to using the actual provisioner (SCHEMA_PROVISIONING_ENABLED=true) is a meaningful improvement that makes E2E tests exercise the production code path.

Risk Assessment

Area	Level	Detail
Blast radius	Med	Tenant provisioning - all tenants affected
Rollback	Safe	Go code change, no migration schema changes
Scale	Low	Runs once per migration per tenant, not hot path
Cross-system	Low	Self-contained in tenant provisioner
Migration	N/A	No new migration files

Findings

Severity	Location	Description	Status
Note	`migration_runner.go:390-408`	`isCommentStatement` blanket-skips all COMMENT variants (ON TABLE, ON COLUMN, etc.), but only COMMENT ON COLUMN has the table.column ambiguity. Verified: all COMMENT ON TABLE in the codebase use unquoted single identifiers, so blanket skip is safe. search_path provides fallback.	Acknowledged
Note	`migration_runner.go:366-399`	Double-split: processMigrationSQL splits, processes, rejoins. applyMigrationList splits again. Round-trip verified idempotent.	Acknowledged
Note	`deploy-develop.yml:289`	Broadened reset is more aggressive but well-justified for recovery from broken state.	Acknowledged

Test Coverage

TestProcessMigrationSQL_CommentOnColumn_NotRewritten directly validates the fix
TestProcessMigrationSQL_CommentOnColumn_MixedWithDDL pins mixed-statement behavior
TestIsCommentStatement_SkipsLeadingComments covers comment detection edge cases
TestProcessMigrationSQL_AllMigrations_Parse exercises all real migrations with knownBroken emptied

Bot Review Notes

No unresolved bot threads. Previous inline comments resolved.

claude

See summary. 1 inline suggestion.

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@services/tenant/provisioner/migration_runner.go`:
- Around line 413-421: The isCommentStatement function fails to detect COMMENT
statements preceded by block comments; update isCommentStatement to skip leading
block comments (/* ... */) as well as line comments and blank lines before
checking the first non-comment token, e.g., in isCommentStatement iterate
through stmt lines (or scan tokens) to strip/skip both lines starting with "--"
and block comment spans that may start and end on the same line or across lines,
then perform strings.HasPrefix(strings.ToUpper(trimmed), "COMMENT ") on the
first non-comment chunk so statements like "/* preface */ COMMENT ON ..." are
correctly recognized.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 6041708f-8214-43ee-af56-237156ed25df

📥 Commits

Reviewing files that changed from the base of the PR and between c478052 and 13dec02.

📒 Files selected for processing (4)

.github/workflows/deploy-develop.yml
services/tenant/provisioner/migration_runner.go
services/tenant/provisioner/migration_runner_test.go
tests/architecture/size_test.go

codecov · 2026-04-04T20:39:51Z

Codecov Report

❌ Patch coverage is 90.24390% with 4 lines in your changes missing coverage. Please review.

Files with missing lines	Patch %	Lines
services/tenant/provisioner/migration_runner.go	90.24%	2 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

The E2E workflow created empty org_<tenant> schemas manually via CREATE SCHEMA IF NOT EXISTS and relied on the public-schema search_path fallback to resolve unqualified table references. PR #2125 removed public from search_path, so those empty tenant schemas can no longer satisfy any query and the manifest diff step fails with "relation data_source / dataset_definition / instrument_definition does not exist". Replace the manual schema workaround with real provisioner invocation: - Stage per-service migration files under /tmp/provisioner-migrations so MIGRATIONS_BASE_PATH points at a flat layout matching the Docker image structure (/migrations/<service>/*.sql). - Start meridian with SCHEMA_PROVISIONING_ENABLED=true and MIGRATIONS_BASE_PATH=/tmp/provisioner-migrations so seed-dev's InitiateTenant triggers real provisioning that creates schemas AND applies migrations to them.

claude

Correct hotfix for a real production blocker. The statement-level split + COMMENT skip approach is clean and well-tested. See summary comment for full analysis. No blocking concerns.

Address CodeRabbit review feedback: the initial isCommentStatement implementation only skipped leading whitespace and line comments. A statement of the form "/* audit */ COMMENT ON COLUMN ..." was not recognized as a COMMENT and could still be corrupted by the schema rewriter. Extract stripLeadingNoise helper that iteratively strips whitespace, -- line comments, and /* ... */ block comments, then check the remaining text for the COMMENT keyword. Unterminated block comments return false so no rewriting is attempted on malformed SQL. Add TestIsCommentStatement_SkipsLeadingComments covering plain, leading whitespace, line comment, block comment, mixed leading comments, non-COMMENT DDL, mid-statement block comment, and unterminated block comment cases.

claude

No blocking issues found. See summary comment for full analysis. Well-executed fix with strong test coverage.

coderabbitai

🧹 Nitpick comments (1)

services/tenant/provisioner/migration_runner.go (1)

385-407: Consider removing the split→join→split round-trip.

processMigrationSQL now parses statements and serializes them back, then applyMigrationList parses again at Line 278. Returning rewritten statements directly would reduce parser drift risk and avoid duplicate work.

♻️ Refactor sketch

-processedSQL := p.processMigrationSQL(mig.Content, schemaName)
-statements := splitSQLStatements(processedSQL)
+statements := p.processMigrationStatements(mig.Content, schemaName)

-func (p *PostgresProvisioner) processMigrationSQL(sql, schemaName string) string {
+func (p *PostgresProvisioner) processMigrationStatements(sql, schemaName string) []string {
   ...
-  if len(rewritten) == 0 {
-    return ""
-  }
-  return strings.Join(rewritten, ";\n") + ";"
+  return rewritten
}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed.

In `@services/tenant/provisioner/migration_runner.go` around lines 385 - 407, The
current processMigrationSQL does splitSQLStatements -> rewrite -> strings.Join
and returns a single SQL string which forces applyMigrationList to call
splitSQLStatements again; instead change processMigrationSQL to return the
rewritten statement slice directly (e.g. []string) to avoid the split→join→split
round-trip: keep the existing statement parsing loop and
isCommentStatement/schemaPatterns rewriting logic inside processMigrationSQL,
change its signature to return []string (or add a new helper like
rewriteMigrationStatements that returns []string), update callers such as
applyMigrationList to accept and use the returned []string directly (removing
the extra splitSQLStatements call and the trailing semicolon join), and preserve
existing behavior for comment statements and quoting rules by reusing
schemaPatterns and the same replacement logic.

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@services/tenant/provisioner/migration_runner.go`:
- Around line 385-407: The current processMigrationSQL does splitSQLStatements
-> rewrite -> strings.Join and returns a single SQL string which forces
applyMigrationList to call splitSQLStatements again; instead change
processMigrationSQL to return the rewritten statement slice directly (e.g.
[]string) to avoid the split→join→split round-trip: keep the existing statement
parsing loop and isCommentStatement/schemaPatterns rewriting logic inside
processMigrationSQL, change its signature to return []string (or add a new
helper like rewriteMigrationStatements that returns []string), update callers
such as applyMigrationList to accept and use the returned []string directly
(removing the extra splitSQLStatements call and the trailing semicolon join),
and preserve existing behavior for comment statements and quoting rules by
reusing schemaPatterns and the same replacement logic.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: ca9d2538-87b1-42bc-a264-a1efc1a365a6

📥 Commits

Reviewing files that changed from the base of the PR and between 8887437 and ae006ff.

📒 Files selected for processing (2)

services/tenant/provisioner/migration_runner.go
services/tenant/provisioner/migration_runner_test.go

PR #2128 added a tenant_provisioning reset to force re-provisioning on every deploy, but the reset targeted the wrong table. The provisioning worker polls for pending tenants via ListByStatus(StatusProvisioningPending) which reads tenant.status, not tenant_provisioning.state. Resetting only tenant_provisioning left tenant.status='active', so the worker never claimed the tenant, provisioning never re-ran, and seed-dev's manifest apply still failed against the empty tenant schemas. Also: seed-dev's waitForTenantReady calls GetTenantProvisioningStatus whose OverallStatus is derived from tenant.status. With tenant.status stuck at 'active', seed-dev returned immediately from its wait loop and fired the manifest apply before the worker had a chance to act. Reset both tables in the same psql call: - tenant.status = 'provisioning_pending' (triggers worker pickup and makes seed-dev wait for real completion). - tenant_provisioning.service_schemas = '[]' (forces provisionAllServices to re-apply migrations instead of short-circuiting on "already provisioned" entries).

…2131) PR #2128 added a tenant_provisioning reset to force re-provisioning on every deploy, but the reset targeted the wrong table. The provisioning worker polls for pending tenants via ListByStatus(StatusProvisioningPending) which reads tenant.status, not tenant_provisioning.state. Resetting only tenant_provisioning left tenant.status='active', so the worker never claimed the tenant, provisioning never re-ran, and seed-dev's manifest apply still failed against the empty tenant schemas. Also: seed-dev's waitForTenantReady calls GetTenantProvisioningStatus whose OverallStatus is derived from tenant.status. With tenant.status stuck at 'active', seed-dev returned immediately from its wait loop and fired the manifest apply before the worker had a chance to act. Reset both tables in the same psql call: - tenant.status = 'provisioning_pending' (triggers worker pickup and makes seed-dev wait for real completion). - tenant_provisioning.service_schemas = '[]' (forces provisionAllServices to re-apply migrations instead of short-circuiting on "already provisioned" entries). Co-authored-by: Ben Coombs <bjcoombs@users.noreply.github.com>

claude Bot reviewed Apr 4, 2026

View reviewed changes

Comment thread services/tenant/provisioner/migration_runner.go

coderabbitai Bot requested changes Apr 4, 2026

View reviewed changes

Comment thread services/tenant/provisioner/migration_runner.go

claude Bot approved these changes Apr 4, 2026

View reviewed changes

coderabbitai Bot reviewed Apr 4, 2026

View reviewed changes

coderabbitai Bot approved these changes Apr 4, 2026

View reviewed changes

bjcoombs merged commit 7964e77 into develop Apr 5, 2026
40 checks passed

bjcoombs deleted the fix-develop-20260404-comment-on-column-rewriter branch April 5, 2026 08:01

bjcoombs mentioned this pull request Apr 5, 2026

fix: Reset tenant.status in deploy-develop so worker re-provisions #2131

Merged

4 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix: Skip COMMENT statements in tenant migration rewriter#2128

fix: Skip COMMENT statements in tenant migration rewriter#2128
bjcoombs merged 3 commits intodevelopfrom
fix-develop-20260404-comment-on-column-rewriter

bjcoombs commented Apr 4, 2026

Uh oh!

coderabbitai Bot commented Apr 4, 2026 •

edited

Loading

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

claude Bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

codecov Bot commented Apr 4, 2026 •

edited

Loading

Uh oh!

claude Bot left a comment

Uh oh!

claude Bot left a comment

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

bjcoombs commented Apr 4, 2026

Summary

Root cause

Changes

Evidence

Test plan

Uh oh!

coderabbitai Bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Changes

Estimated code review effort

Possibly related PRs

Uh oh!

claude Bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Claude Code Review

Summary

Risk Assessment

Findings

Test Coverage

Bot Review Notes

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

codecov Bot commented Apr 4, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

claude Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

coderabbitai Bot commented Apr 4, 2026 •

edited

Loading

claude Bot commented Apr 4, 2026 •

edited

Loading

codecov Bot commented Apr 4, 2026 •

edited

Loading