fix: Apply CockroachDB->Postgres DDL adapter in tenant provisioner #2133
Conversation
Two related fixes so the develop deploy can fully recover after a broken provisioning run:

1. Export `internal/migrations.AdaptCockroachDDLForPostgres` and call it from `services/tenant/provisioner/migration_runner.go` when `DB_DRIVER=postgres`. Previously the adapter was only applied by the CLI `--migrate` path that runs against public schemas; the tenant provisioner applied CockroachDB-native DDL directly to tenant schemas and hit "cannot drop index because constraint requires it" on Postgres for migrations like `control-plane/20260401000002_migrate_version_to_varchar.sql` and `reference-data/20260127000001_fix_platform_saga_unique_constraint.sql`. Tenant provisioning now gets the same Postgres compatibility treatment.

2. Update `.github/workflows/deploy-develop.yml` to `DROP SCHEMA IF EXISTS ... CASCADE` for every `org_*` schema across all service databases before forcing re-provisioning. The previous reset cleared `tenant.status` and `service_schemas` metadata but left physical tables from the broken run in place. Migrations that rename tables (e.g. internal-account/20260225 rename_to_internal_account) leave the tenant schema in a state that later reruns cannot reconcile, because `CREATE TABLE` succeeds, creating a fresh empty table next to the existing renamed one. Dropping and recreating from scratch matches the E2E workflow path, which is known to pass.

Internal aliasing: the provisioner package already has a local type named `migration`, so the `internal/migrations` import is aliased to `internalmigrations` to avoid shadowing.

Refactor: extracted an `applyMigrationInTransaction` helper out of `applyMigrationList` to keep cognitive complexity under the architecture baseline (gocognit limit).
📝 Walkthrough

Deployment workflow now stops the app container, drops physical org_* schemas and resets tenant provisioning state before restarting; the tenant provisioner now applies the CockroachDB->Postgres DDL adapter when provisioning tenant schemas.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant GH as GitHub Actions
    participant Host as Docker Host
    participant Container as meridian-develop (app)
    participant Postgres as postgres-develop
    participant DBs as Service DBs
    participant Provisioner as Provisioning Worker
    GH->>Host: stop Container
    Host->>Container: docker compose stop
    GH->>Postgres: for each SERVICE_DB, query pg_namespace for org_<tenant>
    Postgres->>DBs: execute DROP SCHEMA IF EXISTS org_<tenant> CASCADE
    Postgres->>Postgres: UPDATE tenant.status / tenant_provisioning (clear service_schemas, set pending)
    Host->>Container: docker compose start
    Container->>Provisioner: boot / worker starts
    Provisioner->>Postgres: read tenant.status and tenant_provisioning
    Provisioner->>Provisioner: enqueue/process provisioning based on reset state
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks: ✅ 2 passed | ❌ 1 failed (1 warning)
Claude Code Review

Commit: 08dd70d | CI: running (most checks still pending)

Solid bug fix that closes two related gaps in the develop deploy pipeline. The first commit correctly exports AdaptCockroachDDLForPostgres and wires it into the tenant provisioner. The second commit (responding to CodeRabbit) fixes a real correctness issue, applying the adapter per-statement after splitting rather than on the full SQL blob, and eliminates a race condition in the deploy workflow by stopping the app before resetting state. The code is well-structured, the test pinning the split-then-adapt ordering is valuable, and the deploy workflow comments clearly explain the sequencing rationale.

Risk Assessment: All Low. Affects develop deploy only (production uses CockroachDB natively). Safe to revert. No schema migrations. Not hot path. Self-contained to provisioner and deploy workflow.

Findings:
- Note - migration_runner.go:310 - DriverFromEnv() gate relies on the DB_DRIVER env var; if unset, the adapter silently skips. Not a real risk, since the entire system depends on this same env var. Consistent with the CLI path in runner.go:405. Informational only.

Bot Review Notes:
- CodeRabbit thread on migration_runner.go:320 (unresolved): Split ordering concern (Critical) - fully addressed in commit 08dd70d. Code now splits first via splitSQLStatements, then adapts each statement individually. New test TestPostgresAdapter_PerStatementAdaptation_PreservesDOBlocks pins this ordering.
- DriverFromEnv() defaulting concern - still present, but I disagree this is a bug. PostgresProvisioner only runs in environments where DB_DRIVER=postgres is configured. If the env var is missing, the entire system would be wrong, not just this adapter. The pattern is identical to the CLI path in runner.go:405.
- CodeRabbit thread on deploy-develop.yml:314 (resolved by CodeRabbit itself): Stop-before-reset concern addressed in commit 08dd70d. The workflow now does stop, then reset state + drop schemas, then start.
Clean fix. Well-scoped, correctly coupled, minimal blast radius. See summary comment for details. One minor suggestion on migration_runner.go:299 — DriverFromEnv() is called per-migration inside applyMigrationInTransaction; reading it once in applyMigrationList and passing it down would match the pattern in internal/migrations/runner.go.
Actionable comments posted: 2
🧹 Nitpick comments (1)
.github/workflows/deploy-develop.yml (1)
304-305: Avoid a second authoritative service list here.
`SERVICE_DBS` matches `defaultServiceDefs` in `services/tenant/provisioner/provisioner.go:390-411` today, but it now has to be maintained in two places. The next service addition can update the provisioner without updating this cleanup step, leaving stale `org_*` schemas behind in the new database. A small shared script or repo check would make such drift much harder.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-develop.yml around lines 304 - 305, SERVICE_DBS duplicates the canonical service list in defaultServiceDefs (provisioner.go); replace the hardcoded SERVICE_DBS with a single source of truth. Create a small script or command (e.g., scripts/list_services.sh or a Go helper that prints the space-separated service names) that is used by provisioner.go (or generated from defaultServiceDefs) and invoked by the deploy workflow; then update the workflow to call that script/command instead of defining SERVICE_DBS inline so the cleanup loop uses the same list as defaultServiceDefs and cannot drift.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In @.github/workflows/deploy-develop.yml:
- Around line 283-304: Ensure the deploy workflow stops the running application
before resetting tenant provisioning state and dropping DB schemas: add a step
to stop/scale down the meridian-develop service (or otherwise halt the app)
prior to the block that sets SERVICE_DBS and flips tenants to
provisioning_pending and clears tenant_provisioning.service_schemas; then only
after tenants are marked pending and schemas dropped, restart or redeploy
meridian-develop. Apply the same sequencing change to the corresponding block
referenced around lines 314-321 so the worker cannot race with the reset.
In `@services/tenant/provisioner/migration_runner.go`:
- Around line 300-313: The migration SQL is being fully adapted via
internalmigrations.AdaptCockroachDDLForPostgres before splitSQLStatements, which
breaks dollar-quoted DO blocks (splitSQLStatements doesn't handle dollar
quotes); move the adaptation to after splitting and apply
AdaptCockroachDDLForPostgres to each individual statement (or alternatively
improve splitSQLStatements to recognize dollar-quoted bodies), and ensure the
adaptation decision in PostgresProvisioner is based on the provisioner's actual
target driver (not internalmigrations.DriverFromEnv() which defaults to
CockroachDB) so the adaptation always runs when provisioning Postgres; update
the loop that iterates statements (the tx.Exec call) to exec the adapted
per-statement SQL and add a regression test that runs a public-schema CHECK
migration containing a DO $...$ block through this code path to verify no broken
fragments are executed.
---
Nitpick comments:
In @.github/workflows/deploy-develop.yml:
- Around line 304-305: SERVICE_DBS duplicates the canonical service list in
defaultServiceDefs (provisioner.go); replace the hardcoded SERVICE_DBS with a
single source of truth. Create a small script or command (e.g.,
scripts/list_services.sh or a Go helper that prints the space-separated service
names) that is used by provisioner.go (or generated from defaultServiceDefs) and
invoked by the deploy workflow; then update the workflow to call that
script/command instead of defining SERVICE_DBS inline so the cleanup loop uses
the same list as defaultServiceDefs and cannot drift.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 9f0d523e-4835-416b-afb9-9ccc39457b0b
📒 Files selected for processing (4)
- .github/workflows/deploy-develop.yml
- internal/migrations/adapt_test.go
- internal/migrations/runner.go
- services/tenant/provisioner/migration_runner.go
Codecov Report: ❌ Patch coverage is
…y sequencing

Two critical issues flagged on the review:

1. Critical: `AdaptCockroachDDLForPostgres` was called on the full processed SQL before `splitSQLStatements`. The adapter wraps public-schema CHECK constraints in `DO $compat$ BEGIN ...; EXCEPTION WHEN duplicate_object THEN NULL; END $compat$;` blocks whose internal semicolons are then split into three invalid fragments by `splitSQLStatements` (which does not understand dollar-quoted bodies). Fix: split first, then apply the adapter to each statement individually, re-adding the trailing `;` the splitter stripped so the adapter's `;`-anchored regex still matches. Add `TestPostgresAdapter_PerStatementAdaptation_PreservesDOBlocks`, which pins this ordering: if the pre-split flow is ever restored, the test proves the DO block survives splitting and fails the build.

2. Major: the deploy workflow reset `tenant.status` while meridian-develop was still running. The provisioning worker polls that column every 10s, so it would race the script: start provisioning against half-cleaned databases and get interrupted by the subsequent restart. Fix: stop meridian-develop before resetting state and dropping schemas, then start it back up afterwards. "restart" is replaced with "stop + work + start" so the worker observes the fully-reset state on boot.
🧹 Nitpick comments (1)
services/tenant/provisioner/migration_runner_test.go (1)
681-690: Tighten the assertion to enforce exactly one DO-wrapped statement.

The test comment says one adapted statement should be wrapped, but the current checks only require at least one. Counting wrappers will make this guard stricter.
♻️ Suggested assertion hardening
```diff
 // Find the DO-wrapped statement.
 var doStmt string
+wrappedCount := 0
 for _, a := range adapted {
 	if strings.Contains(a, "DO $compat$") {
+		wrappedCount++
 		doStmt = a
-		break
 	}
 }
+require.Equal(t, 1, wrappedCount, "expected exactly one DO-wrapped statement")
 require.NotEmpty(t, doStmt, "expected one statement to be wrapped in DO $compat$ block")
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@services/tenant/provisioner/migration_runner_test.go` around lines 681 - 690, The test currently only asserts that at least one adapted statement contains "DO $compat$"; tighten it to assert there is exactly one such wrapper by iterating over adapted (same slice variable) to count occurrences of "DO $compat$" (or collect matches) and replace require.NotEmpty(t, doStmt, ...) with require.Equal(t, 1, count, "expected exactly one statement to be wrapped in DO $compat$ block"); update/remove the doStmt variable accordingly so the test fails if zero or multiple DO-wrapped statements are present.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@services/tenant/provisioner/migration_runner_test.go`:
- Around line 681-690: The test currently only asserts that at least one adapted
statement contains "DO $compat$"; tighten it to assert there is exactly one such
wrapper by iterating over adapted (same slice variable) to count occurrences of
"DO $compat$" (or collect matches) and replace require.NotEmpty(t, doStmt, ...)
with require.Equal(t, 1, count, "expected exactly one statement to be wrapped in
DO $compat$ block"); update/remove the doStmt variable accordingly so the test
fails if zero or multiple DO-wrapped statements are present.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: 55d42d11-c21e-423b-bf7b-55ec54a27ca7
📒 Files selected for processing (3)
- .github/workflows/deploy-develop.yml
- services/tenant/provisioner/migration_runner.go
- services/tenant/provisioner/migration_runner_test.go
🚧 Files skipped from review as they are similar to previous changes (2)
- .github/workflows/deploy-develop.yml
- services/tenant/provisioner/migration_runner.go
Addressed in follow-up commit. CodeRabbit re-reviewed and approved at 11:02.
The demo deploy had the same broken reset as develop had before PRs #2131 and #2133:

- Only reset `tenant_provisioning.state`, not `tenant.status` (the worker polls `tenant.status` via `ListByStatus(StatusProvisioningPending)`)
- Didn't clear the `service_schemas` JSONB (the provisioner short-circuits on "service already provisioned, skipping")
- Didn't drop ghost schemas from previous broken runs
- Didn't stop the app before resetting (race between worker and schema drops)

Port the same fix from deploy-develop.yml:

- Stop meridian before resetting state
- DROP SCHEMA CASCADE for org_* schemas across all service databases
- Reset both tenant.status and tenant_provisioning.service_schemas
- Start meridian after reset
Co-authored-by: Ben Coombs <bjcoombs@users.noreply.github.com>
Summary
- Export `AdaptCockroachDDLForPostgres` from `internal/migrations` and call it from the tenant provisioner's migration runner when `DB_DRIVER=postgres`. The adapter was previously only applied by the CLI `--migrate` path that runs against public schemas; tenant schema provisioning bypassed it and hit "cannot drop index because constraint requires it" on Postgres.
- Update `deploy-develop.yml` to `DROP SCHEMA IF EXISTS org_* CASCADE` across all service databases before forcing re-provisioning. The previous reset cleared metadata but left ghost tables from broken runs, and migrations that rename tables (e.g. internal-account/20260225) could not reconcile the resulting mixed state.

Why two fixes in one PR
Both problems surfaced on the same deploy cycle after PR #2131 merged. With only the DDL adapter fix, the CRDB-specific DROP INDEX CASCADE statements succeed, but tenant schemas still contain ghost state. With only the DROP SCHEMA fix, schemas are clean but CRDB syntax still fails on Postgres. Both are required for the deploy to reach a green state.
Evidence
Post-#2131 merge deploy on c49d82b failed with:
```
meridian_master: control-plane migrations failed: execute migration 20260401000002_migrate_version_to_varchar.sql:
ERROR: cannot drop index uq_manifest_version_version because constraint uq_manifest_version_version on table manifest_version requires it (SQLSTATE 2BP01)
volterra_energy: internal-account migrations failed: execute migration 20260116000001_add_clearing_purpose_column.sql:
ERROR: relation "internal_bank_account" does not exist (SQLSTATE 42P01)
```
The first is the CRDB->Postgres adapter gap. The second is ghost state from a previous broken run that left the schema with the renamed `internal_account` table but no `internal_bank_account`, breaking later re-runs that expect to ADD a column to it.
Changes
- internal/migrations/runner.go + adapt_test.go
- services/tenant/provisioner/migration_runner.go
- .github/workflows/deploy-develop.yml
Test plan