
fix: Reset tenant.status in deploy-develop so worker re-provisions #2131

Merged
bjcoombs merged 1 commit into develop from fix-develop-deploy-tenant-status-reset
Apr 5, 2026

Conversation


@bjcoombs bjcoombs commented Apr 5, 2026

Summary

  • PR #2128 (fix: Skip COMMENT statements in tenant migration rewriter) reset the wrong table. The workflow updated tenant_provisioning, but the provisioning worker polls tenant.status via ListByStatus(StatusProvisioningPending). The worker never picked up existing tenants, provisioning never re-ran, and the manifest apply still failed.
  • Also, seed-dev's waitForTenantReady calls GetTenantProvisioningStatus, whose OverallStatus is derived from tenant.status. With tenant.status stale at 'active', seed-dev returned immediately and fired the manifest apply before the worker could act.
  • Reset both tables in the same psql call.
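To make the polling mismatch concrete, here is a minimal Go sketch. The Tenant struct, the listByStatus helper, and the status constant are illustrative stand-ins for the real repository code, not the actual implementation:

```go
package main

import "fmt"

// Tenant models only the fields relevant to the worker's poll query.
type Tenant struct {
	ID     string
	Status string
}

const StatusProvisioningPending = "provisioning_pending"

// listByStatus mimics the worker's ListByStatus query: it selects tenants
// by tenant.status and never looks at tenant_provisioning.state.
func listByStatus(tenants []Tenant, status string) []Tenant {
	var out []Tenant
	for _, t := range tenants {
		if t.Status == status {
			out = append(out, t)
		}
	}
	return out
}

func main() {
	// After PR #2128's reset: tenant_provisioning.state = 'pending',
	// but tenant.status is still the stale 'active'.
	tenants := []Tenant{
		{ID: "volterra_energy", Status: "active"},
		{ID: "meridian_master", Status: "active"},
	}
	fmt.Println(len(listByStatus(tenants, StatusProvisioningPending))) // 0: worker sees nothing

	// This PR's fix: reset tenant.status as well.
	for i := range tenants {
		tenants[i].Status = StatusProvisioningPending
	}
	fmt.Println(len(listByStatus(tenants, StatusProvisioningPending))) // 2: worker picks both up
}
```

This is why resetting tenant_provisioning alone could never trigger re-provisioning: the query the worker runs simply never matches.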

Evidence

On develop right now (post-PR #2128 merge + deploy):

tenant_id        | state
volterra_energy  | pending      <- tenant_provisioning
meridian_master  | pending      <- tenant_provisioning

id               | status
volterra_energy  | active       <- tenant (still)
meridian_master  | active       <- tenant (still)

Deploy Develop CI log, 2026-04-05 08:06:

UPDATE 2                                                                  <- tenant_provisioning reset worked
Creating tenant 'volterra_energy' ...
Dev tenant already exists (idempotent).
Waiting for tenant provisioning ...
Tenant provisioned and active.                                            <- lie, tenant.status was just stale
Step [diff]: STEP_RESULT_STATUS_FAILED - relation "data_source" does not exist

Code references

  • services/tenant/worker/provisioning_worker.go:292 - w.repo.ListByStatus(ctx, domain.StatusProvisioningPending, ...) - the worker only picks up tenants whose tenant.status = 'provisioning_pending'.
  • services/tenant/service/grpc_provisioning_endpoints.go:151 - OverallStatus: s.toProtoStatus(tenant.Status) - seed-dev's wait loop reads this.
  • services/tenant/provisioner/postgres_provisioner.go:262 - the "service already provisioned, skipping" short-circuit still needs service_schemas = '[]' to bypass (kept from PR #2128, fix: Skip COMMENT statements in tenant migration rewriter).
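For reviewers unfamiliar with that third reference, a hypothetical sketch of the short-circuit follows. The function name and the shape of serviceSchemas are assumptions for illustration, not the real provisioner API:

```go
package main

import "fmt"

// needsProvisioning sketches the "service already provisioned, skipping"
// check: a service already recorded in tenant_provisioning.service_schemas
// is skipped, so its migrations are never re-applied.
func needsProvisioning(serviceSchemas []string, service string) bool {
	for _, s := range serviceSchemas {
		if s == service {
			return false // "service already provisioned, skipping"
		}
	}
	return true
}

func main() {
	// Before the reset: a ghost entry from a broken run blocks re-provisioning.
	fmt.Println(needsProvisioning([]string{"data", "auth"}, "data")) // false

	// After service_schemas = '[]'::jsonb: every service is re-applied.
	fmt.Println(needsProvisioning(nil, "data")) // true
}
```

Hence the workflow must clear service_schemas in addition to flipping tenant.status; either reset alone leaves the pipeline stuck.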

Test plan

  • Merge and trigger develop deploy
  • Watch droplet: tenant.status should flip through provisioning_pending → provisioning → active before seed-dev runs
  • Deploy Develop CI goes green
  • Manually verify org_volterra_energy schemas contain actual tables afterwards

PR #2128 added a tenant_provisioning reset to force re-provisioning
on every deploy, but the reset targeted the wrong table. The
provisioning worker polls for pending tenants via
ListByStatus(StatusProvisioningPending) which reads tenant.status,
not tenant_provisioning.state. Resetting only tenant_provisioning
left tenant.status='active', so the worker never claimed the
tenant, provisioning never re-ran, and seed-dev's manifest apply
still failed against the empty tenant schemas.

Also: seed-dev's waitForTenantReady calls GetTenantProvisioningStatus
whose OverallStatus is derived from tenant.status. With tenant.status
stuck at 'active', seed-dev returned immediately from its wait loop
and fired the manifest apply before the worker had a chance to act.

Reset both tables in the same psql call:
- tenant.status = 'provisioning_pending' (triggers worker pickup and
  makes seed-dev wait for real completion).
- tenant_provisioning.service_schemas = '[]' (forces provisionAllServices
  to re-apply migrations instead of short-circuiting on "already
  provisioned" entries).
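The seed-dev race described above can be sketched as follows. waitForTenantReady here is an illustrative stand-in; the real helper polls GetTenantProvisioningStatus over gRPC, whose OverallStatus is derived directly from tenant.Status:

```go
package main

import (
	"fmt"
	"time"
)

// waitForTenantReady polls until the reported status is "active" or the
// poll budget is exhausted. Because the status it sees is derived from
// tenant.Status, a stale 'active' satisfies it on the very first poll.
func waitForTenantReady(poll func() string, maxPolls int, interval time.Duration) bool {
	for i := 0; i < maxPolls; i++ {
		if poll() == "active" {
			return true
		}
		time.Sleep(interval)
	}
	return false
}

func main() {
	calls := 0
	// Stale tenant.status = 'active': the loop "succeeds" immediately,
	// before the worker has provisioned anything.
	stale := func() string { calls++; return "active" }
	fmt.Println(waitForTenantReady(stale, 10, 0), calls) // true 1
}
```

With tenant.status reset to 'provisioning_pending', the same loop keeps polling until the worker has genuinely driven the tenant to 'active', which is what makes the manifest apply safe to fire.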

coderabbitai Bot commented Apr 5, 2026

📝 Walkthrough

The workflow's post-migration tenant reset logic now updates two database tables instead of one. A new UPDATE statement on the tenant table is added before the existing tenant_provisioning update, expanding the SQL command from single to multi-statement execution.

Changes

Post-migration Database Reset (.github/workflows/deploy-develop.yml)
Updated tenant reset SQL logic to perform sequential updates on both the tenant and tenant_provisioning tables. The tenant table is updated to set status to 'provisioning_pending' and refresh the updated_at timestamp, followed by the existing tenant_provisioning reset that sets state to 'pending' and clears service_schemas.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 3
✅ Passed checks (3 passed)
  • Title check: Passed. The title accurately summarizes the main change: resetting tenant.status in deploy-develop to allow worker re-provisioning, the primary objective of this PR.
  • Description check: Passed. The description is directly related to the changeset, providing detailed context about the issue, evidence, code references, and test plan for the database update changes.
  • Docstring Coverage: Passed. No functions found in the changed files to evaluate; docstring coverage check skipped.


claude Bot left a comment

Clean, well-diagnosed fix. The root cause analysis is thorough: PR #2128 only reset tenant_provisioning.state but the provisioning worker polls tenant.status via ListByStatus(StatusProvisioningPending), so tenants with stale status='active' were never picked up for re-provisioning. The seed-dev wait loop also reads tenant.Status via GetTenantProvisioningStatus, so it returned immediately before the worker could act.

Verified:

  • 'provisioning_pending' matches domain.StatusProvisioningPending exactly (tenant.go:18)
  • 'deprovisioned' exclusion is correct — matches the terminal state constant (tenant.go:28)
  • The worker's processPendingTenants at provisioning_worker.go:292 confirms it queries by domain.StatusProvisioningPending
  • grpc_provisioning_endpoints.go:151 confirms OverallStatus is derived from tenant.Status, validating the seed-dev race diagnosis
  • Both UPDATEs run in a single psql -c call — they execute in the same implicit transaction, so the reset is atomic
  • updated_at = NOW() on the tenant table is a good addition for auditability
  • Comments are excellent — they explain why both tables must be reset in tandem and reference the specific source files

No concerns. This is a targeted CI workflow fix with zero application code changes and clear blast radius (develop deploy only).


claude Bot commented Apr 5, 2026

Claude Code Review

Commit: a10ff90 | CI: running (security checks passing, CodeRabbit pending)

Summary

Correct fix for the deploy-develop provisioning failure. PR #2128 only reset tenant_provisioning.state but missed tenant.status, which is the field the provisioning worker actually polls. This caused a two-part failure: (1) worker never picked up tenants for re-provisioning, and (2) seed-dev's wait loop saw stale status='active' and proceeded before provisioning could run.

The fix adds a second UPDATE to reset tenant.status to 'provisioning_pending' alongside the existing tenant_provisioning reset, both in a single atomic psql call. Comments thoroughly document why both tables must be reset in tandem.

Risk Assessment

  • Blast radius: Low (develop deploy workflow only, no prod impact)
  • Rollback: Safe (revert restores previous, broken behavior)
  • Scale: N/A (CI workflow, not runtime code)
  • Cross-system: Low (no application code changes)
  • Migration: N/A (no migrations)

Findings

No issues found. All status string values verified against domain constants.

Bot Review Notes

No unresolved bot review threads at time of review. CodeRabbit review still pending.

coderabbitai Bot left a comment

🧹 Nitpick comments (1)
.github/workflows/deploy-develop.yml (1)

303-306: Fix correctly resets both tables to trigger re-provisioning.

The SQL properly:

  • Sets tenant.status = 'provisioning_pending' matching the StatusProvisioningPending constant the worker polls.
  • Clears service_schemas = '[]'::jsonb to bypass the "already provisioned" short-circuit.
  • Resets state = 'pending' and clears error_message.

Minor inconsistency: tenant updates updated_at = NOW(), but tenant_provisioning does not, despite also having an updated_at column. Consider adding it for consistency in audit/debugging.

🔧 Optional: Add updated_at to tenant_provisioning for consistency
             docker exec postgres-develop psql -U meridian -d meridian_platform -c "
               UPDATE tenant SET status = 'provisioning_pending', updated_at = NOW() WHERE status != 'deprovisioned';
-              UPDATE tenant_provisioning SET state = 'pending', service_schemas = '[]'::jsonb, error_message = '' WHERE state != 'deprovisioned';
+              UPDATE tenant_provisioning SET state = 'pending', service_schemas = '[]'::jsonb, error_message = '', updated_at = NOW() WHERE state != 'deprovisioned';
             "
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.github/workflows/deploy-develop.yml around lines 303 - 306, The
tenant_provisioning UPDATE should also set updated_at so both tables reflect the
same audit timestamp; update the SQL in the workflow where you run the two
UPDATEs (the docker exec psql block) to include "updated_at = NOW()" in the
tenant_provisioning SET clause (alongside state = 'pending', service_schemas =
'[]'::jsonb, error_message = '') so it matches the tenant table behavior and the
worker polling against StatusProvisioningPending.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: e6c5e60e-afb6-44cb-b33e-58537352b2cd

📥 Commits

Reviewing files that changed from the base of the PR and between 7964e77 and a10ff90.

📒 Files selected for processing (1)
  • .github/workflows/deploy-develop.yml

@bjcoombs bjcoombs merged commit c49d82b into develop Apr 5, 2026
21 checks passed
@bjcoombs bjcoombs deleted the fix-develop-deploy-tenant-status-reset branch April 5, 2026 08:43
bjcoombs added a commit that referenced this pull request Apr 6, 2026
The demo deploy had the same broken reset as develop had before PRs
#2131 and #2133:
- Only reset tenant_provisioning.state, not tenant.status (worker
  polls tenant.status via ListByStatus(StatusProvisioningPending))
- Didn't clear service_schemas JSONB (provisioner short-circuits on
  "service already provisioned, skipping")
- Didn't drop ghost schemas from previous broken runs
- Didn't stop the app before resetting (race between worker and
  schema drops)

Port the same fix from deploy-develop.yml:
- Stop meridian before resetting state
- DROP SCHEMA CASCADE for org_* schemas across all service databases
- Reset both tenant.status and tenant_provisioning.service_schemas
- Start meridian after reset
