fix: Reset tenant.status in deploy-develop so worker re-provisions #2131
Conversation
PR #2128 added a tenant_provisioning reset to force re-provisioning on every deploy, but the reset targeted the wrong table. The provisioning worker polls for pending tenants via ListByStatus(StatusProvisioningPending), which reads tenant.status, not tenant_provisioning.state. Resetting only tenant_provisioning left tenant.status = 'active', so the worker never claimed the tenant, provisioning never re-ran, and seed-dev's manifest apply still failed against the empty tenant schemas.

Also: seed-dev's waitForTenantReady calls GetTenantProvisioningStatus, whose OverallStatus is derived from tenant.status. With tenant.status stuck at 'active', seed-dev returned immediately from its wait loop and fired the manifest apply before the worker had a chance to act.

Reset both tables in the same psql call:

- tenant.status = 'provisioning_pending' (triggers worker pickup and makes seed-dev wait for real completion).
- tenant_provisioning.service_schemas = '[]' (forces provisionAllServices to re-apply migrations instead of short-circuiting on "already provisioned" entries).
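The two-table reset can be sketched as a single psql invocation. This is a sketch, not the workflow's exact step: the container, role, and database names are taken from the review comments elsewhere in this PR.

```shell
# Sketch of the two-table reset. Both statements travel in one `psql -c`
# call, so they execute in a single implicit transaction: the worker can
# never observe tenant.status reset while service_schemas is still stale.
RESET_SQL="
UPDATE tenant
   SET status = 'provisioning_pending', updated_at = NOW()
 WHERE status != 'deprovisioned';
UPDATE tenant_provisioning
   SET state = 'pending', service_schemas = '[]'::jsonb, error_message = ''
 WHERE state != 'deprovisioned';
"
# In the workflow this would be run roughly as:
#   docker exec postgres-develop psql -U meridian -d meridian_platform -c "$RESET_SQL"
printf '%s\n' "$RESET_SQL"
```

Because the statuses are plain strings, any drift from the Go constants (StatusProvisioningPending and the terminal 'deprovisioned' state) would silently break the pickup, which is why the review below cross-checks them against tenant.go.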
📝 Walkthrough

The workflow's post-migration tenant reset logic now updates two database tables instead of one: a new UPDATE statement on the tenant table accompanies the existing tenant_provisioning reset.
Clean, well-diagnosed fix. The root cause analysis is thorough: PR #2128 only reset tenant_provisioning.state but the provisioning worker polls tenant.status via ListByStatus(StatusProvisioningPending), so tenants with stale status='active' were never picked up for re-provisioning. The seed-dev wait loop also reads tenant.Status via GetTenantProvisioningStatus, so it returned immediately before the worker could act.
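The stale-status race can be illustrated with a minimal shell sketch. Both functions here are hypothetical stand-ins (MOCK_STATUS, wait_for_tenant_ready) for seed-dev's loop and GetTenantProvisioningStatus, not the real implementation:

```shell
# Hypothetical stand-in for GetTenantProvisioningStatus: OverallStatus is
# derived from tenant.status, mocked here via an environment variable.
get_overall_status() { echo "${MOCK_STATUS:-provisioning_pending}"; }

# Hypothetical stand-in for seed-dev's waitForTenantReady poll loop.
wait_for_tenant_ready() {
  for _ in 1 2 3 4 5; do
    [ "$(get_overall_status)" = "active" ] && { echo "ready"; return 0; }
    sleep 0  # the real loop would back off between polls
  done
  echo "timeout"
  return 1
}

# With tenant.status stuck at 'active', the loop returns on the very
# first poll, before the worker has re-provisioned anything.
MOCK_STATUS=active wait_for_tenant_ready
```

Resetting tenant.status to 'provisioning_pending' closes this window: the loop now blocks until the worker has actually driven the tenant back to 'active'.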
Verified:
- 'provisioning_pending' matches domain.StatusProvisioningPending exactly (tenant.go:18)
- The 'deprovisioned' exclusion is correct: it matches the terminal state constant (tenant.go:28)
- The worker's processPendingTenants at provisioning_worker.go:292 confirms it queries by domain.StatusProvisioningPending
- grpc_provisioning_endpoints.go:151 confirms OverallStatus is derived from tenant.Status, validating the seed-dev race diagnosis
- Both UPDATEs run in a single psql -c call, so they execute in the same implicit transaction and the reset is atomic
- updated_at = NOW() on the tenant table is a good addition for auditability
- The comments are excellent: they explain why both tables must be reset in tandem and reference the specific source files
No concerns. This is a targeted CI workflow fix with zero application code changes and clear blast radius (develop deploy only).
Claude Code Review

Summary: Correct fix for the deploy-develop provisioning failure. PR #2128 only reset tenant_provisioning.state, but the provisioning worker polls tenant.status, so existing tenants were never picked up for re-provisioning. The fix adds a second UPDATE that resets tenant.status in the same psql call.

Risk Assessment
Findings

No issues found. All status string values verified against domain constants.

Bot Review Notes

No unresolved bot review threads at time of review. CodeRabbit review still pending.
🧹 Nitpick comments (1)
.github/workflows/deploy-develop.yml (1)
303-306: Fix correctly resets both tables to trigger re-provisioning.

The SQL properly:
- Sets tenant.status = 'provisioning_pending', matching the StatusProvisioningPending constant the worker polls.
- Clears service_schemas = '[]'::jsonb to bypass the "already provisioned" short-circuit.
- Resets state = 'pending' and clears error_message.

Minor inconsistency: tenant updates updated_at = NOW(), but tenant_provisioning does not, despite also having an updated_at column. Consider adding it for consistency in audit/debugging.

🔧 Optional: Add updated_at to tenant_provisioning for consistency
```diff
 docker exec postgres-develop psql -U meridian -d meridian_platform -c "
   UPDATE tenant SET status = 'provisioning_pending', updated_at = NOW() WHERE status != 'deprovisioned';
-  UPDATE tenant_provisioning SET state = 'pending', service_schemas = '[]'::jsonb, error_message = '' WHERE state != 'deprovisioned';
+  UPDATE tenant_provisioning SET state = 'pending', service_schemas = '[]'::jsonb, error_message = '', updated_at = NOW() WHERE state != 'deprovisioned';
 "
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/deploy-develop.yml around lines 303-306: the tenant_provisioning UPDATE should also set updated_at so both tables reflect the same audit timestamp. Update the SQL in the workflow where you run the two UPDATEs (the docker exec psql block) to include "updated_at = NOW()" in the tenant_provisioning SET clause (alongside state = 'pending', service_schemas = '[]'::jsonb, error_message = '') so it matches the tenant table behavior and the worker polling against StatusProvisioningPending.
📒 Files selected for processing (1)
.github/workflows/deploy-develop.yml
The demo deploy had the same broken reset as develop had before PRs #2131 and #2133:

- Only reset tenant_provisioning.state, not tenant.status (worker polls tenant.status via ListByStatus(StatusProvisioningPending))
- Didn't clear service_schemas JSONB (provisioner short-circuits on "service already provisioned, skipping")
- Didn't drop ghost schemas from previous broken runs
- Didn't stop the app before resetting (race between worker and schema drops)

Port the same fix from deploy-develop.yml:

- Stop meridian before resetting state
- DROP SCHEMA CASCADE for org_* schemas across all service databases
- Reset both tenant.status and tenant_provisioning.service_schemas
- Start meridian after reset

Co-authored-by: Ben Coombs <bjcoombs@users.noreply.github.com>
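The ordering being ported from deploy-develop.yml can be sketched as below. Each function body only echoes a placeholder for the real workflow step, and the database names passed in are hypothetical:

```shell
# Sketch of the demo reset ordering; echoes stand in for real steps.
stop_app()  { echo "stop meridian"; }    # worker must not race the schema drops
start_app() { echo "start meridian"; }

drop_ghost_schemas() {
  # real step: DROP SCHEMA org_* ... CASCADE in every service database
  for db in "$@"; do echo "drop org_* schemas in $db"; done
}

reset_tenant_state() {
  # real step: reset tenant.status and tenant_provisioning.service_schemas
  echo "reset tenant state"
}

demo_reset() {
  stop_app                    # 1. stop the app before touching state
  drop_ghost_schemas "$@"     # 2. remove ghost schemas from broken runs
  reset_tenant_state          # 3. reset both tables
  start_app                   # 4. restart so the worker re-provisions
}

demo_reset identity_db billing_db
```

The ordering is the point: stopping the app first guarantees the worker cannot claim a tenant while its schemas are mid-drop, and restarting last means the first thing the worker sees is a clean 'provisioning_pending' state.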
Summary
- PR #2128 reset only tenant_provisioning, but the provisioning worker polls tenant.status via ListByStatus(StatusProvisioningPending). The worker never picked up existing tenants, provisioning never re-ran, and the manifest apply still failed.
- seed-dev's waitForTenantReady calls GetTenantProvisioningStatus, whose OverallStatus is derived from tenant.status. With tenant.status = 'active' stale, seed-dev returned immediately and fired the manifest apply before the worker could act.

Evidence
On develop right now (post-PR #2128 merge + deploy):
Deploy Develop CI log, 2026-04-05 08:06:
Code references
- services/tenant/worker/provisioning_worker.go:292 - w.repo.ListByStatus(ctx, domain.StatusProvisioningPending, ...) - the worker only picks up tenants whose tenant.status = 'provisioning_pending'.
- services/tenant/service/grpc_provisioning_endpoints.go:151 - OverallStatus: s.toProtoStatus(tenant.Status) - seed-dev's wait loop reads this.
- services/tenant/provisioner/postgres_provisioner.go:262 - the "service already provisioned, skipping" short-circuit still needs service_schemas = '[]' to bypass (kept from PR #2128, "fix: Skip COMMENT statements in tenant migration rewriter").

Test plan
- tenant.status should flip provisioning_pending → provisioning → active before seed-dev runs
- org_volterra_energy schemas contain actual tables afterwards
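The expected status flip can be smoke-checked with a small validator. This sketch only encodes the ordering provisioning_pending → provisioning → active; in practice each sample would come from a repeated SELECT against tenant.status (query and tenant slug left out as they are deployment-specific):

```shell
# Validates that a sequence of polled tenant.status samples moves
# monotonically through provisioning_pending -> provisioning -> active
# and actually reaches 'active'.
valid_progression() {
  prev=0
  for s in "$@"; do
    case "$s" in
      provisioning_pending) cur=1 ;;
      provisioning)         cur=2 ;;
      active)               cur=3 ;;
      *) return 1 ;;   # unexpected status fails the check
    esac
    if [ "$cur" -lt "$prev" ]; then return 1; fi  # no backwards transitions
    prev=$cur
  done
  [ "$prev" -eq 3 ]  # must end in 'active'
}

valid_progression provisioning_pending provisioning active && echo "progression OK"
```

Repeated samples of the same status are fine (the tenant may sit in 'provisioning' for several polls); only regressions or a run that never reaches 'active' fail.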