fix(ci): idempotent inventory migration + self-healing migrate-production (#245)#258
Conversation
…isions (#245) The "Migrate production database" workflow failed red on main (issue #245, commit 3ce0f13, Phase 2/#244). Migration 20260612000000_inventory_stock_movements used bare CREATE TYPE / ADD COLUMN / CREATE TABLE, which collide on a database historically synced with `prisma db push` (production was). The collision surfaced as a Postgres "already exists" error on the FIRST `prisma migrate deploy`, exiting non-zero with the raw DB error rather than the P3018 code; the workflow's retry loop only matched P3018, so it bailed. (A later merge saw the now-blocked queue as P3018, auto-resolved it, and unblocked deploy — so prod self-healed and is migrated/seeded, but the SQL and the workflow were fragile.) - prisma: the inventory migration is now idempotent — enum + FK constraints guarded by DO $$ … IF NOT EXISTS … $$, column/table/index use IF NOT EXISTS, so a replay against a db-push'd or clean DB is a no-op where objects exist. Final schema is unchanged; deploy does not re-checksum applied migrations. - migrate-production.yml: the self-heal loop now also matches the first-pass collision class (already exists / 42P07 / 42710 / 42P06 / 42701), resolves the failed migration as applied, and retries — so this failure class can't go red again. Comments updated to reflect the two self-healing cases. - docs: KI-028 records the root cause + fix; owner action remains "keep the PRODUCTION_DATABASE_URL repo secret set" (no secret → workflow still SKIPS green). Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 4c00155. Configure here.
| # "already exists" collision on a db-push'd database. | ||
| COLLISION="" | ||
| if grep -qiE "already exists|42P07|42710|42P06|42701" /tmp/migrate-deploy.log; then | ||
| COLLISION="1" |
There was a problem hiding this comment.
SQLSTATE grep matches migration timestamps
Medium Severity
The self-heal branch treats a failed prisma migrate deploy as a db-push collision when the log matches 42710 or 42701 anywhere. Those five-digit sequences also appear inside 14-digit migration folder timestamps in normal “Applying migration …” lines, so an unrelated failure (syntax, connectivity, constraint) can still set COLLISION and run migrate resolve --applied on the wrong migration.
Reviewed by Cursor Bugbot for commit 4c00155. Configure here.


Summary
Resolves #245 — the Migrate production database workflow failed red on
main(commit3ce0f13, Phase 2 / #244).Root cause. Migration
20260612000000_inventory_stock_movementsused bareCREATE TYPE/ADD COLUMN/CREATE TABLE. On a database historically synced withprisma db push(production was — see KI-018), those objects already exist, so the firstprisma migrate deployhit a Postgres "already exists" error and exited non-zero with the raw DB error, not theP3018code. The workflow's self-heal retry loop only matchedP3018, so it bailed on attempt 1 ("migrate deploy failed for an unexpected reason").Why prod is already healthy. A later merge to
mainsaw the now-blocked migration queue asP3018, auto-resolved it, and unblockeddeploy— the most recent runs are green and production is migrated + seeded. This PR removes the fragility so the failure class can't recur, and makes a clean replay/restore safe.Changes
prisma/.../20260612000000_inventory_stock_movements/migration.sql— now idempotent: enum + FK constraints guarded withDO $$ … IF NOT EXISTS … $$, column/table/indexes useIF NOT EXISTS. Final schema is identical;prisma migrate deploydoes not re-checksum already-applied migrations, so this is safe on the already-migrated production DB..github/workflows/migrate-production.yml— the self-heal loop now also recognises the first-pass collision class (already exists/42P07/42710/42P06/42701), resolves the offending migration as applied, and retries. The graceful skip-when-secret-absent guard is unchanged (noPRODUCTION_DATABASE_URL→ workflow SKIPS green).docs/KNOWN_ISSUES.md— adds KI-028 with the root cause + fix.Owner action
No new secret required. Keep the existing
PRODUCTION_DATABASE_URLrepo secret set (Settings → Secrets and variables → Actions). Without it the workflow skips green by design; forks/preview setups stay green.Testing
mainruns aresuccess/skipped(no other red workflows).P3018collision the old loop couldn't catch.check/docker/security/ GitGuardian) green on this PR.maintouchingprisma/runs Migrate production database green (verifies the hardened loop against the live DB).🤖 Generated with Claude Code