
fix(coprocessor): reduce 0.12 migration lock time #2365

Draft
Eikix wants to merge 2 commits into release/0.12.x from elias/fix-012-db-migration-performance

@Eikix commented on Apr 24, 2026

What

Backport a safer 0.12 coprocessor DB migration for large testnet-sized databases.

Changes:

  • reject tenant-owned rows whose tenant_id does not match the single tenant before deriving host-chain/key defaults
  • add host_chain_id / key_id_gw as NOT NULL columns with constant DEFAULTs instead of backfilling them across the full table
  • add host-chain CHECK constraints as NOT VALID
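  • split the no-tenant unique indexes into separate -- no-transaction migrations (PostgreSQL does not allow CONCURRENTLY index operations inside a transaction block), each using a retry-safe DROP INDEX CONCURRENTLY IF EXISTS followed by CREATE UNIQUE INDEX CONCURRENTLY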
  • update migration tests to apply the split migration sequence and cover mismatched tenant rows
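A minimal Rust sketch of the ordering in the first change, assuming a hypothetical in-memory list of tenant-owned rows. The real guard lives in the SQL migration; `TenantRow`, `TenantMismatch`, `check_single_tenant`, and the example table names are illustrative only. Any row whose tenant_id differs from the single tenant aborts the run before host-chain/key defaults are derived.

```rust
/// Hypothetical stand-in for one tenant-owned row; only tenant_id matters here.
#[derive(Debug, Clone, PartialEq)]
struct TenantRow {
    table: &'static str,
    tenant_id: i64,
}

/// Returned when a row belongs to a tenant other than the single expected one.
#[derive(Debug, PartialEq)]
struct TenantMismatch {
    table: &'static str,
    found: i64,
    expected: i64,
}

/// Checks every row before any defaults are derived and reports the first
/// mismatch, so the migration can abort without writing anything.
fn check_single_tenant(rows: &[TenantRow], expected: i64) -> Result<(), TenantMismatch> {
    match rows.iter().find(|r| r.tenant_id != expected) {
        Some(r) => Err(TenantMismatch {
            table: r.table,
            found: r.tenant_id,
            expected,
        }),
        None => Ok(()),
    }
}

fn main() {
    // Placeholder table names; the second row belongs to a different tenant.
    let rows = vec![
        TenantRow { table: "example_table_a", tenant_id: 1 },
        TenantRow { table: "example_table_b", tenant_id: 2 },
    ];
    match check_single_tenant(&rows, 1) {
        // Only at this point would host_chain_id / key_id_gw defaults be derived.
        Ok(()) => println!("all rows belong to tenant 1"),
        Err(e) => println!("abort migration: {e:?}"),
    }
}
```

Failing fast on the first mismatch keeps the guard cheap and guarantees that no defaults are written for a database that does not actually have a single tenant.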

Why

The previous migration performed non-concurrent index builds, full-table updates, SET NOT NULL, and constraint validation inside one long migration. On the testnet snapshot this held strong locks for hours and generated heavy IO/WAL pressure.

This keeps the physical index work, but avoids the full-table rewrites for host_chain_id / key_id_gw and moves unique-index builds into concurrent index migrations.
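The statements behind one split index migration can be sketched with a small Rust generator. This is a hypothetical helper, not code from this PR, and the index, table, and column names are placeholders. Both statements must run outside a transaction block, because PostgreSQL rejects DROP INDEX CONCURRENTLY and CREATE INDEX CONCURRENTLY inside one; marking a SQLx migration -- no-transaction opts it out of the transaction SQLx would otherwise wrap it in.

```rust
/// Builds the statements for one retry-safe unique-index migration.
/// Identifiers are interpolated unquoted, so they must be trusted, fixed names.
fn concurrent_unique_index_statements(index: &str, table: &str, columns: &[&str]) -> Vec<String> {
    vec![
        // Removes any leftover index (valid or INVALID) from an earlier attempt;
        // IF EXISTS makes this a no-op on a first run.
        format!("DROP INDEX CONCURRENTLY IF EXISTS {index}"),
        // Takes a SHARE UPDATE EXCLUSIVE lock, so reads and writes on the table
        // continue while the index is built.
        format!(
            "CREATE UNIQUE INDEX CONCURRENTLY {index} ON {table} ({})",
            columns.join(", ")
        ),
    ]
}

fn main() {
    // Hypothetical names, for illustration only.
    for stmt in concurrent_unique_index_statements("idx_example_unique", "example_table", &["col_a", "col_b"]) {
        println!("{stmt};");
    }
}
```

Returning the statements as separate strings keeps the drop/create pair explicit; how they are laid out across migration files is left to the actual migration code.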

Operational caveats

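  • This changes SQLx migration 20260128095635. SQLx stores a checksum for each applied migration and refuses to run when a previously applied migration has been modified, so any environment that already applied the previous 0.12.x version must be reset/rebased or have explicit _sqlx_migrations handling before running this image.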
  • The split index migrations still need to build indexes over large tables. They use CREATE UNIQUE INDEX CONCURRENTLY, so they avoid long ACCESS EXCLUSIVE table locks but can still wait behind long-running transactions.
  • If a concurrent index build is interrupted before SQLx records success, PostgreSQL can leave an INVALID index behind; rerunning the migration drops that leftover index first and then recreates it.
  • The new CHECK constraints are NOT VALID, so they are enforced only for new and updated rows. Existing rows receive non-negative constant defaults in this migration; the constraints can be validated later with ALTER TABLE ... VALIDATE CONSTRAINT in a follow-up migration if needed.
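Why dropping first makes a rerun safe can be shown with a toy Rust model of the index state. It is a simplification under stated assumptions (an interrupted concurrent build always leaves an INVALID index; `IndexState`, `create_concurrently`, and `drop_then_create` are hypothetical names), not a description of SQLx or PostgreSQL internals.

```rust
/// Hypothetical model of one index's state in the catalog.
#[derive(Debug, Clone, Copy, PartialEq)]
enum IndexState {
    Absent,
    Invalid, // assumed leftover of an interrupted CREATE INDEX CONCURRENTLY
    Valid,
}

/// Ok(new state) on success; Err((state left behind, reason)) on failure.
type Outcome = Result<IndexState, (IndexState, &'static str)>;

/// Models CREATE UNIQUE INDEX CONCURRENTLY without IF NOT EXISTS.
fn create_concurrently(state: IndexState, interrupted: bool) -> Outcome {
    match (state, interrupted) {
        // Any existing relation with this name makes the statement fail.
        (IndexState::Invalid | IndexState::Valid, _) => Err((state, "relation already exists")),
        // Killed mid-build: the model assumes an INVALID index is left behind.
        (IndexState::Absent, true) => Err((IndexState::Invalid, "build interrupted")),
        (IndexState::Absent, false) => Ok(IndexState::Valid),
    }
}

/// Models DROP INDEX CONCURRENTLY IF EXISTS: no index remains, whatever the input.
fn drop_if_exists(_state: IndexState) -> IndexState {
    IndexState::Absent
}

/// One run of the retry-safe migration: drop first, then create.
fn drop_then_create(state: IndexState, interrupted: bool) -> Outcome {
    create_concurrently(drop_if_exists(state), interrupted)
}

fn main() {
    // The first attempt is interrupted and leaves an INVALID index behind.
    let (leftover, _) = create_concurrently(IndexState::Absent, true).unwrap_err();
    // A create-only rerun fails on the leftover relation...
    println!("create-only rerun: {:?}", create_concurrently(leftover, false));
    // ...while drop-then-create converges to a VALID index.
    println!("drop-then-create rerun: {:?}", drop_then_create(leftover, false));
}
```

CREATE UNIQUE INDEX CONCURRENTLY IF NOT EXISTS would not be a substitute here: it only checks the name, so it would silently keep an INVALID leftover.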

Validation

  • cargo fmt --check
  • pre-commit cargo check
  • pre-commit clippy
  • cargo-audit audit --ignore RUSTSEC-2026-0098 --ignore RUSTSEC-2026-0099 --ignore RUSTSEC-2026-0104
  • cargo update -w --locked
  • SQLX_OFFLINE=true cargo check -p tfhe-worker
  • SQLX_OFFLINE=true cargo test -p tfhe-worker migrations -- --nocapture: compiled, but the run failed because the local Docker/Postgres test harness was unavailable (Connection refused).

cla-bot added the cla-signed label on Apr 24, 2026

github-actions bot commented on Apr 24, 2026

Changed Lines Coverage

Coverage of added/modified lines: N/A

Per-file breakdown

Diff Coverage

Diff: origin/release/0.12.x...HEAD, staged and unstaged changes

No lines with coverage information in this diff.

Eikix force-pushed the elias/fix-012-db-migration-performance branch from 5000745 to 2f9a986 on April 24, 2026 15:22