perf: investigate DB latency exceeding 500ms threshold for 2 consecutive weeks #1123

@sw-factory-automations

Description

The health endpoint DB latency has exceeded the 500ms threshold for two consecutive weeks: 799ms in W19, 658ms in W20. The health check already uses best-of-2 sampling with direct fetch (bypassing the Supabase JS client), so the bottleneck is likely on the Supabase infrastructure side — regional distance, connection pool saturation, or cold starts on the PostgREST layer.
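The breach described above can be restated as a small check. This is only a sketch: the `WeeklyLatency` record shape and both function names are hypothetical, since the format of the weekly metrics file is not shown in this issue. Only the two latency values and the 500ms threshold come from the description.

```typescript
// Hypothetical record shape; the real metrics file format is not shown here.
interface WeeklyLatency {
  week: string;
  dbLatencyMs: number;
}

// Number of most recent consecutive weeks whose latency exceeds the threshold.
function trailingBreachStreak(weeks: WeeklyLatency[], thresholdMs: number): number {
  let streak = 0;
  for (const w of [...weeks].reverse()) {
    if (w.dbLatencyMs <= thresholdMs) break;
    streak++;
  }
  return streak;
}

// Percentage by which a value exceeds the threshold, rounded to one decimal.
function percentOver(valueMs: number, thresholdMs: number): number {
  return Math.round(((valueMs - thresholdMs) / thresholdMs) * 1000) / 10;
}

// Values taken from the issue description.
const reported: WeeklyLatency[] = [
  { week: "W19", dbLatencyMs: 799 },
  { week: "W20", dbLatencyMs: 658 },
];

for (const w of reported) {
  console.log(`${w.week}: ${w.dbLatencyMs}ms, ${percentOver(w.dbLatencyMs, 500)}% over threshold`);
}
console.log(`consecutive weeks over threshold: ${trailingBreachStreak(reported, 500)}`);
```

With the reported figures, W19 is roughly 60% over the threshold and W20 roughly 32% over, so the trend is improving but still well outside the target.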

Source: metrics/weekly/2026-W20-perf.md

Acceptance Criteria

  • Profile the health endpoint latency breakdown: DNS resolution, TCP/TLS handshake, server processing, response transfer
  • Check Supabase dashboard for connection pool utilization and query performance metrics
  • Determine if the Vercel deployment region and Supabase project region are co-located (same AWS region)
  • If regions differ, document the expected latency floor and whether migration is feasible
  • If regions match, investigate PostgREST cold start behavior and whether connection pooling (PgBouncer/Supavisor) is enabled
  • Achieve DB latency ≤ 500ms on the health endpoint, or document why the threshold should be adjusted
  • pnpm lint && pnpm typecheck && pnpm test pass
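For the first criterion, one possible way to turn raw timestamps into the requested phase breakdown is sketched below. The `TimingMarks` and `PhaseBreakdown` shapes and the function names are hypothetical and not taken from the repository. In Node, such marks could be captured from the request socket's `lookup`, `connect`, and `secureConnect` events plus the first response chunk; that wiring is left out so the arithmetic stays self-contained.

```typescript
// Hypothetical shape: millisecond timestamps captured during one request.
interface TimingMarks {
  start: number;     // request initiated
  dnsDone: number;   // DNS lookup finished
  tcpDone: number;   // TCP connection established
  tlsDone: number;   // TLS handshake finished
  firstByte: number; // first response byte received
  end: number;       // response fully read
}

interface PhaseBreakdown {
  dnsMs: number;
  tcpMs: number;
  tlsMs: number;
  serverMs: number;   // request sent to first byte: one network round trip plus server processing
  transferMs: number;
  totalMs: number;
}

// Convert absolute marks into per-phase durations.
function breakdown(m: TimingMarks): PhaseBreakdown {
  return {
    dnsMs: m.dnsDone - m.start,
    tcpMs: m.tcpDone - m.dnsDone,
    tlsMs: m.tlsDone - m.tcpDone,
    serverMs: m.firstByte - m.tlsDone,
    transferMs: m.end - m.firstByte,
    totalMs: m.end - m.start,
  };
}

// Name of the phase that contributes the most time (totalMs excluded).
function dominantPhase(b: PhaseBreakdown): keyof PhaseBreakdown {
  const phases: (keyof PhaseBreakdown)[] = ["dnsMs", "tcpMs", "tlsMs", "serverMs", "transferMs"];
  return phases.reduce((max, p) => (b[p] > b[max] ? p : max));
}
```

On a reused keep-alive connection the DNS, TCP, and TLS phases should be close to zero, so comparing a cold sample against a warm one helps separate connection setup cost from round-trip and server time.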

Dependencies

None

Technical Notes

  • Health endpoint: src/app/api/health/route.ts — already optimized with direct fetch, best-of-2 sampling, keep-alive headers
  • The health_check RPC function is a simple DB ping — the latency is dominated by network round-trip, not query execution
  • Supabase project config and Vercel deployment region are infrastructure settings, not code changes
  • If the fix requires Supabase region migration or plan changes, document the steps for a maintainer to execute
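Because the notes above attribute the latency to network round-trips, a rough latency floor for a given region pairing can be estimated from distance alone. The sketch below assumes light travels through fiber at about 200 km per millisecond (roughly two thirds of its speed in vacuum) and uses the great-circle distance, so real paths will be slower. All names are hypothetical, and the coordinates of the actual Vercel and Supabase regions are not given in this issue and would need to be filled in.

```typescript
interface Coord {
  latDeg: number;
  lonDeg: number;
}

const EARTH_RADIUS_KM = 6371;
// Assumption: signal speed in fiber is about 2/3 of c, i.e. ~200 km per ms.
const FIBER_KM_PER_MS = 200;

// Great-circle distance between two points (haversine formula).
function greatCircleKm(a: Coord, b: Coord): number {
  const toRad = (deg: number): number => (deg * Math.PI) / 180;
  const sinHalfLat = Math.sin(toRad(b.latDeg - a.latDeg) / 2);
  const sinHalfLon = Math.sin(toRad(b.lonDeg - a.lonDeg) / 2);
  const h =
    sinHalfLat * sinHalfLat +
    Math.cos(toRad(a.latDeg)) * Math.cos(toRad(b.latDeg)) * sinHalfLon * sinHalfLon;
  return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(h));
}

// Lower bound for one network round trip over the given distance.
function rttFloorMs(distanceKm: number): number {
  return (2 * distanceKm) / FIBER_KM_PER_MS;
}

// Lower bound for a request that needs several sequential round trips.
// A cold HTTPS request over TLS 1.3 needs about 3 (TCP, TLS, HTTP);
// a request on a warm keep-alive connection needs about 1.
function requestFloorMs(distanceKm: number, roundTrips: number): number {
  return roundTrips * rttFloorMs(distanceKm);
}
```

If the measured latency sits far above `requestFloorMs` even for the cold case, distance alone does not explain it, and the pooling and cold-start checks become the more likely leads.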

Approval Required

This is a HIGH risk investigation:

  • May require infrastructure changes (Supabase region migration, connection pooling configuration)
  • Could involve Supabase plan changes with cost implications
  • Region migration would require downtime planning and data migration

Comment "approved" to release this to the automation queue.

Metadata

Assignees

No one assigned

    Labels

    needs-human (needs human input; automation will re-queue when user responds), performance, priority:3
