Skip to content

fix: Postgres ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE under concurrent load (#284)#288

Merged
tombii merged 2 commits into
mainfrom
fix/pg-integer-size-284
Jun 30, 2026
Merged

fix: Postgres ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE under concurrent load (#284)#288
tombii merged 2 commits into
mainfrom
fix/pg-integer-size-284

Conversation

@tombii

@tombii tombii commented Jun 30, 2026

Copy link
Copy Markdown
Owner

Summary

Fixes #284 — Postgres-backed deployments intermittently throw ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE while loading account data.

Root cause: Bun's native Postgres driver has a known concurrency bug class (oven-sh/bun#16774) where queries sharing a pooled connection under concurrent load can misattribute a cached statement's column metadata, corrupting binary integer decoding. This isn't a better-ccflare schema or query bug — confirmed no NUMERIC/DECIMAL columns and no RETURNING clauses are involved in the crash path.

Two mitigations, both agreed as defense-in-depth:

  • Disable named prepared statement caching by default (prepare: false on the Bun SQL client) — unnamed prepared statements don't persist across queries, closing the window for cross-query metadata contamination. Operators can opt back in via BETTER_CCFLARE_DB_PG_PREPARE=true.
  • Retry-once safety net in BunSqlAdapter for the specific ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE error code, scoped to the read paths (query()/get()) since DML (run()/runWithChanges()) never decodes column binary data here (no RETURNING clauses).

Test plan

  • New test file with 4 tests covering: retry succeeds on first failure, retry is not repeated on a second consecutive failure, non-matching errors are not retried — all passing
  • Full packages/database test suite passes (159 existing tests)
  • bun run lint && bun run typecheck && bun run format — clean (pre-existing unrelated warnings only)
  • GitNexus detect_changes confirms LOW risk — change is transparent to all callers (repositories, stats, alerts, handlers), no behavior change for successful queries

…ace (#284)

Bun's native Postgres driver has a known concurrency bug class (oven-sh/bun#16774)
where queries sharing a pooled connection under load can misattribute a cached
statement's column metadata, corrupting binary integer decoding and throwing
ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE. Disable named prepared statement caching
by default (opt back in via BETTER_CCFLARE_DB_PG_PREPARE=true) and add a
retry-once safety net on the read paths (query/get) as a self-healing fallback.
@greptile-apps

greptile-apps Bot commented Jun 30, 2026

Copy link
Copy Markdown

Greptile Summary

This PR mitigates intermittent Postgres integer decode failures under concurrent load. The main changes are:

  • Disables Bun SQL named prepared statements by default.
  • Adds an explicit BETTER_CCFLARE_DB_PG_PREPARE=true opt-in.
  • Retries ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE once on read queries.
  • Adds tests for the retry behavior.

Confidence Score: 5/5

This looks safe to merge.

  • No blocking issues found in the changed code.

Important Files Changed

Filename Overview
packages/database/src/database-operations.ts Adds explicit Postgres prepare-mode handling so named prepared statements are disabled unless BETTER_CCFLARE_DB_PG_PREPARE is exactly true.
packages/database/src/adapters/bun-sql-adapter.ts Adds a retry-once wrapper for ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE and applies it to PostgreSQL read paths.
packages/database/src/adapters/tests/bun-sql-adapter-pg-integer-retry.test.ts Adds Bun tests for successful retry, retry limit, and non-matching error behavior.

Reviews (2): Last reviewed commit: "fix: correct inverted default for BETTER..." | Re-trigger Greptile

Comment thread packages/database/src/database-operations.ts Outdated
@claude

claude Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review: fix: Postgres ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE under concurrent load (#284)

Solid, well-scoped fix. Both mitigations (disabling named prepared statements + retry-once on the specific error code) are reasonable defense-in-depth for a driver-level race, and the change is correctly confined to read paths.

Code quality

  • withPgIntegerSizeRetry is clean and minimal — wraps only query()/get(), leaving run()/runWithChanges()/unsafe() untouched. I verified there are no RETURNING clauses anywhere in packages/database/src/repositories/*.ts or migrations-pg.ts, so the stated invariant ("DML never decodes column data here") holds today.
  • The comment on withPgIntegerSizeRetry (bun-sql-adapter.ts:150-160) is exactly the kind of "why" comment this repo's conventions ask for — explains the upstream bug, why retry-via-new-call instead of re-awaiting, and why it's scoped to reads only.
  • BETTER_CCFLARE_DB_PG_PREPARE env var follows the existing pattern of BETTER_CCFLARE_DB_POOL_MAX/BETTER_CCFLARE_DB_IDLE_TIMEOUT for opt-out knobs, consistent with the rest of database-operations.ts.

Potential issues / things to watch

  1. Silent invariant, not enforced in code (bun-sql-adapter.ts:158-159): The safety of skipping the retry for run()/runWithChanges() depends entirely on "no RETURNING clause exists anywhere." That's true today but nothing would catch a future PR that adds a RETURNING clause to a write query — it would silently lose the retry protection. Not a blocker, but might be worth a one-line note in run()/runWithChanges() itself (not just on withPgIntegerSizeRetry) so the constraint is visible at the call sites too, since that's where someone would add RETURNING.
  2. Doubled worst-case latency on read paths: each retry attempt gets its own fresh PG_CLIENT_QUERY_TIMEOUT_MS (8s) via withPgTimeout, so a query that times out on both the original attempt and the retry could now take up to ~16s before surfacing an error (vs. 8s before). Given this only triggers on the rare driver bug (not on ordinary timeouts, since timeout errors don't carry the matching code), this is probably fine, but worth confirming it doesn't trip any upstream client-side timeout/circuit breaker that assumes ~8s as the ceiling.
  3. BETTER_CCFLARE_DB_PG_PREPARE / BETTER_CCFLARE_DB_STATEMENT_TIMEOUT aren't documented in .env.example or README — this matches the existing (also-undocumented) BETTER_CCFLARE_DB_POOL_MAX/BETTER_CCFLARE_DB_IDLE_TIMEOUT, so it's a pre-existing gap rather than something this PR introduces, but since this is a new escape hatch operators may need to reach for in production, it'd be a good time to document all of these together.

Test coverage

  • The new test file covers the four behaviors called out in the PR description (retry-success, no-double-retry, non-matching error passthrough) for both query() and get(). Good use of a minimal fake unsafe() rather than spinning up a real PG connection.
  • Nit: the fake client only stubs unsafe/on; since withPgIntegerSizeRetry is private, the tests reach it only indirectly through query()/get() — that's the right level (behavior, not implementation), no concern there.
  • No test exercises the "named prepared statements disabled by default" change in database-operations.ts (i.e., asserting prepare: false is passed to SQL(...) unless BETTER_CCFLARE_DB_PG_PREPARE=true). Given that's a one-line config flip with low risk of regressing silently, this is a minor gap rather than a blocker — manual/integration verification is reasonable here since it requires a real PG connection either way.

Security

No concerns — no new user input paths, the retry only re-issues the same already-parameterized query, and the env var is a boolean trust-the-operator switch consistent with existing ones.

Verdict

Looks safe to merge. The two items above (silent RETURNING-clause invariant, and confirming the doubled timeout ceiling is acceptable) are worth a quick look but aren't blocking.

pgPrepare used `!== "true"`, which defaulted to `true` (prepared
statements enabled) whenever the env var was unset, and disabled them
only when explicitly set to "true" — backwards from the intended
"disabled by default, opt in via BETTER_CCFLARE_DB_PG_PREPARE=true".
This left the ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE concurrency path
(#284) reachable under normal Postgres load with no env override set.
@tombii

tombii commented Jun 30, 2026

Copy link
Copy Markdown
Owner Author

Fixed in d60bd19pgPrepare used !== "true", which evaluated to true (prepared statements enabled) whenever the env var was unset, and only disabled them when explicitly set to "true" — exactly backwards from the intended default. Corrected to === "true" so prepared statements are disabled by default (closing the #284 concurrency window) and BETTER_CCFLARE_DB_PG_PREPARE=true now actually opts back in, matching the doc comment and the PR description.

@tombii tombii merged commit c3ce4a4 into main Jun 30, 2026
4 checks passed
@tombii tombii deleted the fix/pg-integer-size-284 branch June 30, 2026 20:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[bug] /context can trigger ERR_POSTGRES_UNSUPPORTED_INTEGER_SIZE

1 participant