Skip to content

feat(compat): PostgreSQL builtin-compatibility macros + transforms (48 functions)#706

Merged
EDsCODE merged 19 commits into
mainfrom
fix/pg-builtin-compat-batch
Jun 10, 2026
Merged

feat(compat): PostgreSQL builtin-compatibility macros + transforms (48 functions)#706
EDsCODE merged 19 commits into
mainfrom
fix/pg-builtin-compat-batch

Conversation

@EDsCODE

@EDsCODE EDsCODE commented Jun 8, 2026

Copy link
Copy Markdown
Contributor

Summary

Implements the fixable findings from a PostgreSQL builtin-compatibility audit (modeled on array_lower / #705): PG builtins that real clients/ORMs/metadata queries call but DuckDB either lacks (hard errors) or implements with divergent / silently-wrong semantics. Each fix follows the established pattern — a DuckDB macro/transform wired through server/catalog.go, the transpiler Classify list, and both pgcatalog.go qualification maps — and is gated by TDD against DuckDB v1.5.2 plus an end-to-end harness assertion.

Stacked on #689 (fix/iceberg-pg-tier1-compat) — review/merge that first; this PR reuses its OperatorTransform/looksJSON machinery.

48 functions/operators land here; 7 deferred to a follow-up; 8 documented as infeasible. See docs/pg-builtin-compat-status.md.

What's implemented

  • Batch A — 24 scalar macros: set_config (highest impact — JDBC/SQLAlchemy/psycopg/poolers emit it at connection startup), uuid_generate_v4, statement_timestamp, pg_get_function_arguments/result/identity_arguments, pg_get_triggerdef, pg_jit_available, row_security_active, pg_collation_for, pg_input_is_valid, to_regclass/to_regtype/to_regproc, jsonb_pretty, to_ascii, convert_from, width_bucket, scale, min_scale, masklen, hostmask, set_masklen, inet_same_family.
  • Batch B — 11 array/interval macros: array_positions, array_replace, array_fill, trim_array, array_dims, date_bin, make_interval, justify_hours/days/interval, OVERLAPS.
  • Batch C — bytea correctness: decode/encode 2-arg shadowing macros (DuckDB's builtin decode('YWJj','base64') silently returned the input unchanged; the identity "same" mappings in functions.go made duckgres complicit — removed). inet_server_addr type fix.
  • Batch D — 9 set-returning table macros: json_array_elements, jsonb_array_elements, json_array_elements_text, jsonb_each, json_each_text, pg_options_to_table, aclexplode, pg_get_keywords, pg_identify_object.
  • Batch F — operator rewrites: @> (jsonb containment → json_contains; array @> left native), #>> (literal text[] path → json_extract_string).
  • Batch G — type shim: PG curly-brace array literal casts ('{1,2,3}'::int[]ARRAY[...]::int[]) via a sound parser (quoted/embedded-comma/NULL elements; bails on multi-dim).

Two transpiler bugs caught while wiring the e2e test

  • Classify never ran OperatorTransform for @>/#>> unless the query also contained ->/~/|| — jsonb containment silently hit a binder error.
  • TypeCastTransform didn't recurse into A_Indirection, so casts inside (expr)[n] were missed.

Testing

  • TDD per batch (red→green), every macro verified against DuckDB v1.5.2.
  • go test ./transpiler/... ./server green; go vet clean; golangci-lint 0 issues.
  • e2e: pg_compat_functions in tests/e2e-mw-dev/harness.sh asserts 9 representative functions end-to-end in DuckLake mode on both cnpg + ext backends.

Deferred (Batch E — follow-up PR)

7 functions.go AST transforms with documented PG-divergence corners: format (%I/%L/%s), substr, substring, date_trunc (3-arg), overlay, cardinality, isfinite. Verified specs captured.

Infeasible (documented, intentionally not shipped)

jsonb_set, jsonb_insert, jsonb_strip_nulls, json_populate_record, jsonb_path_query, parse_ident, daterange, numrange — a lossy shim would return silently-wrong data (worse than the current hard error).

🤖 Generated with Claude Code

EDsCODE and others added 7 commits June 5, 2026 13:15
Addresses the highest-severity data-integrity gaps from the Iceberg QA
report (docs/iceberg-pg-syntax-qa.md). All verified live against the
Iceberg backend and covered by unit tests.

- DDL hybrid warn/error (Iceberg only; DuckLake keeps silent-strip for
  sqlmesh/dbt): WARNING for unenforced PK/UNIQUE/CHECK/FK; ERROR (0A000)
  for the silently-NULL features SERIAL/BIGSERIAL, GENERATED ... STORED,
  and DEFAULT <expr>/now(). DEFAULT NULL and NOT NULL preserved. Adds a
  Warnings channel on the transpile Result, surfaced as NoticeResponse in
  both simple and extended protocols.
- EXPLAIN (ANALYZE) of a write no longer double-executes: GetQuerySchema
  returns a synthetic schema for EXPLAIN without running it, and the
  extended-protocol Describe path no longer probe-executes EXPLAIN.
- DROP COLUMN guard: DDL-time WARNING, and the Iceberg "newer schema id"
  scan failure is mapped to a clear 0A000 message instead of a raw XX000.
- bytea hex literals '\xDEADBEEF'::bytea now decode to the correct bytes
  (unhex), and B'101' bit-string literals map to '101'::BIT.
- jsonb || now merges objects (json_merge_patch) instead of silently
  string-concatenating; plain string/array || is left untouched.
- Writable-CTE UPDATE ... RETURNING that reads a modified column is
  rejected (would return pre-update values); RETURNING an unmodified key
  and RETURNING * still work (Airbyte pattern preserved).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds DuckDB macros emulating PG builtins that real clients/ORMs/metadata
queries call but DuckDB lacks, following the array_lower (PR #705) pattern:
macro in catalog.go + Classify token + both pgcatalog qualification maps.

Functions: set_config, uuid_generate_v4, statement_timestamp,
pg_get_function_arguments/result/identity_arguments, pg_get_triggerdef,
pg_jit_available, row_security_active, pg_collation_for, pg_input_is_valid,
to_regclass/to_regtype/to_regproc, jsonb_pretty, to_ascii, convert_from,
width_bucket, scale, min_scale, masklen, hostmask, set_masklen,
inet_same_family.

set_config is the highest-impact (JDBC/SQLAlchemy/psycopg/poolers emit it at
connection startup). inet_merge deferred (needs a covering-CIDR impl DuckDB
lacks primitives for). Verified against DuckDB v1.5.2 via TDD.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
array_positions, array_replace, array_fill, trim_array, array_dims,
date_bin, make_interval, justify_hours/days/interval, and the OVERLAPS
operator (DuckDB parses the keyword to overlaps() but lacks the function).
justify_* use microsecond math to preserve fractional seconds. Verified
against DuckDB v1.5.2 via TDD.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… (Batch C)

DuckDB's decode/encode builtins have incompatible 2-arg PG semantics:
decode('YWJj','base64') silently returned the input unchanged (wrong bytes),
encode(bytea,fmt) hit a binder error. The transpiler's identity 'same'
mappings in functions.go made duckgres complicit in the silent corruption.

Replace with 2-arg shadowing macros (base64/hex/escape) and remove the
misleading identity mappings. inet_server_addr now returns an INET-typed NULL
instead of DuckDB's wrong-typed NULL. Regression-tested via TDD.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
json_array_elements, jsonb_array_elements, json_array_elements_text,
jsonb_each, json_each_text, pg_options_to_table, aclexplode,
pg_get_keywords, pg_identify_object as CREATE MACRO ... AS TABLE.
FROM-clause calls are memory.main-qualified in DuckLake mode (RangeFunction
walk in pgcatalog.go). JSONPath keys are quoted so dotted keys work.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ts (Batch F+G)

- @> rewrites to json_contains() when an operand is JSON (array @> stays native).
- #>> rewrites a literal text[] path to json_extract_string(j, '$."a"."b"').
  Both build on the OperatorTransform/looksJSON machinery added in #689.
- '{1,2,3}'::int[] PG array literals are rewritten to ARRAY['1','2','3']::int[]
  via a sound PG-array-literal parser (handles quoted/embedded-comma/NULL
  elements; bails on multi-dimensional). DuckDB cannot parse PG curly literals.

Verified rewrites deparse and execute correctly against DuckDB v1.5.2 via TDD.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y gaps

- harness.sh: pg_compat_functions exercises set_config, width_bucket,
  uuid_generate_v4, encode/decode, jsonb @>/#>>, array-literal cast,
  json_array_elements, make_interval end-to-end in DuckLake mode on both
  cnpg + ext backends (CLAUDE.md e2e gate).
- Classify now flags @> / #>> for OperatorTransform (were only rewritten when a
  query also contained ->/~/||) and flags '}''::' array-literal casts for
  TypeCastTransform.
- TypeCastTransform recurses into A_Indirection so casts inside (expr)[n] are
  transformed. Both gaps were caught wiring the e2e bundle.
- docs/pg-builtin-compat-status.md: implemented / deferred (Batch E) / infeasible.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
EDsCODE and others added 2 commits June 8, 2026 15:03
…VALUES

QA against a live DuckLake server surfaced two bugs that unit tests missed by
bypassing the transpiler:

- The transpiler maps PG inet -> text, so masklen/hostmask/set_masklen/
  inet_same_family (which used family()/host()/::inet) hit a binder error on the
  text arg. Reimplement them on the textual CIDR form (IPv6 detected via ':').
- TypeCastTransform.walkSelectStmt didn't descend into SelectStmt.ValuesLists,
  so '{1,2}'::int[] casts inside INSERT ... VALUES were never rewritten. Add the
  ValuesLists walk.

Adds text-input inet regression cases and an INSERT-VALUES array-cast test.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Base automatically changed from fix/iceberg-pg-tier1-compat to main June 8, 2026 22:30
@EDsCODE EDsCODE marked this pull request as ready for review June 8, 2026 22:31
EDsCODE and others added 10 commits June 8, 2026 15:37
…-batch

# Conflicts:
#	transpiler/transform/literals_test.go
#	transpiler/transform/operators.go
#	transpiler/transpiler.go
#	transpiler/transpiler_test.go
…-batch

# Conflicts:
#	tests/e2e-mw-dev/harness.sh
… divergence note, macro-failure logging

- parsePGArrayLiteral now bails on unquoted backslash escapes ('{a\,b}') so
  the literal reaches DuckDB untouched (hard error) instead of being silently
  mis-split into two elements; regression test added.
- pgPathArrayToJSONPath documents the digit-element divergence (always array
  index; PG resolves object key "0" by runtime container type).
- initPgCatalog macro registration failures are now logged at WARN (the loop
  claimed to log but didn't — a typo'd macro body silently vanished).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The Trino revert on main (#712) deleted this helper along with its only
caller, but the merge kept it on this branch; CI lint flags it as unused.
File now matches origin/main exactly.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e comments

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…red alone

functions.go has mapped to_jsonb->to_json for ages, but Classify had no
TO_JSONB( token, so a query whose only PG-ism was to_jsonb skipped the
FunctionTransform and hit DuckDB raw ('Scalar Function with name to_jsonb
does not exist'). Seen in production on a Stripe metadata query. Same latent
class as the @>/#>> gap fixed earlier in this branch.

Adds a regression test asserting mapped functions fire when alone in a query
(jsonb_array_length is covered transitively by the ARRAY_LENGTH( substring).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…tic probe sweep

A full-pipeline probe sweep (every mapping/macro/token/transform-path driven
through Classify + transforms + execution against DuckDB v1.5.2) found latent
gaps of the to_jsonb class — machinery that exists but whose gate never fires —
plus rewrite targets that don't resolve. All verifier-reproduced before fixing:

Classify gates (transpiler.go):
- CAST(x AS regclass/regtype/...) spellings (+ ::regoper/::regconfig/
  ::regdictionary) now reach TypeCastTransform; only :: spellings fired before.
- E-string and dollar-quoted bytea hex literals (E'\xDEADBEEF') now hit the
  literal rewrite — previously stored silently-wrong bytes.
- Whitespace normalization: multi-word tokens (FOR UPDATE, SIMILAR TO, DDL),
  SET/SHOW prefix gates, writable-CTE and ON CONFLICT tokens, and
  name-vs-paren spacing no longer require single-space spellings.
- Geometric/range/multirange/timestampntz/rowversion type tokens; bare
  CURRENT_CATALOG/CURRENT_SCHEMA keyword forms; CAST '{...}' AS int[] arrays.
- Parenthesized boolean predicates (flag = (true)) now normalize to IS TRUE.

Transform walkers:
- Generic WalkFunc walkSelectStmt now visits ValuesLists — INSERT ... VALUES
  expressions were invisible to every WalkFunc transform (bytea/bit literals
  corrupted or errored).
- OperatorTransform now walks 9 previously-missed expression positions
  (ValuesLists, RETURNING, USING, FILTER, agg ORDER BY, OVER/window defs,
  ARRAY[...], DISTINCT ON) — an untransformed ~ silently meant bitwise-NOT.
- LockingTransform strips FOR UPDATE/SHARE at any depth (subquery/CTE/set-op).

Catalog surface:
- 17 missing pg_catalog stub relations (pg_user, pg_shadow, pg_authid, pg_cast,
  pg_operator, pg_aggregate, pg_event_trigger, pg_available_extensions,
  pg_timezone_names, pg_stat_database, pg_stat_all_tables,
  pg_replication_slots, pg_db_role_setting, pg_default_acl, pg_range,
  pg_largeobject, pg_cursors) + ViewMappings/Classify wiring.
- array_to_string 3-arg (nullstr) dual-arity macro.
- pg_table_is_visible added to CustomMacros (was never DuckLake-qualified).
- Multirange types in typeMapping; information_schema sequences/routines stubs.

Each fix carries a regression test that was watched RED first (new test files:
classify_wiring_test.go, wiring_ops_test.go, wiring_walk_test.go,
pg_catalog_wiring_test.go).

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…-batch

# Conflicts:
#	tests/e2e-mw-dev/harness.sh
#	transpiler/transform/literals_test.go
…-batch

# Conflicts:
#	tests/e2e-mw-dev/harness.sh
…ung every worker warmup

ROOT CAUSE of this PR's deterministic e2e failures (4/4 runs, 'timeout
connecting to worker ... DeadlineExceeded, attempts: 37'):

CAST(NULL AS INET) references a type that lives in DuckDB's inet extension,
which is not statically linked. Creating the macro during initPgCatalog —
which worker warmup runs via ConfigureDBConnection — triggers DuckDB
extension autoinstall, which fetches http://extensions.duckdb.org/... over
plain HTTP port 80. The worker egress policy allows world:443/:5432 only, so
the port-80 SYN is silently dropped and connect() blocks for the full TCP
timeout (~2min, reproduced). The worker's health handler blocks on warmupDone,
the CP's 90s connect budget expires, and the healthy-but-downloading worker is
reaped. Every workers' log stopped right after 'Loaded extension ducklake' —
the next line would have been this macro's registration WARN.

Reproduced with the actual pr-706 worker image from ECR: with no network the
fetch fails fast and warmup completes in 609ms (vs main 75ms); in-cluster the
silent drop turns the same fetch into a multi-minute hang.

Fix: return a VARCHAR-typed NULL (the transpiler maps PG inet to text anyway,
so this is the consistent shape). Add TestInitPgCatalogIsAirgapSafe, which
runs the full catalog init with autoinstall/autoload disabled and fails on
ANY statement that needs a non-static extension — guarding the whole class:
an air-gap-unsafe catalog statement means hung worker warmups in production.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@EDsCODE EDsCODE merged commit 60770e7 into main Jun 10, 2026
24 checks passed
@EDsCODE EDsCODE deleted the fix/pg-builtin-compat-batch branch June 10, 2026 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant