feat(compat): PostgreSQL builtin-compatibility macros + transforms (48 functions) #706
Merged
Addresses the highest-severity data-integrity gaps from the Iceberg QA report (docs/iceberg-pg-syntax-qa.md). All verified live against the Iceberg backend and covered by unit tests.
- DDL hybrid warn/error (Iceberg only; DuckLake keeps silent-strip for sqlmesh/dbt): WARNING for unenforced PK/UNIQUE/CHECK/FK; ERROR (0A000) for the silently-NULL features SERIAL/BIGSERIAL, GENERATED ... STORED, and DEFAULT <expr>/now(). DEFAULT NULL and NOT NULL preserved. Adds a Warnings channel on the transpile Result, surfaced as NoticeResponse in both simple and extended protocols.
- EXPLAIN (ANALYZE) of a write no longer double-executes: GetQuerySchema returns a synthetic schema for EXPLAIN without running it, and the extended-protocol Describe path no longer probe-executes EXPLAIN.
- DROP COLUMN guard: DDL-time WARNING, and the Iceberg "newer schema id" scan failure is mapped to a clear 0A000 message instead of a raw XX000.
- bytea hex literals '\xDEADBEEF'::bytea now decode to the correct bytes (unhex), and B'101' bit-string literals map to '101'::BIT.
- jsonb || now merges objects (json_merge_patch) instead of silently string-concatenating; plain string/array || is left untouched.
- Writable-CTE UPDATE ... RETURNING that reads a modified column is rejected (would return pre-update values); RETURNING an unmodified key and RETURNING * still work (Airbyte pattern preserved).
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds DuckDB macros emulating PG builtins that real clients/ORMs/metadata queries call but DuckDB lacks, following the array_lower (PR #705) pattern: macro in catalog.go + Classify token + both pgcatalog qualification maps. Functions: set_config, uuid_generate_v4, statement_timestamp, pg_get_function_arguments/result/identity_arguments, pg_get_triggerdef, pg_jit_available, row_security_active, pg_collation_for, pg_input_is_valid, to_regclass/to_regtype/to_regproc, jsonb_pretty, to_ascii, convert_from, width_bucket, scale, min_scale, masklen, hostmask, set_masklen, inet_same_family. set_config is the highest-impact (JDBC/SQLAlchemy/psycopg/poolers emit it at connection startup). inet_merge deferred (needs a covering-CIDR impl DuckDB lacks primitives for). Verified against DuckDB v1.5.2 via TDD. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
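As an illustration of the semantics one of these macros has to reproduce, here is a minimal Go sketch of width_bucket for an ascending range, following the behavior documented for PostgreSQL. The function name widthBucket is hypothetical and this is not the macro body from catalog.go; the reversed-range (low > high) form and argument validation are left out.

```go
package main

import (
	"fmt"
	"math"
)

// widthBucket sketches PostgreSQL's width_bucket(operand, low, high, count)
// for an ascending range: 0 below low, count+1 at or above high, and a
// bucket number in 1..count otherwise.
// Assumption: low < high and count > 0 (no validation, no reversed ranges).
func widthBucket(operand, low, high float64, count int) int {
	if operand < low {
		return 0
	}
	if operand >= high {
		return count + 1
	}
	return int(math.Floor((operand-low)/(high-low)*float64(count))) + 1
}

func main() {
	fmt.Println(widthBucket(5.35, 0.024, 10.06, 5)) // 3
	fmt.Println(widthBucket(-1, 0, 10, 5))          // 0 (underflow bucket)
	fmt.Println(widthBucket(10, 0, 10, 5))          // 6 (overflow bucket)
}
```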
array_positions, array_replace, array_fill, trim_array, array_dims, date_bin, make_interval, justify_hours/days/interval, and the OVERLAPS operator (DuckDB parses the keyword to overlaps() but lacks the function). justify_* use microsecond math to preserve fractional seconds. Verified against DuckDB v1.5.2 via TDD. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
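The microsecond math behind justify_hours can be sketched as follows. This is a Go illustration under stated assumptions, not the DuckDB macro itself: the interval struct and the justifyHours name are hypothetical, and the sign adjustment is modeled on how PostgreSQL is understood to keep the day and time fields on the same sign.

```go
package main

import "fmt"

const usecsPerDay = int64(24 * 60 * 60 * 1_000_000)

// interval is a hypothetical months/days/microseconds triple.
type interval struct {
	Months int32
	Days   int32
	Micros int64
}

// justifyHours moves whole 24-hour blocks from the time field into days.
// Working in integer microseconds keeps fractional seconds exact.
func justifyHours(iv interval) interval {
	wholeDays := iv.Micros / usecsPerDay // integer division truncates toward zero
	iv.Days += int32(wholeDays)
	iv.Micros -= wholeDays * usecsPerDay
	// Assumed sign rule: days and time should not have opposite signs.
	if iv.Days > 0 && iv.Micros < 0 {
		iv.Micros += usecsPerDay
		iv.Days--
	} else if iv.Days < 0 && iv.Micros > 0 {
		iv.Micros -= usecsPerDay
		iv.Days++
	}
	return iv
}

func main() {
	// 27 hours and 30.5 seconds, expressed in microseconds.
	in := interval{Micros: 27*3_600_000_000 + 30_500_000}
	fmt.Printf("%+v\n", justifyHours(in))
}
```

In this sketch the 27h 00m 30.5s input becomes 1 day plus 3h 00m 30.5s, with the half second carried through unchanged.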
… (Batch C)
DuckDB's decode/encode builtins have incompatible 2-arg PG semantics:
decode('YWJj','base64') silently returned the input unchanged (wrong bytes),
encode(bytea,fmt) hit a binder error. The transpiler's identity 'same'
mappings in functions.go made duckgres complicit in the silent corruption.
Replace with 2-arg shadowing macros (base64/hex/escape) and remove the
misleading identity mappings. inet_server_addr now returns an INET-typed NULL
instead of DuckDB's wrong-typed NULL. Regression-tested via TDD.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
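For reference, the 2-arg behavior the shadowing macros aim for can be sketched in Go with the standard library. The names pgDecode and pgEncode are hypothetical; the 'escape' format and PostgreSQL's line-wrapping of long base64 output are deliberately omitted, so this is an illustration of the expected bytes rather than a full emulation.

```go
package main

import (
	"encoding/base64"
	"encoding/hex"
	"fmt"
)

// pgDecode sketches PostgreSQL's decode(text, format) for base64 and hex.
func pgDecode(text, format string) ([]byte, error) {
	switch format {
	case "base64":
		return base64.StdEncoding.DecodeString(text)
	case "hex":
		return hex.DecodeString(text)
	default:
		return nil, fmt.Errorf("unsupported format %q in this sketch", format)
	}
}

// pgEncode sketches PostgreSQL's encode(bytea, format) for base64 and hex.
func pgEncode(data []byte, format string) (string, error) {
	switch format {
	case "base64":
		return base64.StdEncoding.EncodeToString(data), nil
	case "hex":
		return hex.EncodeToString(data), nil // lowercase hex digits
	default:
		return "", fmt.Errorf("unsupported format %q in this sketch", format)
	}
}

func main() {
	b, _ := pgDecode("YWJj", "base64")
	fmt.Printf("%s\n", b) // abc, not the unchanged input "YWJj"
	s, _ := pgEncode([]byte{0xDE, 0xAD, 0xBE, 0xEF}, "hex")
	fmt.Println(s) // deadbeef
}
```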
json_array_elements, jsonb_array_elements, json_array_elements_text, jsonb_each, json_each_text, pg_options_to_table, aclexplode, pg_get_keywords, pg_identify_object as CREATE MACRO ... AS TABLE. FROM-clause calls are memory.main-qualified in DuckLake mode (RangeFunction walk in pgcatalog.go). JSONPath keys are quoted so dotted keys work. Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ts (Batch F+G)
- @> rewrites to json_contains() when an operand is JSON (array @> stays native).
- #>> rewrites a literal text[] path to json_extract_string(j, '$."a"."b"').
  Both build on the OperatorTransform/looksJSON machinery added in #689.
- '{1,2,3}'::int[] PG array literals are rewritten to ARRAY['1','2','3']::int[] via a sound PG-array-literal parser (handles quoted/embedded-comma/NULL elements; bails on multi-dimensional). DuckDB cannot parse PG curly literals.
Verified rewrites deparse and execute correctly against DuckDB v1.5.2 via TDD.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
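A parser with the properties listed above (quoted elements, embedded commas, NULL, bail on multi-dimensional input) could look roughly like the sketch below. It is not the parser from this PR: parseArrayLiteralSketch and the element type are hypothetical names, and the second return value signals "bail, leave the literal untouched". It also bails on unquoted backslashes rather than guessing how to split them.

```go
package main

import (
	"fmt"
	"strings"
)

// element is one parsed item; IsNull marks an unquoted NULL.
type element struct {
	Value  string
	IsNull bool
}

// parseArrayLiteralSketch parses a one-dimensional '{...}' literal.
// ok == false means the caller should not rewrite the literal.
func parseArrayLiteralSketch(lit string) (elems []element, ok bool) {
	s := strings.TrimSpace(lit)
	if len(s) < 2 || s[0] != '{' || s[len(s)-1] != '}' {
		return nil, false
	}
	body := s[1 : len(s)-1]
	if strings.TrimSpace(body) == "" {
		return []element{}, true
	}
	i := 0
	for {
		for i < len(body) && body[i] == ' ' {
			i++
		}
		if i >= len(body) {
			return nil, false // empty element, e.g. a trailing comma
		}
		var el element
		if body[i] == '"' {
			i++
			var b strings.Builder
			closed := false
			for i < len(body) && !closed {
				switch c := body[i]; {
				case c == '\\' && i+1 < len(body):
					b.WriteByte(body[i+1]) // backslash escapes the next byte
					i += 2
				case c == '\\':
					return nil, false // dangling backslash
				case c == '"':
					closed = true
					i++
				default:
					b.WriteByte(c)
					i++
				}
			}
			if !closed {
				return nil, false
			}
			el.Value = b.String()
		} else {
			start := i
			for i < len(body) && body[i] != ',' {
				switch body[i] {
				case '{', '}', '"', '\\':
					return nil, false // nested array or ambiguous escape
				}
				i++
			}
			raw := strings.TrimSpace(body[start:i])
			if raw == "" {
				return nil, false
			}
			if strings.EqualFold(raw, "NULL") {
				el.IsNull = true
			} else {
				el.Value = raw
			}
		}
		elems = append(elems, el)
		for i < len(body) && body[i] == ' ' {
			i++
		}
		if i >= len(body) {
			return elems, true
		}
		if body[i] != ',' {
			return nil, false
		}
		i++
	}
}

func main() {
	fmt.Println(parseArrayLiteralSketch(`{1,2,3}`))
	fmt.Println(parseArrayLiteralSketch(`{"a,b",NULL}`))
	fmt.Println(parseArrayLiteralSketch(`{{1,2},{3,4}}`)) // bails
}
```

Returning a bail flag instead of a best-effort split means an unsupported literal still reaches DuckDB and fails loudly, which is preferable to a silently wrong rewrite.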
…y gaps
- harness.sh: pg_compat_functions exercises set_config, width_bucket, uuid_generate_v4, encode/decode, jsonb @>/#>>, array-literal cast, json_array_elements, make_interval end-to-end in DuckLake mode on both cnpg + ext backends (CLAUDE.md e2e gate).
- Classify now flags @> / #>> for OperatorTransform (were only rewritten when a query also contained ->/~/||) and flags '}''::' array-literal casts for TypeCastTransform.
- TypeCastTransform recurses into A_Indirection so casts inside (expr)[n] are transformed. Both gaps were caught wiring the e2e bundle.
- docs/pg-builtin-compat-status.md: implemented / deferred (Batch E) / infeasible.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…VALUES
QA against a live DuckLake server surfaced two bugs that unit tests missed by
bypassing the transpiler:
- The transpiler maps PG inet -> text, so masklen/hostmask/set_masklen/
inet_same_family (which used family()/host()/::inet) hit a binder error on the
text arg. Reimplement them on the textual CIDR form (IPv6 detected via ':').
- TypeCastTransform.walkSelectStmt didn't descend into SelectStmt.ValuesLists,
so '{1,2}'::int[] casts inside INSERT ... VALUES were never rewritten. Add the
ValuesLists walk.
Adds text-input inet regression cases and an INSERT-VALUES array-cast test.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
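The idea of working on the textual CIDR form can be sketched in Go as below. The function names are hypothetical and the real fix is a set of DuckDB macros; the only behavior taken from the commit message is that IPv6 is detected by the presence of ':'. The default prefix lengths (32 and 128) when no '/' is present are an assumption based on how PostgreSQL treats a bare host address.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// isIPv6 uses the same textual test the commit describes: a ':' means IPv6.
func isIPv6(addr string) bool { return strings.Contains(addr, ":") }

// maskLen returns the prefix length of a textual address or CIDR.
// Assumption: a bare address defaults to /32 (IPv4) or /128 (IPv6).
func maskLen(addr string) (int, error) {
	if i := strings.LastIndex(addr, "/"); i >= 0 {
		return strconv.Atoi(addr[i+1:])
	}
	if isIPv6(addr) {
		return 128, nil
	}
	return 32, nil
}

// setMaskLen replaces (or appends) the prefix length without touching the host part.
func setMaskLen(addr string, n int) string {
	host := addr
	if i := strings.LastIndex(addr, "/"); i >= 0 {
		host = addr[:i]
	}
	return host + "/" + strconv.Itoa(n)
}

// sameFamily reports whether both addresses are IPv4 or both are IPv6.
func sameFamily(a, b string) bool { return isIPv6(a) == isIPv6(b) }

func main() {
	n, _ := maskLen("192.168.1.5/24")
	fmt.Println(n)                                  // 24
	fmt.Println(setMaskLen("192.168.1.5/24", 16))   // 192.168.1.5/16
	fmt.Println(sameFamily("10.0.0.1", "::1/128"))  // false
}
```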
…on (from live QA)
…-batch # Conflicts: # transpiler/transform/literals_test.go # transpiler/transform/operators.go # transpiler/transpiler.go # transpiler/transpiler_test.go
…-batch # Conflicts: # tests/e2e-mw-dev/harness.sh
… divergence note, macro-failure logging
- parsePGArrayLiteral now bails on unquoted backslash escapes ('{a\,b}') so
the literal reaches DuckDB untouched (hard error) instead of being silently
mis-split into two elements; regression test added.
- pgPathArrayToJSONPath documents the digit-element divergence (always array
index; PG resolves object key "0" by runtime container type).
- initPgCatalog macro registration failures are now logged at WARN (the loop
claimed to log but didn't — a typo'd macro body silently vanished).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
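To make the digit-element divergence concrete, here is a small Go sketch of a text[] path to JSONPath conversion. The name pathToJSONPathSketch is hypothetical and this is not the body of pgPathArrayToJSONPath; it only reproduces the two rules stated in this thread: keys are double-quoted so dotted keys work, and an all-digit element is always emitted as an array index. Keys containing a double quote are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// isDigits reports whether s is a non-empty run of ASCII digits.
func isDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

// pathToJSONPathSketch turns path elements into a JSONPath string.
// All-digit elements become [n]; everything else becomes a quoted key.
func pathToJSONPathSketch(elems []string) string {
	var b strings.Builder
	b.WriteString("$")
	for _, e := range elems {
		if isDigits(e) {
			b.WriteString("[" + e + "]")
		} else {
			b.WriteString(`."` + e + `"`)
		}
	}
	return b.String()
}

func main() {
	fmt.Println(pathToJSONPathSketch([]string{"a", "b"}))
	fmt.Println(pathToJSONPathSketch([]string{"items", "0", "name.first"}))
}
```

With this rule, a path element "0" applied to an object whose key is literally "0" would look up an array index instead, which is the divergence from PostgreSQL noted above.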
The Trino revert on main (#712) deleted this helper along with its only caller, but the merge kept it on this branch; CI lint flags it as unused. File now matches origin/main exactly. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…e comments Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…red alone
functions.go has mapped to_jsonb->to_json for ages, but Classify had no
TO_JSONB( token, so a query whose only PG-ism was to_jsonb skipped the
FunctionTransform and hit DuckDB raw ('Scalar Function with name to_jsonb
does not exist'). Seen in production on a Stripe metadata query. Same latent
class as the @>/#>> gap fixed earlier in this branch.
Adds a regression test asserting mapped functions fire when alone in a query
(jsonb_array_length is covered transitively by the ARRAY_LENGTH( substring).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
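The failure mode described here, a mapping that exists but whose gate never fires, can be illustrated with a toy substring gate in Go. This is a simplified model, not the real Classify: needsFunctionTransform and both token lists are hypothetical, and the only details taken from the commit message are the TO_JSONB( and ARRAY_LENGTH( tokens.

```go
package main

import (
	"fmt"
	"strings"
)

// needsFunctionTransform models a cheap pre-check: the function transform
// runs only if the upper-cased query contains one of the gate tokens.
func needsFunctionTransform(sql string, tokens []string) bool {
	upper := strings.ToUpper(sql)
	for _, tok := range tokens {
		if strings.Contains(upper, tok) {
			return true
		}
	}
	return false
}

func main() {
	before := []string{"ARRAY_LENGTH("}             // hypothetical list without TO_JSONB(
	after := []string{"ARRAY_LENGTH(", "TO_JSONB("} // list with the added token

	q := "SELECT to_jsonb(metadata) FROM charges"
	fmt.Println(needsFunctionTransform(q, before)) // false: transform skipped, DuckDB sees to_jsonb
	fmt.Println(needsFunctionTransform(q, after))  // true

	// jsonb_array_length is matched transitively by the ARRAY_LENGTH( substring.
	fmt.Println(needsFunctionTransform("SELECT jsonb_array_length(j) FROM t", before)) // true
}
```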
…tic probe sweep
A full-pipeline probe sweep (every mapping/macro/token/transform-path driven
through Classify + transforms + execution against DuckDB v1.5.2) found latent
gaps of the to_jsonb class — machinery that exists but whose gate never fires —
plus rewrite targets that don't resolve. All verifier-reproduced before fixing:
Classify gates (transpiler.go):
- CAST(x AS regclass/regtype/...) spellings (+ ::regoper/::regconfig/
::regdictionary) now reach TypeCastTransform; only :: spellings fired before.
- E-string and dollar-quoted bytea hex literals (E'\xDEADBEEF') now hit the
literal rewrite — previously stored silently-wrong bytes.
- Whitespace normalization: multi-word tokens (FOR UPDATE, SIMILAR TO, DDL),
SET/SHOW prefix gates, writable-CTE and ON CONFLICT tokens, and
name-vs-paren spacing no longer require single-space spellings.
- Geometric/range/multirange/timestampntz/rowversion type tokens; bare
CURRENT_CATALOG/CURRENT_SCHEMA keyword forms; CAST '{...}' AS int[] arrays.
- Parenthesized boolean predicates (flag = (true)) now normalize to IS TRUE.
Transform walkers:
- Generic WalkFunc walkSelectStmt now visits ValuesLists — INSERT ... VALUES
expressions were invisible to every WalkFunc transform (bytea/bit literals
corrupted or errored).
- OperatorTransform now walks 9 previously-missed expression positions
(ValuesLists, RETURNING, USING, FILTER, agg ORDER BY, OVER/window defs,
ARRAY[...], DISTINCT ON) — an untransformed ~ silently meant bitwise-NOT.
- LockingTransform strips FOR UPDATE/SHARE at any depth (subquery/CTE/set-op).
Catalog surface:
- 17 missing pg_catalog stub relations (pg_user, pg_shadow, pg_authid, pg_cast,
pg_operator, pg_aggregate, pg_event_trigger, pg_available_extensions,
pg_timezone_names, pg_stat_database, pg_stat_all_tables,
pg_replication_slots, pg_db_role_setting, pg_default_acl, pg_range,
pg_largeobject, pg_cursors) + ViewMappings/Classify wiring.
- array_to_string 3-arg (nullstr) dual-arity macro.
- pg_table_is_visible added to CustomMacros (was never DuckLake-qualified).
- Multirange types in typeMapping; information_schema sequences/routines stubs.
Each fix carries a regression test that was watched RED first (new test files:
classify_wiring_test.go, wiring_ops_test.go, wiring_walk_test.go,
pg_catalog_wiring_test.go).
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
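The whitespace-normalization fix for the Classify gates can be sketched as follows. This is an assumption-laden Go illustration, not the code in transpiler.go: normalizeForGate and gateFires are hypothetical names, and collapsing a space before "(" is one possible way to make name-vs-paren spacing irrelevant.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeForGate collapses every whitespace run (spaces, tabs, newlines)
// to a single space, drops a space before "(", and upper-cases the result.
// It is meant only for gate checks, never for rewriting the query itself.
func normalizeForGate(sql string) string {
	collapsed := strings.Join(strings.Fields(sql), " ")
	collapsed = strings.ReplaceAll(collapsed, " (", "(")
	return strings.ToUpper(collapsed)
}

// gateFires reports whether a multi-word or name( token appears in the query.
func gateFires(sql, token string) bool {
	return strings.Contains(normalizeForGate(sql), token)
}

func main() {
	fmt.Println(gateFires("select * from t for\n\tupdate", "FOR UPDATE")) // true
	fmt.Println(gateFires("select to_jsonb (x) from t", "TO_JSONB("))     // true
	fmt.Println(gateFires("select 1", "FOR UPDATE"))                      // false
}
```

Because the normalized string is used only to decide whether a transform runs, a false positive (for example a token inside a string literal) costs an unnecessary AST walk, while a false negative sends untransformed SQL to DuckDB.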
…-batch # Conflicts: # tests/e2e-mw-dev/harness.sh # transpiler/transform/literals_test.go
…-batch # Conflicts: # tests/e2e-mw-dev/harness.sh
…ung every worker warmup
ROOT CAUSE of this PR's deterministic e2e failures (4/4 runs, 'timeout connecting to worker ... DeadlineExceeded, attempts: 37'): CAST(NULL AS INET) references a type that lives in DuckDB's inet extension, which is not statically linked. Creating the macro during initPgCatalog (which worker warmup runs via ConfigureDBConnection) triggers DuckDB extension autoinstall, which fetches http://extensions.duckdb.org/... over plain HTTP port 80. The worker egress policy allows world:443/:5432 only, so the port-80 SYN is silently dropped and connect() blocks for the full TCP timeout (~2min, reproduced). The worker's health handler blocks on warmupDone, the CP's 90s connect budget expires, and the healthy-but-downloading worker is reaped. Every worker's log stopped right after 'Loaded extension ducklake'; the next line would have been this macro's registration WARN.
Reproduced with the actual pr-706 worker image from ECR: with no network the fetch fails fast and warmup completes in 609ms (vs main 75ms); in-cluster the silent drop turns the same fetch into a multi-minute hang.
Fix: return a VARCHAR-typed NULL (the transpiler maps PG inet to text anyway, so this is the consistent shape). Add TestInitPgCatalogIsAirgapSafe, which runs the full catalog init with autoinstall/autoload disabled and fails on ANY statement that needs a non-static extension, guarding the whole class: an air-gap-unsafe catalog statement means hung worker warmups in production.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Summary
Implements the fixable findings from a PostgreSQL builtin-compatibility audit (modeled on array_lower / #705): PG builtins that real clients/ORMs/metadata queries call but DuckDB either lacks (hard errors) or implements with divergent / silently-wrong semantics. Each fix follows the established pattern: a DuckDB macro/transform wired through server/catalog.go, the transpiler Classify list, and both pgcatalog.go qualification maps, gated by TDD against DuckDB v1.5.2 plus an end-to-end harness assertion.

Stacked on #689 (fix/iceberg-pg-tier1-compat): review/merge that first; this PR reuses its OperatorTransform/looksJSON machinery.

48 functions/operators land here; 7 deferred to a follow-up; 8 documented as infeasible. See docs/pg-builtin-compat-status.md.

What's implemented
- set_config (highest impact: JDBC/SQLAlchemy/psycopg/poolers emit it at connection startup), uuid_generate_v4, statement_timestamp, pg_get_function_arguments/result/identity_arguments, pg_get_triggerdef, pg_jit_available, row_security_active, pg_collation_for, pg_input_is_valid, to_regclass/to_regtype/to_regproc, jsonb_pretty, to_ascii, convert_from, width_bucket, scale, min_scale, masklen, hostmask, set_masklen, inet_same_family.
- array_positions, array_replace, array_fill, trim_array, array_dims, date_bin, make_interval, justify_hours/days/interval, OVERLAPS.
- decode/encode 2-arg shadowing macros (DuckDB's builtin decode('YWJj','base64') silently returned the input unchanged; the identity "same" mappings in functions.go made duckgres complicit; removed). inet_server_addr type fix.
- json_array_elements, jsonb_array_elements, json_array_elements_text, jsonb_each, json_each_text, pg_options_to_table, aclexplode, pg_get_keywords, pg_identify_object.
- @> (jsonb containment → json_contains; array @> left native), #>> (literal text[] path → json_extract_string).
- '{1,2,3}'::int[] → ARRAY[...]::int[] via a sound parser (quoted/embedded-comma/NULL elements; bails on multi-dim).

Two transpiler bugs caught while wiring the e2e test
- Classify never ran OperatorTransform for @>/#>> unless the query also contained ->/~/||, so jsonb containment silently hit a binder error.
- TypeCastTransform didn't recurse into A_Indirection, so casts inside (expr)[n] were missed.

Testing
- go test ./transpiler/... ./server green; go vet clean; golangci-lint 0 issues.
- pg_compat_functions in tests/e2e-mw-dev/harness.sh asserts 9 representative functions end-to-end in DuckLake mode on both cnpg + ext backends.

Deferred (Batch E, follow-up PR)
7 functions.go AST transforms with documented PG-divergence corners: format (%I/%L/%s), substr, substring, date_trunc (3-arg), overlay, cardinality, isfinite. Verified specs captured.

Infeasible (documented, intentionally not shipped)
jsonb_set, jsonb_insert, jsonb_strip_nulls, json_populate_record, jsonb_path_query, parse_ident, daterange, numrange: a lossy shim would return silently-wrong data (worse than the current hard error).

🤖 Generated with Claude Code