Bump duckdb to v1.5.1 #300

Open: sfc-gh-abozkurt wants to merge 3 commits into main from aykut/bump-duckdb-1.5.1

sfc-gh-abozkurt (Collaborator) commented Apr 6, 2026

Bump DuckDB to v1.5.1

Version Updates

  • DuckDB: 1.4.4 → 1.5.1
  • Python DuckDB package: 1.4.3 → 1.5.1
  • DuckDB extension tags updated: httpfs, aws, azure pinned to new commits compatible with 1.5.1

Native GeoParquet / GEOMETRY Support

DuckDB 1.5.1 promotes GEOMETRY to a first-class LogicalTypeId (previously it was a BLOB with a type alias). This required changes across the stack:

pgduck_server — Wire protocol adaptation

  • duckdb_column_type() now returns DUCKDB_TYPE_INVALID for GEOMETRY columns. Two new helpers added via duckdb_pglake:
    • duckdb_pglake_is_geometry_type(): checks whether a logical type is GEOMETRY
    • duckdb_pglake_geometry_get_srid(): extracts the SRID from the CRS attached to the type (e.g. EPSG:4326 → 4326, OGC:CRS84 → 4326)
  • When a GEOMETRY column is detected, it is mapped to BLOB internally so the existing hex-WKB serialization path handles it.
  • SRID injection: A new inject_srid_into_hex_wkb() function transforms ISO WKB hex into EWKB hex by setting the SRID flag and inserting the SRID bytes, so PostGIS can recover the SRID from the wire format.
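
The SRID injection step can be illustrated with a short sketch. This is a Python approximation for readers, not the C code in this PR: the function bodies, the `srid_from_crs` helper, and its handling of unknown CRS strings are assumptions. It only sets the EWKB SRID flag (0x20000000) and inserts the four SRID bytes after the type word; it does not convert Z/M type codes.

```python
import struct

EWKB_SRID_FLAG = 0x20000000  # EWKB bit meaning "an SRID follows the type word"


def srid_from_crs(crs: str) -> int:
    """Hypothetical helper: map a CRS identifier to an integer SRID (0 if unknown)."""
    if crs.upper() == "OGC:CRS84":
        return 4326
    authority, _, code = crs.partition(":")
    if authority.upper() == "EPSG" and code.isdigit():
        return int(code)
    return 0


def inject_srid_into_hex_wkb(hex_wkb: str, srid: int) -> str:
    """Turn ISO WKB hex into EWKB hex carrying the given SRID (sketch)."""
    raw = bytes.fromhex(hex_wkb)
    if len(raw) < 5 or raw[0] not in (0, 1):
        raise ValueError("not a WKB header")
    # Byte 0 is the byte-order marker: 1 = little-endian, 0 = big-endian.
    endian = "<" if raw[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", raw, 1)
    if srid <= 0 or geom_type & EWKB_SRID_FLAG:
        return hex_wkb  # nothing to inject, or an SRID is already present
    # Rewrite the header: same byte order, flagged type word, then the SRID.
    header = raw[:1] + struct.pack(endian + "II", geom_type | EWKB_SRID_FLAG, srid)
    return (header + raw[5:]).hex().upper()
```

For POINT(1 2), the little-endian input `0101000000000000000000F03F0000000000000040` becomes `0101000020E6100000000000000000F03F0000000000000040` with SRID 4326.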

pg_lake_engine — Read path (read_data.c)

  • Parquet/Iceberg reads now keep GEOMETRY native instead of downgrading to BLOB. DuckDB reads GeoParquet columns natively; for old files without GeoParquet metadata, DuckDB casts BLOB → GEOMETRY automatically.
  • CRS stripping via ST_SetCRS(col, ''): Replaces the old ST_GeomFromWKB(col::blob) projection. Stripping CRS is needed because DuckDB 1.5+ requires plain GEOMETRY (without CRS) for implicit POINT_2D casts used by spheroid functions like ST_Distance_Spheroid. SRID preservation relies on PostGIS applying the column typmod.
  • Iceberg schema override: When Iceberg metadata maps a geometry column to binary (per Iceberg spec), but the PostgreSQL column type is geometry, the schema entry is overridden to GEOMETRY so read_parquet() reads it natively.
  • Iceberg fixed[N] type handling: GetSchemaType() now maps fixed[N] → binary to avoid DuckDB parse errors.
  • GDAL format: New projection using COALESCE(hex(TRY_CAST(col AS BLOB)), ST_AsHexWKB(TRY_CAST(col AS GEOMETRY))) to handle both BLOB and GEOMETRY sources.
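
The two schema rules in the list above (the geometry override and the fixed[N] mapping) can be sketched as a small lookup. This is only an illustration in Python; `schema_type_for_read` and its parameters are hypothetical and do not mirror the actual GetSchemaType() signature.

```python
import re

# Matches Iceberg fixed-length binary types such as "fixed[16]".
_FIXED_RE = re.compile(r"^fixed\[(\d+)\]$")


def schema_type_for_read(iceberg_type: str, pg_type: str) -> str:
    """Hypothetical stand-in for choosing a read_parquet() schema entry type."""
    # Iceberg stores geometry as binary; read it natively when the
    # PostgreSQL column is declared as geometry.
    if iceberg_type == "binary" and pg_type == "geometry":
        return "GEOMETRY"
    # fixed[N] is reported as plain binary so the type string stays parseable.
    if _FIXED_RE.match(iceberg_type):
        return "binary"
    return iceberg_type
```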

pg_lake_engine — Write path (write_data.c)

  • GeoParquet write with CRS preservation: Instead of ST_AsWKB(col) (which wrote a raw blob), the column is now kept as native GEOMETRY. When the PostgreSQL column carries an SRID (via typmod), ST_SetCRS(col, 'EPSG:<srid>') is applied so DuckDB's GeoParquet writer emits CRS metadata.
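
A minimal sketch of how such a write projection could be assembled, assuming the SRID has already been read from the column typmod. The helper name `write_projection` and the identifier quoting are illustrative assumptions, not the code in write_data.c.

```python
from typing import Optional


def write_projection(column: str, srid: Optional[int]) -> str:
    """Build the SQL expression for a geometry column on the write path (sketch)."""
    # Quote the identifier, doubling any embedded double quotes.
    quoted = '"' + column.replace('"', '""') + '"'
    if srid is not None and srid > 0:
        # Attach the CRS so the GeoParquet writer can emit CRS metadata.
        return f"ST_SetCRS({quoted}, 'EPSG:{int(srid)}')"
    # No SRID on the column: keep the native GEOMETRY value untouched.
    return quoted
```

With `srid=4326` this yields `ST_SetCRS("geom", 'EPSG:4326')`; without an SRID the column is passed through unchanged.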

DuckDB HTTP Filesystem API Changes

DuckDB 1.5.1 refactored the HTTP filesystem API — PostRequest and PutRequest now take HTTPInput & instead of FileHandle &, and HTTPFSUtil::GetHTTPUtil() returns a reference instead of shared_ptr.

PgLakeS3FileSystem

  • PostRequest / PutRequest signatures updated to accept HTTPInput &.
  • New context registry (RegisterContext / LookupContext) with a mutex-protected map that associates HTTPInput* → ClientContext*. This replaces the previous approach of casting FileHandle to access the ClientContext for encryption settings.
  • SetEncryptionFields now takes optional_ptr<ClientContext> directly instead of a handle reference.
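
The registry idea can be sketched as a lock-protected dictionary. This Python version is only an analogy for the C++ code: object identity (`id()`) stands in for the HTTPInput pointer, and `unregister_context` is an assumption added here because identities can be reused once an object is destroyed.

```python
import threading
from typing import Dict, Optional


class ContextRegistry:
    """Thread-safe map from a request-input identity to its client context."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._contexts: Dict[int, object] = {}

    def register_context(self, http_input: object, context: object) -> None:
        with self._lock:
            self._contexts[id(http_input)] = context

    def lookup_context(self, http_input: object) -> Optional[object]:
        # Returns None when no context was registered for this input.
        with self._lock:
            return self._contexts.get(id(http_input))

    def unregister_context(self, http_input: object) -> None:
        # Assumed cleanup step: drop the entry before the input goes away.
        with self._lock:
            self._contexts.pop(id(http_input), None)
```

Looking the context up by the request input avoids any cast from the file handle, at the cost of keeping the map in sync with the lifetime of each request.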

RegionAwareS3FileSystem / CachingFileSystem

  • Updated HTTPFSUtil::GetHTTPUtil() usage from -> to . (reference instead of pointer).
  • Added required GetEstimatedCacheMemory() override to cache-related classes.

Glob Handling for Special Characters in Partition Paths

Iceberg partition paths can contain glob characters (e.g., * in directory names like specialChars!@#$%^&*()_+). DuckDB's glob machinery interprets these as wildcards. The caching filesystem now checks whether a path with glob characters refers to an actual existing file first, returning it directly without globbing.
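
The "existing file first" check can be sketched as follows. The local filesystem stands in for the caching filesystem here, and `expand_path` and `GLOB_CHARS` are hypothetical names chosen for the example.

```python
import glob
import os
from typing import List

GLOB_CHARS = set("*?[")  # characters that glob treats as pattern syntax


def expand_path(path: str) -> List[str]:
    """Return the files a path refers to, preferring a literal match (sketch)."""
    has_glob = any(ch in GLOB_CHARS for ch in path)
    if not has_glob:
        return [path]
    # A path such as ".../specialChars!@#$%^&*()_+/data.parquet" may name a
    # real file; if so, return it directly instead of expanding wildcards.
    if os.path.isfile(path):
        return [path]
    return sorted(glob.glob(path))
```

The literal check costs one extra existence probe per globbed path, which is the trade-off for not misreading partition directory names as patterns.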

S3 URL Decoding Fix

Common prefixes returned by S3 ListObjectsV2 are now URL-decoded via S3FileSystem::UrlDecode() before being used as paths, fixing issues with encoded characters in S3 prefixes.
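
As an illustration of the decoding step, the sketch below uses Python's `urllib.parse.unquote` in place of S3FileSystem::UrlDecode(); the two may differ in edge cases (for example, `unquote` leaves `+` untouched), and the function name is an assumption.

```python
from typing import Iterable, List
from urllib.parse import unquote


def decode_common_prefixes(prefixes: Iterable[str]) -> List[str]:
    """Percent-decode listing prefixes before using them as paths (sketch)."""
    return [unquote(prefix) for prefix in prefixes]
```

For example, a listed prefix `data/part%3D1/` becomes `data/part=1/`, which matches the partition directory name as it was written.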

Removed enable_geoparquet_conversion Setting

The SET GLOBAL enable_geoparquet_conversion TO 'false' initialization command was removed from pgduck_server, as native GeoParquet support is now the intended behavior.

DuckDB Patches

| Patch | Status | Description |
|---|---|---|
| return_stats.patch | Removed | Parquet stats fixes for decimals and booleans; merged upstream in DuckDB 1.5.1 |
| parquet-virtual-column-stats.patch | Added | Prevents out-of-bounds crash when getting column statistics for virtual/added columns not present in Parquet row groups |
| composite-type-resolution.patch | Updated | Rebased for new duckdb-postgres API (e.g., connection->Query(nullptr, ...)) |
| numeric-nan.patch | Updated | Rebased for line offsets |
| snapshot.patch | Updated | Rebased for new duckdb-postgres API changes |

pg_lake Bugs

  • Fixed wrong file_sequence_number in manifest entry
  • GDAL now supports multicurve.

Test Updates

  • Error message changes: DuckDB 1.5.1 changed several error messages:
    • "HTTP Error: Unable to connect to URL" → "HTTP Error:"
    • "NOT FOUND" assertions now use case-insensitive matching (.upper())
    • "Could not establish connection error" → "Could not connect to server error" / "Could not resolve hostname error"
    • "READ_PARQUET " (trailing space) → "READ_PARQUET" (trimmed)
  • GDAL tests: Added "Unsupported geometry type in WKB" and "Could not find layer" as alternative accepted error messages.
  • pgbench test: Replaced the pgbench -i -I t initialization (which relied on DuckDB accepting the DDL that pgbench issues in its t init step) with explicit CREATE TABLE statements, because pgbench adds WITH options to that DDL that DuckDB does not support.
  • Caching tests: Double-remove of a cached file now expects a potential 404 error instead of silent success.
  • Test fixture: iceberg_catalog fixture now depends on extension to ensure extensions are created before catalog setup.
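
The message-matching changes above amount to case-insensitive checks against a list of accepted alternatives. A minimal sketch of such a helper follows; `error_matches` is a hypothetical name, not a function from the test suite.

```python
from typing import Iterable


def error_matches(message: str, accepted: Iterable[str]) -> bool:
    """True if any accepted fragment occurs in the message, ignoring case."""
    upper = message.upper()
    return any(fragment.upper() in upper for fragment in accepted)
```

A test could then assert `error_matches(str(exc), ["Could not connect to server", "Could not resolve hostname"])` and stay valid whichever of the two messages DuckDB reports.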

sfc-gh-abozkurt force-pushed the aykut/bump-duckdb-1.5.1 branch 2 times, most recently from 23da792 to c1646f5, April 6, 2026 11:00
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
sfc-gh-abozkurt force-pushed the aykut/bump-duckdb-1.5.1 branch 2 times, most recently from 612154c to 61133cb, April 7, 2026 08:17
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
sfc-gh-abozkurt force-pushed the aykut/bump-duckdb-1.5.1 branch 2 times, most recently from 07694db to 150fed0, April 7, 2026 09:17
Signed-off-by: Aykut Bozkurt <aykut.bozkurt@snowflake.com>
sfc-gh-abozkurt force-pushed the aykut/bump-duckdb-1.5.1 branch from 150fed0 to 8876a79, April 7, 2026 10:49