feat(duckdb): support version v1.5.3#12009
Conversation
|
tests on current duckdb (1.4.4) passed in CI https://github.com/ibis-project/ibis/actions/runs/26638631127/job/78505291131?pr=12009 |
|
I've pushed lockfile change to use duckdb==1.5.3 Two unit tests fail: https://github.com/ibis-project/ibis/actions/runs/26641555600/job/78515786365?pr=12009 These are due to duckdb/duckdb-spatial#818 |
As per https://ibis-project.org/contribute/06_automated_code_and_ai.html, I think it would be better to rewrite the PR description in your own words (or just move the Copilot stuff to a comment, since it's almost excessive detail for a normal PR description). But, since you've clearly looked into the changes yourself, I think it's fair to say that the requisite human effort is there.
Can you xfail them with comment? Ideally, they may be fixed upstream before next Ibis release. |
|
Thanks @deepyaman for taking time to review and appreciate your comments an the AI point. I'll clean up the PR description Re: the failing tests - I'm hesitant to mark them as xfail as if the duckdb changes aren't addressed before next release that would break functionality for some users? So I think we should either wait for the duckdb changes and merge, or merge now leave the tests as is so that another release is not made until either fixed or we accept that this will be a (very small) breaking change? |
This sounds fine! Leaves time to revisit if and when start hearing about the next Ibis release, in case want to find a workaround at that point. |
This PR fixes issues with using duckdb 1.5.3 with ibis - see related issue here #12008. Note that there is one issue (duckdb/duckdb-spatial#818) in duckdb geospatial that needs fixing before all tests will pass with 1.5.3
Full disclosure: to make this PR I used copilot to run tests and then make changes. I guided it to refine its approach to fixing in some places and also had it check compatibility on duckdb 1.0 and 0.10.3 (minimum version ibis supports).
The changes below are copilot's summary with links to relevant files. References to 1.4.x are for duckdb 1.4.x and earlier.
I've left the duckdb==1.4.4 lock so that CI can show that these changes pass on current locked duckdb version. Once that is proven we can bump to 1.5.3 to run CI again.
Please let me know if any reservations about using copilot for PRs like this.
Thanks!
Description of changes
DuckDB 1.5.3 compatibility
Fixes the ibis DuckDB backend to work correctly with 1.5.3
without version-gated
xfailmarkers in tests.ibis/backends/duckdb/__init__.pyto_pyarrow_batches— replacedfetch_record_batch()with a cross-version branch:fetch_record_batch()was removed (ABI breakage); now usesrel.to_arrow_reader(batch_size=chunk_size)rel.fetch_arrow_reader(chunk_size)(positional arg only)to_pyarrowandexecute— both branch onhasattr(rel, "to_arrow_reader"):rel.to_arrow_reader().read_all()— avoids anInternalExceptionforGEOMETRY('EPSG:xxx')typed columns (upstream: in duckdb==1.5.3, DuckDBPyRelation.to_arrow_table() raises InternalException for GEOMETRY('EPSG:xxx') columns duckdb/duckdb-python#475)rel.to_arrow_table()(unchanged)_to_duckdb_relation— added a DuckDB 1.4.x geometry fixup. In 1.4.x,to_arrow_table()returns DuckDB's internal 20-byte binary format forGEOMETRYcolumns, not WKB — Shapely cannot parse this. When geometry columns are present and
not hasattr(duckdb.DuckDBPyRelation, "to_arrow_reader"), wraps each geometry columnwith
ST_ASWKB(col) AS col. The check is at class level (no throwaway relation created)and the wrapping query is built using sqlglot expressions, not string assembly.
ibis/backends/sql/compilers/duckdb.pyto_sqlglot— reverted to a clean passthrough. A previous version injectedST_ASWKBhere, which mutated user-visible SQL fromcon.compile(), but has consequencethat
to_geo/to_parquet/to_csvoutput were affected, and made SQL snapshots version-dependent.The wrapping now lives exclusively in
_to_duckdb_relation.visit_ArrayStringJoin— new override: in DuckDB 1.5.x,array_to_string([], ',')returns
''instead ofNULL. Fix:IF(len(arg) > 0, array_to_string(arg, sep), NULL).ibis/backends/sql/datatypes.py_from_sqlglot_GEOMETRYand_from_sqlglot_GEOGRAPHY— DuckDB 1.5.x reportsSRID-qualified geometry types as
GEOMETRY('EPSG:2263')(string positional arg) ratherthan an integer SRID. The previous
_geotypes[arg.this.this]lookup raisedKeyError.Now falls back gracefully and parses
EPSG:xxxxstrings as SRIDs.ibis/backends/tests/test_numeric.pytest_numeric_literal/duckdb-decimal-big— the error forDECIMAL(76, 38)changed from
DuckDBParserException(1.4.x) toDuckDBBinderException(1.5.x).Updated to
raises=(DuckDBParserException, DuckDBBinderException).ibis/backends/duckdb/tests/test_client.pytest_pyarrow_batches_chunk_size—chunk_size=-1now raisesTypeErroreagerlyat call time rather than on the first
next(). Test updated accordingly.ibis/backends/duckdb/tests/test_geospatial.pytest_geospatial_buffer— replaced@pytest.mark.xfail_version(fired for anyDuckDB version when shapely>=2.1.0) with a conditional
@pytest.mark.xfailscoped toDuckDB <1.5 + shapely>=2.1.0 only.
Selafin — added
raises=duckdb.Errortono_roundtrip(failure mode changed in1.5.x).
MapML and GeoRSS — crash in DuckDB 1.5.3 with
basic_string: construction from null is not valid(upstream:duckdb/duckdb-spatial#818). Left as plain
param(...)entrieswith
# NOTE:comments and the upstream issue link rather thanxfail, since they passon 1.4.4.
PMTiles —
no_roundtrip(reason="row counts differ", raises=AssertionError).ibis/backends/duckdb/tests/snapshots/test_geospatial/test_literal_geospatial_explicit/expr0/out.sqlandexpr1/out.sqlBoth snapshots simplified from a
SELECT * REPLACE (ST_ASWKB(...)) FROM (...)wrapperback to plain
SELECT ST_GEOMFROMTEXT('POINT (1 0)') AS "p", matching the revertedto_sqlglot.