GEOMETRY Rework: Part 1 - Logical Type (duckdb#19136)
This PR adds a dedicated `GEOMETRY` logical type to core DuckDB. The
internal representation is currently a
[WKB-encoded](https://libgeos.org/specifications/wkb/) `BLOB`, but that
will likely change in a future PR. No functions are implemented for this
type, except to/from `VARCHAR` casts.
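For readers unfamiliar with WKB, here is a minimal Python sketch of what a WKB-encoded 2D point looks like as raw bytes, and how it maps to a text form. It only illustrates the WKB layout from the linked specification (byte-order flag, geometry type code, coordinates). It is not DuckDB's implementation, and the helper names (`encode_point_wkb`, `decode_point_wkb`, `point_to_wkt`) are made up for this example.

```python
import struct
from typing import Tuple

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little endian, 0 = big endian
WKB_POINT = 1          # WKB geometry type code for a 2D point


def encode_point_wkb(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (hypothetical helper)."""
    # Layout: 1 byte order flag, uint32 geometry type, two float64 coordinates.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def decode_point_wkb(blob: bytes) -> Tuple[float, float]:
    """Decode a WKB 2D point, honoring the byte-order flag (hypothetical helper)."""
    prefix = "<" if blob[0] == WKB_LITTLE_ENDIAN else ">"
    (geom_type,) = struct.unpack_from(prefix + "I", blob, 1)
    if geom_type != WKB_POINT:
        raise ValueError(f"expected a WKB point, got type code {geom_type}")
    x, y = struct.unpack_from(prefix + "dd", blob, 5)
    return x, y


def point_to_wkt(x: float, y: float) -> str:
    """Render the point as WKT text, roughly what a text cast would produce."""
    return f"POINT ({x:g} {y:g})"


blob = encode_point_wkb(1.0, 2.0)
print(blob.hex())                            # 21 bytes of WKB
print(point_to_wkt(*decode_point_wkb(blob)))  # POINT (1 2)
```

A single point costs 21 bytes here, 5 of which are header, which hints at why a dedicated storage format and compression are on the roadmap.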
This is the first PR in a long series of changes that will be pushed
over the coming weeks, with the ultimate goal of significantly
elevating DuckDB's geospatial capabilities for the DuckDB v1.5 release in
early 2026.
### Background
So far DuckDB has (mostly) contained all geospatial features in the
`spatial` extension. This has worked great, as it has allowed us to
independently and rapidly experiment with how to adapt geospatial
processing to DuckDB's engine, while also keeping a lot of the
domain-specific details and dependencies separate from DuckDB's core
codebase. Besides integrating a lot of third-party geospatial libraries,
`spatial` has also integrated deeply into DuckDB's core execution
engine and made itself dependent on a lot of fragile interfaces within
DuckDB's internals (custom operators, optimizer rules, indexes, etc.).
Fast forward to today, and `spatial` is one of DuckDB's largest and most
complex extensions.
While this complexity has been somewhat manageable so far, we're now
reaching a point where it's no longer feasible to relegate everything
geospatial-related to a single separate extension. There's
already some awkwardness when dealing with e.g. Pandas/Postgres/SQLite
through DuckDB, which have their own geospatial extensions (GeoPandas,
PostGIS, GeoPackage/SpatiaLite), as core DuckDB doesn't really want to
acknowledge anything spatial-specific and we don't want to introduce
inter-extension dependencies either. But geospatial support is now also
part of the Parquet standard itself, which is of much higher importance
to DuckDB, and is supported by all up-and-coming data lake formats.
In short: [Geospatial data Isn't special
(anymore)](https://forrest.nyc/why-cloud-native-geospatial-data-is-making-spatial-just-data-again/)
### What's changing
Therefore we're taking some steps toward making vanilla DuckDB
spatially aware, by moving the `GEOMETRY` type from `spatial` into core DuckDB.
While almost all of the geospatial functionality will still remain in
`spatial` (e.g. 99% of `ST_` functions), this will give our (and
community!) extensions and client libraries some common ground as they
can all interface with the same `GEOMETRY` type. We will also make sure
that existing databases that use the `GEOMETRY` type as currently
defined in `spatial` will remain compatible.
Additionally, because `GEOMETRY` will now become part of both DuckDB's
execution and storage engine, this opens up a lot of optimization
opportunities that are currently impractical/impossible to implement
solely in `spatial`. The two big ones are __statistics propagation__
and __compression__, which will significantly improve performance of
processing both external formats like (Geo)Parquet and DuckDB's own
storage format.
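To make the statistics idea concrete, here is a rough Python sketch of how per-row-group bounding boxes could let a scan skip data that cannot match a spatial filter. This is only an illustration of the general technique under simple assumptions (2D points, axis-aligned boxes). The names `BBox` and `row_groups_to_scan` are hypothetical and do not reflect how DuckDB implements or exposes geometry statistics.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple

Point = Tuple[float, float]


@dataclass(frozen=True)
class BBox:
    """Axis-aligned bounding box, standing in for per-row-group statistics."""
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    @staticmethod
    def from_points(points: Iterable[Point]) -> "BBox":
        pts = list(points)
        xs = [p[0] for p in pts]
        ys = [p[1] for p in pts]
        return BBox(min(xs), min(ys), max(xs), max(ys))

    def intersects(self, other: "BBox") -> bool:
        # Two boxes overlap only if they overlap on both axes.
        return (self.min_x <= other.max_x and self.max_x >= other.min_x
                and self.min_y <= other.max_y and self.max_y >= other.min_y)


def row_groups_to_scan(stats: List[BBox], query: BBox) -> List[int]:
    """Return indexes of row groups whose statistics overlap the query box."""
    return [i for i, box in enumerate(stats) if box.intersects(query)]


# Three hypothetical row groups, each summarized by the box around its points.
row_groups = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (12.0, 11.0)],
    [(5.0, 5.0), (6.0, 6.0)],
]
stats = [BBox.from_points(group) for group in row_groups]
print(row_groups_to_scan(stats, BBox(0.5, 0.5, 5.5, 5.5)))  # [0, 2]
```

The statistics can only rule row groups out. Rows in the surviving groups still need the exact predicate applied.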
Again, this is a pretty massive change. I have prototyped most of it on
my own fork(s), but will break it up into multiple PRs to keep it
manageable. The rough short-term roadmap looks something like this:
- [x] Add the `GEOMETRY` type to core
- [x] Add geometry statistics support (Implemented in duckdb#19203)
- [ ] Add geometry filter pushdown (using the statistics!)
- [ ] Fixup `parquet` extension
- [ ] Fixup `spatial` extension
- [ ] Dedicated geometry storage/compression
- [ ] Type-level CRS tracking/management
- [ ] Maybe `GEOGRAPHY`/"vectorized types" too
Integrations with clients and other extensions are planned to land
before 1.5 as well, but we will first focus on core/parquet/spatial.
This PR also removes `spatial` from the CI workflow until I've had time
to adapt it to these changes, but that will hopefully not take too long (I
have an old branch with most of the work already).