Commit 90690af

GEOMETRY Rework: Part 1 - Logical Type (duckdb#19136)
This PR adds a dedicated `GEOMETRY` logical type to core DuckDB. The internal representation is currently a [WKB-encoded](https://libgeos.org/specifications/wkb/) `BLOB`, but that will likely change in a future PR. No functions are implemented for this type, except casts to and from `VARCHAR`.

This is the first PR in a long series of changes that will be pushed over the coming weeks, with the ultimate goal of significantly elevating DuckDB's geospatial capabilities for the DuckDB v1.5 release in early 2026.

### Background

So far DuckDB has (mostly) contained all geospatial features in the `spatial` extension. This has worked great, as it has allowed us to experiment independently and rapidly with how to adapt geospatial processing to DuckDB's engine, while also keeping a lot of the domain-specific details and dependencies separate from DuckDB's core codebase. Besides integrating a lot of third-party geospatial libraries, `spatial` has also integrated deeply into DuckDB's core execution engine and made itself dependent on a lot of fragile interfaces within DuckDB's internals (custom operators, optimizer rules, indexes, etc.).

Fast forward to today, and `spatial` is one of DuckDB's largest and most complex extensions. While this complexity has been somewhat manageable so far, we're now reaching a point where it's no longer feasible to relegate everything geospatial to a single separate extension. There's already some awkwardness when dealing with e.g. Pandas/Postgres/SQLite through DuckDB, which have their own geospatial extensions (GeoPandas, PostGIS, GeoPackage/SpatiaLite), as core DuckDB doesn't really want to acknowledge anything spatial-specific and we don't want to introduce inter-extension dependencies either. But geospatial support is now also part of the Parquet standard itself, which is of much higher importance to DuckDB, and it is supported by all up-and-coming data lake formats.

In short: [Geospatial data isn't special (anymore)](https://forrest.nyc/why-cloud-native-geospatial-data-is-making-spatial-just-data-again/)

### What's changing

Therefore we're taking some steps toward making vanilla DuckDB spatially aware, by moving the `GEOMETRY` type from `spatial` into core DuckDB. While almost all of the geospatial functionality will still remain in `spatial` (e.g. 99% of `ST_` functions), this will give our (and community!) extensions and client libraries some common ground, as they can all interface with the same `GEOMETRY` type. We will also make sure that existing databases that use the `GEOMETRY` type as currently defined in `spatial` remain compatible.

Additionally, because `GEOMETRY` will now become part of both DuckDB's execution and storage engine, this opens up a lot of optimization opportunities that are currently impractical or impossible to implement solely in `spatial`. The two big ones are __statistics propagation__ and __compression__, which will significantly improve the performance of processing both external formats like (Geo)Parquet and DuckDB's own storage format.

Again, this is a pretty massive change. I have prototyped most of it on my own fork(s), but will break it up into multiple PRs to keep it manageable. The rough short-term roadmap looks something like this:

- [x] Add the `GEOMETRY` type to core
- [x] Add geometry statistics support (implemented in duckdb#19203)
- [ ] Add geometry filter pushdown (using the statistics!)
- [ ] Fix up the `parquet` extension
- [ ] Fix up the `spatial` extension
- [ ] Dedicated geometry storage/compression
- [ ] Type-level CRS tracking/management
- [ ] Maybe `GEOGRAPHY`/"vectorized types" too

Client and other extension integrations are planned to land before 1.5 as well, but we will first focus on core/parquet/spatial.

This PR also removes `spatial` from the CI workflow until I've had time to adapt it to these changes, but that will hopefully not take too long (I have an old branch with most of the work already).
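
To make the "WKB-encoded `BLOB`" representation mentioned in the description concrete, here is a minimal sketch of how a single 2D point is laid out in little-endian WKB. `EncodePointWKB`, `AppendLE`, and `ToHex` are hypothetical helpers written for illustration; they are not DuckDB functions, and the sketch says nothing about how `Geometry::FromString` is actually implemented.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper (not a DuckDB API): append an unsigned integer in little-endian byte order.
template <class T>
static void AppendLE(std::vector<uint8_t> &out, T value) {
	for (std::size_t i = 0; i < sizeof(T); i++) {
		out.push_back(static_cast<uint8_t>((value >> (8 * i)) & 0xFF));
	}
}

// Hypothetical helper: append the raw IEEE-754 bits of a double, little endian.
static void AppendDoubleLE(std::vector<uint8_t> &out, double value) {
	static_assert(sizeof(double) == sizeof(uint64_t), "expects 64-bit doubles");
	uint64_t bits;
	std::memcpy(&bits, &value, sizeof(bits));
	AppendLE<uint64_t>(out, bits);
}

// Encode POINT(x y) as WKB:
//   1 byte  byte-order flag (1 = little endian)
//   4 bytes geometry type code (1 = Point)
//   8 bytes x, 8 bytes y
std::vector<uint8_t> EncodePointWKB(double x, double y) {
	std::vector<uint8_t> out;
	out.reserve(21);
	out.push_back(0x01);
	AppendLE<uint32_t>(out, 1);
	AppendDoubleLE(out, x);
	AppendDoubleLE(out, y);
	assert(out.size() == 21);
	return out;
}

// Render bytes as upper-case hex for inspection.
std::string ToHex(const std::vector<uint8_t> &bytes) {
	static const char digits[] = "0123456789ABCDEF";
	std::string hex;
	for (uint8_t b : bytes) {
		hex.push_back(digits[b >> 4]);
		hex.push_back(digits[b & 0x0F]);
	}
	return hex;
}
```

Calling `ToHex(EncodePointWKB(1.0, 2.0))` yields `0101000000000000000000F03F0000000000000040`, the usual hex form of `POINT (1 2)`. The standard WKB type codes 1 to 7 follow the same order as the `GeometryType` names listed in this commit (`POINT` through `GEOMETRYCOLLECTION`), though the enum's numeric values are not visible in this diff.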
2 parents f635c78 + fe1a528 commit 90690af

30 files changed

+1134
-13
lines changed


.github/config/out_of_tree_extensions.cmake

Lines changed: 1 addition & 1 deletion
@@ -27,7 +27,7 @@ include("${EXTENSION_CONFIG_BASE_DIR}/iceberg.cmake")
 include("${EXTENSION_CONFIG_BASE_DIR}/inet.cmake")
 include("${EXTENSION_CONFIG_BASE_DIR}/mysql_scanner.cmake")
 include("${EXTENSION_CONFIG_BASE_DIR}/postgres_scanner.cmake")
-include("${EXTENSION_CONFIG_BASE_DIR}/spatial.cmake")
+# include("${EXTENSION_CONFIG_BASE_DIR}/spatial.cmake") Remove spatial until the geometry refactor is done
 include("${EXTENSION_CONFIG_BASE_DIR}/sqlite_scanner.cmake")
 include("${EXTENSION_CONFIG_BASE_DIR}/sqlsmith.cmake")
 include("${EXTENSION_CONFIG_BASE_DIR}/vss.cmake")

src/common/enum_util.cpp

Lines changed: 35 additions & 7 deletions
@@ -89,6 +89,7 @@
 #include "duckdb/common/types/column/partitioned_column_data.hpp"
 #include "duckdb/common/types/conflict_manager.hpp"
 #include "duckdb/common/types/date.hpp"
+#include "duckdb/common/types/geometry.hpp"
 #include "duckdb/common/types/hyperloglog.hpp"
 #include "duckdb/common/types/row/block_iterator.hpp"
 #include "duckdb/common/types/row/partitioned_tuple_data.hpp"
@@ -1795,19 +1796,20 @@ const StringUtil::EnumStringLiteral *GetExtraTypeInfoTypeValues() {
 		{ static_cast<uint32_t>(ExtraTypeInfoType::ARRAY_TYPE_INFO), "ARRAY_TYPE_INFO" },
 		{ static_cast<uint32_t>(ExtraTypeInfoType::ANY_TYPE_INFO), "ANY_TYPE_INFO" },
 		{ static_cast<uint32_t>(ExtraTypeInfoType::INTEGER_LITERAL_TYPE_INFO), "INTEGER_LITERAL_TYPE_INFO" },
-		{ static_cast<uint32_t>(ExtraTypeInfoType::TEMPLATE_TYPE_INFO), "TEMPLATE_TYPE_INFO" }
+		{ static_cast<uint32_t>(ExtraTypeInfoType::TEMPLATE_TYPE_INFO), "TEMPLATE_TYPE_INFO" },
+		{ static_cast<uint32_t>(ExtraTypeInfoType::GEO_TYPE_INFO), "GEO_TYPE_INFO" }
 	};
 	return values;
 }
 
 template<>
 const char* EnumUtil::ToChars<ExtraTypeInfoType>(ExtraTypeInfoType value) {
-	return StringUtil::EnumToString(GetExtraTypeInfoTypeValues(), 13, "ExtraTypeInfoType", static_cast<uint32_t>(value));
+	return StringUtil::EnumToString(GetExtraTypeInfoTypeValues(), 14, "ExtraTypeInfoType", static_cast<uint32_t>(value));
 }
 
 template<>
 ExtraTypeInfoType EnumUtil::FromString<ExtraTypeInfoType>(const char *value) {
-	return static_cast<ExtraTypeInfoType>(StringUtil::StringToEnum(GetExtraTypeInfoTypeValues(), 13, "ExtraTypeInfoType", value));
+	return static_cast<ExtraTypeInfoType>(StringUtil::StringToEnum(GetExtraTypeInfoTypeValues(), 14, "ExtraTypeInfoType", value));
 }
 
 const StringUtil::EnumStringLiteral *GetFileBufferTypeValues() {
@@ -2059,6 +2061,30 @@ GateStatus EnumUtil::FromString<GateStatus>(const char *value) {
 	return static_cast<GateStatus>(StringUtil::StringToEnum(GetGateStatusValues(), 2, "GateStatus", value));
 }
 
+const StringUtil::EnumStringLiteral *GetGeometryTypeValues() {
+	static constexpr StringUtil::EnumStringLiteral values[] {
+		{ static_cast<uint32_t>(GeometryType::INVALID), "INVALID" },
+		{ static_cast<uint32_t>(GeometryType::POINT), "POINT" },
+		{ static_cast<uint32_t>(GeometryType::LINESTRING), "LINESTRING" },
+		{ static_cast<uint32_t>(GeometryType::POLYGON), "POLYGON" },
+		{ static_cast<uint32_t>(GeometryType::MULTIPOINT), "MULTIPOINT" },
+		{ static_cast<uint32_t>(GeometryType::MULTILINESTRING), "MULTILINESTRING" },
+		{ static_cast<uint32_t>(GeometryType::MULTIPOLYGON), "MULTIPOLYGON" },
+		{ static_cast<uint32_t>(GeometryType::GEOMETRYCOLLECTION), "GEOMETRYCOLLECTION" }
+	};
+	return values;
+}
+
+template<>
+const char* EnumUtil::ToChars<GeometryType>(GeometryType value) {
+	return StringUtil::EnumToString(GetGeometryTypeValues(), 8, "GeometryType", static_cast<uint32_t>(value));
+}
+
+template<>
+GeometryType EnumUtil::FromString<GeometryType>(const char *value) {
+	return static_cast<GeometryType>(StringUtil::StringToEnum(GetGeometryTypeValues(), 8, "GeometryType", value));
+}
+
 const StringUtil::EnumStringLiteral *GetHLLStorageTypeValues() {
 	static constexpr StringUtil::EnumStringLiteral values[] {
 		{ static_cast<uint32_t>(HLLStorageType::HLL_V1), "HLL_V1" },
@@ -2599,6 +2625,7 @@ const StringUtil::EnumStringLiteral *GetLogicalTypeIdValues() {
 		{ static_cast<uint32_t>(LogicalTypeId::POINTER), "POINTER" },
 		{ static_cast<uint32_t>(LogicalTypeId::VALIDITY), "VALIDITY" },
 		{ static_cast<uint32_t>(LogicalTypeId::UUID), "UUID" },
+		{ static_cast<uint32_t>(LogicalTypeId::GEOMETRY), "GEOMETRY" },
 		{ static_cast<uint32_t>(LogicalTypeId::STRUCT), "STRUCT" },
 		{ static_cast<uint32_t>(LogicalTypeId::LIST), "LIST" },
 		{ static_cast<uint32_t>(LogicalTypeId::MAP), "MAP" },
@@ -2615,12 +2642,12 @@ const StringUtil::EnumStringLiteral *GetLogicalTypeIdValues() {
 
 template<>
 const char* EnumUtil::ToChars<LogicalTypeId>(LogicalTypeId value) {
-	return StringUtil::EnumToString(GetLogicalTypeIdValues(), 50, "LogicalTypeId", static_cast<uint32_t>(value));
+	return StringUtil::EnumToString(GetLogicalTypeIdValues(), 51, "LogicalTypeId", static_cast<uint32_t>(value));
 }
 
 template<>
 LogicalTypeId EnumUtil::FromString<LogicalTypeId>(const char *value) {
-	return static_cast<LogicalTypeId>(StringUtil::StringToEnum(GetLogicalTypeIdValues(), 50, "LogicalTypeId", value));
+	return static_cast<LogicalTypeId>(StringUtil::StringToEnum(GetLogicalTypeIdValues(), 51, "LogicalTypeId", value));
 }
 
 const StringUtil::EnumStringLiteral *GetLookupResultTypeValues() {
@@ -4808,19 +4835,20 @@ const StringUtil::EnumStringLiteral *GetVariantLogicalTypeValues() {
 		{ static_cast<uint32_t>(VariantLogicalType::ARRAY), "ARRAY" },
 		{ static_cast<uint32_t>(VariantLogicalType::BIGNUM), "BIGNUM" },
 		{ static_cast<uint32_t>(VariantLogicalType::BITSTRING), "BITSTRING" },
+		{ static_cast<uint32_t>(VariantLogicalType::GEOMETRY), "GEOMETRY" },
 		{ static_cast<uint32_t>(VariantLogicalType::ENUM_SIZE), "ENUM_SIZE" }
 	};
 	return values;
 }
 
 template<>
 const char* EnumUtil::ToChars<VariantLogicalType>(VariantLogicalType value) {
-	return StringUtil::EnumToString(GetVariantLogicalTypeValues(), 34, "VariantLogicalType", static_cast<uint32_t>(value));
+	return StringUtil::EnumToString(GetVariantLogicalTypeValues(), 35, "VariantLogicalType", static_cast<uint32_t>(value));
 }
 
 template<>
 VariantLogicalType EnumUtil::FromString<VariantLogicalType>(const char *value) {
-	return static_cast<VariantLogicalType>(StringUtil::StringToEnum(GetVariantLogicalTypeValues(), 34, "VariantLogicalType", value));
+	return static_cast<VariantLogicalType>(StringUtil::StringToEnum(GetVariantLogicalTypeValues(), 35, "VariantLogicalType", value));
 }
 
 const StringUtil::EnumStringLiteral *GetVectorAuxiliaryDataTypeValues() {
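
The hunks in `enum_util.cpp` all follow one pattern: a static table of (number, name) pairs plus an explicit entry count passed to the lookup helpers, which is why every added enum member also bumps a literal (13 to 14, 50 to 51, 34 to 35). The sketch below imitates that pattern in isolation. `ShapeKind`, `EnumEntry`, `EnumToString`, and `StringToEnum` are hypothetical stand-ins, and the real `StringUtil` helpers may behave differently (for example in how they report errors).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical enum used only for this sketch.
enum class ShapeKind : uint8_t { INVALID = 0, POINT = 1, LINESTRING = 2 };

// One table row: the numeric enum value and its string name.
struct EnumEntry {
	uint32_t number;
	const char *name;
};

// Static lookup table, analogous in shape to the Get...Values() functions in the diff.
const EnumEntry *GetShapeKindValues() {
	static constexpr EnumEntry values[] {
		{ static_cast<uint32_t>(ShapeKind::INVALID), "INVALID" },
		{ static_cast<uint32_t>(ShapeKind::POINT), "POINT" },
		{ static_cast<uint32_t>(ShapeKind::LINESTRING), "LINESTRING" }
	};
	return values;
}

// Linear scan over the first `count` rows; throws if the value is not found.
const char *EnumToString(const EnumEntry *entries, std::size_t count, uint32_t value) {
	for (std::size_t i = 0; i < count; i++) {
		if (entries[i].number == value) {
			return entries[i].name;
		}
	}
	throw std::out_of_range("enum value not found in table");
}

// Reverse lookup by name; throws if the name is not found.
uint32_t StringToEnum(const EnumEntry *entries, std::size_t count, const char *name) {
	for (std::size_t i = 0; i < count; i++) {
		if (std::strcmp(entries[i].name, name) == 0) {
			return entries[i].number;
		}
	}
	throw std::out_of_range("enum name not found in table");
}
```

In this sketch, passing a stale count (2 instead of 3) makes the last row unreachable, so a lookup of `LINESTRING` throws. That is the failure mode the count updates in the diff guard against.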

src/common/extra_type_info.cpp

Lines changed: 15 additions & 0 deletions
@@ -507,4 +507,19 @@ shared_ptr<ExtraTypeInfo> TemplateTypeInfo::Copy() const {
 	return make_shared_ptr<TemplateTypeInfo>(*this);
 }
 
+//===--------------------------------------------------------------------===//
+// Geo Type Info
+//===--------------------------------------------------------------------===//
+GeoTypeInfo::GeoTypeInfo() : ExtraTypeInfo(ExtraTypeInfoType::GEO_TYPE_INFO) {
+}
+
+bool GeoTypeInfo::EqualsInternal(ExtraTypeInfo *other_p) const {
+	// No additional info to compare
+	return true;
+}
+
+shared_ptr<ExtraTypeInfo> GeoTypeInfo::Copy() const {
+	return make_shared_ptr<GeoTypeInfo>(*this);
+}
+
 } // namespace duckdb
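
`GeoTypeInfo` carries no fields yet, so its `EqualsInternal` returns `true` unconditionally. The sketch below shows why that can be safe, under the assumption that the base class checks the type-info kind before delegating. Whether DuckDB's `ExtraTypeInfo` performs that check in exactly this way is not visible in this diff, and `TypeInfoBase`, `ExtraKind`, and `GeoInfoSketch` are hypothetical names.

```cpp
#include <cassert>
#include <memory>

// Hypothetical tag, standing in for an ExtraTypeInfoType-like enum.
enum class ExtraKind { GENERIC, GEO };

class TypeInfoBase {
public:
	explicit TypeInfoBase(ExtraKind kind_p) : kind(kind_p) {
	}
	virtual ~TypeInfoBase() = default;

	// Assumed behavior: compare the kind first, then let the subclass compare its own fields.
	bool Equals(const TypeInfoBase *other) const {
		if (!other || other->kind != kind) {
			return false;
		}
		return EqualsInternal(other);
	}

	virtual std::shared_ptr<TypeInfoBase> Copy() const = 0;

	ExtraKind kind;

protected:
	virtual bool EqualsInternal(const TypeInfoBase *other) const = 0;
};

// Field-less subclass: once the kinds match there is nothing left to compare.
class GeoInfoSketch : public TypeInfoBase {
public:
	GeoInfoSketch() : TypeInfoBase(ExtraKind::GEO) {
	}

	std::shared_ptr<TypeInfoBase> Copy() const override {
		return std::make_shared<GeoInfoSketch>(*this);
	}

protected:
	bool EqualsInternal(const TypeInfoBase *) const override {
		return true;
	}
};
```

If type-level metadata such as the CRS tracking listed on the roadmap were added later, a comparison like this would presumably need to inspect the new fields instead of returning `true`.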

src/common/operator/cast_operators.cpp

Lines changed: 9 additions & 0 deletions
@@ -19,6 +19,7 @@
 #include "duckdb/common/types/time.hpp"
 #include "duckdb/common/types/timestamp.hpp"
 #include "duckdb/common/types/vector.hpp"
+#include "duckdb/common/types/geometry.hpp"
 #include "duckdb/common/types.hpp"
 #include "fast_float/fast_float.h"
 #include "duckdb/common/types/bit.hpp"
@@ -1560,6 +1561,14 @@ bool TryCastBlobToUUID::Operation(string_t input, hugeint_t &result, bool strict
 	return true;
 }
 
+//===--------------------------------------------------------------------===//
+// Cast To Geometry
+//===--------------------------------------------------------------------===//
+template <>
+bool TryCastToGeometry::Operation(string_t input, string_t &result, Vector &result_vector, CastParameters &parameters) {
+	return Geometry::FromString(input, result, result_vector, parameters.strict);
+}
+
 //===--------------------------------------------------------------------===//
 // Cast To Date
 //===--------------------------------------------------------------------===//

src/common/types.cpp

Lines changed: 13 additions & 0 deletions
@@ -159,6 +159,8 @@ PhysicalType LogicalType::GetInternalType() {
 		return PhysicalType::UNKNOWN;
 	case LogicalTypeId::AGGREGATE_STATE:
 		return PhysicalType::VARCHAR;
+	case LogicalTypeId::GEOMETRY:
+		return PhysicalType::VARCHAR;
 	default:
 		throw InternalException("Invalid LogicalType %s", ToString());
 	}
@@ -1344,6 +1346,8 @@ static idx_t GetLogicalTypeScore(const LogicalType &type) {
 		return 102;
 	case LogicalTypeId::BIGNUM:
 		return 103;
+	case LogicalTypeId::GEOMETRY:
+		return 104;
 	// nested types
 	case LogicalTypeId::STRUCT:
 		return 125;
@@ -2014,6 +2018,15 @@ LogicalType LogicalType::VARIANT() {
 	return LogicalType(LogicalTypeId::VARIANT, std::move(info));
 }
 
+//===--------------------------------------------------------------------===//
+// Spatial Types
+//===--------------------------------------------------------------------===//
+
+LogicalType LogicalType::GEOMETRY() {
+	auto info = make_shared_ptr<GeoTypeInfo>();
+	return LogicalType(LogicalTypeId::GEOMETRY, std::move(info));
+}
+
 //===--------------------------------------------------------------------===//
 // Logical Type
 //===--------------------------------------------------------------------===//
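
The `GetInternalType` hunk maps the new logical type onto the existing `PhysicalType::VARCHAR`, so `GEOMETRY` values reuse the string storage layout rather than introducing a new physical representation. A minimal sketch of that logical-to-physical split follows, using hypothetical `LogicalKind` and `PhysicalKind` enums rather than DuckDB's own types.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical, heavily reduced stand-ins for logical and physical type ids.
enum class LogicalKind { INTEGER, VARCHAR, BLOB, GEOMETRY };
enum class PhysicalKind { INT32, VARCHAR };

// Several logical kinds can share one physical layout; only the interpretation differs.
PhysicalKind GetInternalKind(LogicalKind kind) {
	switch (kind) {
	case LogicalKind::INTEGER:
		return PhysicalKind::INT32;
	case LogicalKind::VARCHAR:
	case LogicalKind::BLOB:
	case LogicalKind::GEOMETRY:
		// Variable-length byte payloads all use the string layout in this sketch.
		return PhysicalKind::VARCHAR;
	}
	throw std::logic_error("invalid logical kind");
}
```

The design choice this illustrates is that a new logical type can ride on an existing physical type, which keeps the first PR small while leaving room for the dedicated storage and compression work listed on the roadmap.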

src/common/types/CMakeLists.txt

Lines changed: 2 additions & 1 deletion
@@ -36,7 +36,8 @@ add_library_unity(
   vector_buffer.cpp
   vector.cpp
   vector_cache.cpp
-  vector_constants.cpp)
+  vector_constants.cpp
+  geometry.cpp)
 set(ALL_OBJECT_FILES
     ${ALL_OBJECT_FILES} $<TARGET_OBJECTS:duckdb_common_types>
     PARENT_SCOPE)
