docs/delta/WKB.html
Lines changed: 32 additions & 41 deletions
@@ -387,19 +387,23 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
<p>You can store geometry or geography data in a Delta Lake table in a <a href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/binary-type">BINARY</a> column as <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary">Well-known binary</a> (WKB or EWKB). This is a more compact representation than well-known text (WKT), and it is widely supported, including in the GeoParquet specification. On the other hand, unlike the newer <a href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/geometry-type">GEOMETRY</a> and GEOGRAPHY types, a BINARY column offers no higher-level semantic support. Also, you need to apply the conversion function <a href="https://docs.databricks.com/gcp/en/sql/language-manual/functions/st_geomfromwkb">st_geomfromwkb</a> or <a href="https://docs.databricks.com/gcp/en/sql/language-manual/functions/st_geomfromewkb">st_geomfromewkb</a> before any other ST function.</p>
<span id="cb2-6"><a href="#cb2-6"></a> t_ewkb</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb2-10"><a href="#cb2-10"></a><span class="op">--</span> POINT(<span class="dv">3</span> <span class="dv">0</span>) <span class="dv">4326</span> AQEAACDmEAAAAAAAAAAACEAAAAAAAAAAAA<span class="op">==</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
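As a rough illustration of what such a BINARY value holds, the following Python sketch packs and unpacks a 2D point in little-endian EWKB using only the standard library. The helper names `encode_point_ewkb` and `decode_point_ewkb` are hypothetical and not part of Databricks or any library; the layout assumed here (byte-order byte, geometry-type word with the `0x20000000` SRID flag, optional SRID, then two doubles) follows the commonly documented EWKB convention and handles points only.

```python
import struct
from typing import Optional, Tuple

# Assumption: EWKB marks the presence of an SRID with this bit in the type word.
EWKB_SRID_FLAG = 0x20000000
WKB_POINT = 1  # geometry type code for a point


def encode_point_ewkb(x: float, y: float, srid: int) -> bytes:
    """Pack a 2D point as little-endian EWKB (hypothetical helper)."""
    # "<" = little-endian without padding: 1 + 4 + 4 + 8 + 8 = 25 bytes.
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


def decode_point_ewkb(blob: bytes) -> Tuple[float, float, Optional[int]]:
    """Unpack a 2D point from WKB or EWKB; returns (x, y, srid or None)."""
    # The first byte selects the byte order: 1 = little-endian, 0 = big-endian.
    endian = "<" if blob[0] == 1 else ">"
    (type_word,) = struct.unpack_from(endian + "I", blob, 1)
    offset = 5
    srid = None
    if type_word & EWKB_SRID_FLAG:
        # EWKB only: a 4-byte SRID sits between the type word and the coordinates.
        (srid,) = struct.unpack_from(endian + "I", blob, offset)
        offset += 4
    if type_word & ~EWKB_SRID_FLAG != WKB_POINT:
        raise ValueError("this sketch only handles 2D points")
    x, y = struct.unpack_from(endian + "dd", blob, offset)
    return x, y, srid


if __name__ == "__main__":
    blob = encode_point_ewkb(3.0, 0.0, 4326)
    print(len(blob), decode_point_ewkb(blob))
```

Plain WKB is the same layout without the flag and the SRID field, which is why a separate conversion function exists for each variant. The base64 text in the query comment above appears to be this kind of encoding, but decoding it is left out here to avoid guessing at the notebook's exact output.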
<p>Other examples of Delta Lake tables with WKB columns are the <a href="https://marketplace.databricks.com/provider/dd56dcf4-cb70-449e-abad-c8038c0de3d9/CARTO">Overture Maps datasets</a> prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any of them (at no cost) to your catalog, if you haven’t yet. For example, for the query below, use <a href="https://dbc-63c92876-2d84.cloud.databricks.com/marketplace/consumer/listings/2b2d3511-55cf-493c-8224-f5c5103a8d74?o=3737817604111714">Divisions</a> (borders of countries and other administrative divisions):</p>
@@ -415,7 +419,7 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
<p>The CARTO/Overture Maps tables are stored in <code>us-west-2</code> as of writing, so if you are <em>not</em> using Databricks Free Edition and you are in any other region, you will have to pay egress charges based on the amount of data you read.</p>
@@ -435,42 +439,29 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
<p>Another pattern would be to make use of spatial indexing such as <a href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-h3-geospatial-functions">H3</a>.</p>
<span id="cb4-33"><a href="#cb4-33"></a> fr</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
+
<span id="cb4-2"><a href="#cb4-2"></a><span class="cf">with</span> countries <span class="im">as</span> (</span>
<span id="cb4-11"><a href="#cb4-11"></a><span class="kw">and</span> country <span class="kw">in</span> (<span class="st">'GB'</span>, <span class="st">'FR'</span>)</span>
<span id="cb4-21"><a href="#cb4-21"></a><span class="op">--</span> GB <span class="fl">244408.1099778328</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
</div>
-
<p>Now the question is: if the <a href="https://en.wikipedia.org/wiki/Strait_of_Dover">Strait of Dover</a> is only about 32 km wide, why did we get a larger distance above? To answer this, we’d need to find out where exactly the shortest line between the two countries lies. There is no st_shortestline yet in Databricks ST functions as of 2025-08-03, but we can use a Spark UDF with DuckDB spatial to fill this gap, see here TODO: .</p>
docs/search.json
Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@
"href": "delta/WKB.html",
"title": "Storing spatial data in Delta Lake as WKB",
"section": "",
-
"text": "Example usage with ST functions\nYou can store geometry or geography data in a Delta Lake table in a BINARY column as Well-known binary (WKB or EWKB). This is a more compact representation than well-known text (WKT), and widely supported incl. in the Geoparquet specification. On the other hand, unlike the newer GEOMETRY and GEOGRAPHY types, there is no higher level semantic support possible. Also, you need to use the conversion function st_geomfromwkb or st_geomfromwekb before any other ST function.\nAnother example of Delta Lake tables with WKB columns are the Overture Maps datasets prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any them (at no cost) to your catalog, if you haven’t yet. For example, for the below query, use Divisions (borders of countries and other administrative divisions):\nThese CARTO tables also show one pattern to organize and cluster tables with geometries: they include the bounding box columns __carto_xmin, __carto_xmax, __carto_ymin, ___carto_ymax and are clustered by these colums.\nAnother pattern would be to make use of spatial indexing such as H3.\n%sql\n-- The distance between the UK and France\nwith countries as (\n select\n country,\n st_geomfromwkb(geometry) wkb_geometry\n from\n carto_overture_maps_divisions.carto.division_area\n where\n subtype = 'country'\n and class = 'land'\n),\nuk as (\n select\n wkb_geometry\n from\n countries\n where\n country = 'GB'\n),\nfr as (\n select\n wkb_geometry\n from\n countries\n where\n country = 'FR'\n)\nselect\n st_distancespheroid(uk.wkb_geometry, fr.wkb_geometry) / 1e3 distance_km\nfrom\n uk,\n fr\nNow the question is, if the Strait of Dover is about 32 kms narrow, why did we get more above? For this, we’d need to find out where exactly the shortest line is between the two countries. 
There is no st_shortestline yet in Databricks ST functions as of 2025-08-03, but we can use a Spark UDF with DuckDB spatial to fill this gap, see here TODO: .",
+
"text": "Example usage with ST functions\nYou can store geometry or geography data in a Delta Lake table in a BINARY column as Well-known binary (WKB or EWKB). This is a more compact representation than well-known text (WKT), and widely supported incl. in the Geoparquet specification. On the other hand, unlike the newer GEOMETRY and GEOGRAPHY types, there is no higher level semantic support possible. Also, you need to use the conversion function st_geomfromwkb or st_geomfromwekb before any other ST function.\nAnother example of Delta Lake tables with WKB columns are the Overture Maps datasets prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any them (at no cost) to your catalog, if you haven’t yet. For example, for the below query, use Divisions (borders of countries and other administrative divisions):\nThese CARTO tables also show one pattern to organize and cluster tables with geometries: they include the bounding box columns __carto_xmin, __carto_xmax, __carto_ymin, ___carto_ymax and are clustered by these colums.\nAnother pattern would be to make use of spatial indexing such as H3.\n%sql\nwith countries as (\n select\n country,\n st_geogfromwkb(geometry) geography\n from\n carto_overture_maps_divisions.carto.division_area\n where\n subtype = 'country'\n and class = 'land'\n and country in ('GB', 'FR')\n)\nselect\n country,\n st_area(geography) / 1e6 area_km2s\nfrom\n countries\n-- Returns:\n-- country area_km2s\n-- FR 549231.6644010496\n-- GB 244408.1099778328",
"crumbs": [
"Geospatial Delta Lake",
"<span class='chapter-number'>6</span> <span class='chapter-title'>Storing spatial data in Delta Lake as WKB</span>"
@@ -432,7 +432,7 @@
"href": "stfunctions/duckdb_udf.html",
"title": "Define a Spark UDF using a DuckDB Spatial function",
"section": "",
-
"text": "Setup\nDatabricks SQL now contains lots of ST Functions. However, at some point you might just need a geospatial function not (yet) available natively in Databricks, but maybe available in DuckDB Spatial. For example, as of Aug 2025, the st_shortestline function.\nThen we can register the DuckDB function as a Spark UDF as follows:\n%pip install duckdb lonboard shapely --quiet\n\nimport duckdb\nimport pandas as pd\nfrom pyspark.sql import functions as F\nfrom pyspark.sql.types import BinaryType",
+
"text": "Setup\n%md\nDatabricks SQL now contains lots of ST Functions. However, at some point you might just need a geospatial function not (yet) available natively in Databricks, but maybe available in DuckDB Spatial. For example, as of Aug 2025, the st_shortestline function.\nThen we can register the DuckDB function as a Spark UDF as follows:\n%pip install duckdb lonboard shapely --quiet\n\nimport duckdb\nimport pandas as pd\nfrom pyspark.sql import functions as F\nfrom pyspark.sql.types import BinaryType",
"crumbs": [
"Spatial functions",
"<span class='chapter-number'>15</span> <span class='chapter-title'>Define a Spark UDF using a DuckDB Spatial function</span>"