
Commit 21fdabd

committed
qr
1 parent a38ea52 commit 21fdabd


3 files changed

+41
-49
lines changed


docs/delta/WKB.html

Lines changed: 32 additions & 41 deletions
@@ -387,19 +387,23 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
387387

388388

389389
<p>You can store geometry or geography data in a Delta Lake table in a <a href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/binary-type">BINARY</a> column as <a href="https://en.wikipedia.org/wiki/Well-known_text_representation_of_geometry#Well-known_binary">Well-known binary</a> (WKB, or its extended variant EWKB, which can also carry the SRID). This is a more compact representation than well-known text (WKT) and is widely supported, including by the GeoParquet specification. On the other hand, unlike the newer <a href="https://docs.databricks.com/aws/en/sql/language-manual/data-types/geometry-type">GEOMETRY</a> and GEOGRAPHY types, a BINARY column carries no spatial semantics: the column type itself does not tell Databricks that it holds a geometry, or which coordinate reference system it uses. You therefore need to convert the value with <a href="https://docs.databricks.com/gcp/en/sql/language-manual/functions/st_geomfromwkb">st_geomfromwkb</a> or <a href="https://docs.databricks.com/gcp/en/sql/language-manual/functions/st_geomfromewkb">st_geomfromewkb</a> (or st_geogfromwkb for geographies) before applying any other ST function.</p>
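To make the WKB/EWKB distinction concrete, here is a minimal, standard-library-only Python sketch that hand-encodes a 2D point both ways. It follows the OGC WKB layout (a byte-order flag, a uint32 geometry type, then float64 coordinates) and the PostGIS-style EWKB extension, which sets the `0x20000000` flag in the type and inserts a uint32 SRID right after it. The helpers `point_wkb` and `point_ewkb` are hypothetical names for illustration and only handle little-endian 2D points.

```python
import struct

WKB_POINT = 1                # OGC geometry type code for a 2D Point
EWKB_SRID_FLAG = 0x20000000  # PostGIS-style EWKB flag: "an SRID follows the type"


def point_wkb(x: float, y: float) -> bytes:
    """Hypothetical helper: little-endian WKB for a 2D point."""
    # byte-order marker (1 = little endian), uint32 type, two float64 coordinates
    return struct.pack("<BIdd", 1, WKB_POINT, x, y)


def point_ewkb(x: float, y: float, srid: int) -> bytes:
    """Hypothetical helper: little-endian EWKB for a 2D point with an SRID."""
    # same layout, but the type carries the SRID flag and a uint32 SRID follows it
    return struct.pack("<BIIdd", 1, WKB_POINT | EWKB_SRID_FLAG, srid, x, y)


x, y = 2.3522219, 48.856614  # an arbitrary lon/lat pair
wkb = point_wkb(x, y)
ewkb = point_ewkb(x, y, 4326)
wkt = f"POINT({x!r} {y!r})"

print(len(wkb), len(ewkb), len(wkt))  # 21 25 26
```

A WKB point is always 21 bytes (25 with an EWKB SRID), whereas WKT grows with the number of printed digits. For a trivial point such as `POINT(3 0)` the WKT string is actually shorter, so the size advantage shows up with realistic coordinates and larger geometries.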
390-
<div id="cell-1" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;b7ff4077-cee0-437d-8a81-16473b13f6c1&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}">
390+
<div id="cell-1" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;b7ff4077-cee0-437d-8a81-16473b13f6c1&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
391391
<div class="sourceCode cell-code" id="cb1"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb1-1"><a href="#cb1-1"></a><span class="op">%</span>sql</span>
392392
<span id="cb1-2"><a href="#cb1-2"></a>create temporary view t_ewkb <span class="im">as</span></span>
393393
<span id="cb1-3"><a href="#cb1-3"></a>select</span>
394394
<span id="cb1-4"><a href="#cb1-4"></a> st_asewkb(st_point(<span class="dv">3</span>, <span class="dv">0</span>, <span class="dv">4326</span>)) wkb_geometry</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
395395
</div>
396-
<div id="cell-2" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;c33be0d2-99fd-4eb1-9bc6-5b75aa358a52&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}">
396+
<div id="cell-2" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;c33be0d2-99fd-4eb1-9bc6-5b75aa358a52&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
397397
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a><span class="op">%</span>sql</span>
398398
<span id="cb2-2"><a href="#cb2-2"></a>select</span>
399-
<span id="cb2-3"><a href="#cb2-3"></a> st_geomfromewkb(wkb_geometry),</span>
400-
<span id="cb2-4"><a href="#cb2-4"></a> wkb_geometry</span>
401-
<span id="cb2-5"><a href="#cb2-5"></a><span class="im">from</span></span>
402-
<span id="cb2-6"><a href="#cb2-6"></a> t_ewkb</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
399+
<span id="cb2-3"><a href="#cb2-3"></a> st_astext(st_geomfromewkb(wkb_geometry)) wkt,</span>
400+
<span id="cb2-4"><a href="#cb2-4"></a> st_srid(st_geomfromewkb(wkb_geometry)) srid,</span>
401+
<span id="cb2-5"><a href="#cb2-5"></a> wkb_geometry</span>
402+
<span id="cb2-6"><a href="#cb2-6"></a><span class="im">from</span></span>
403+
<span id="cb2-7"><a href="#cb2-7"></a> t_ewkb</span>
404+
<span id="cb2-8"><a href="#cb2-8"></a><span class="op">--</span> Returns:</span>
405+
<span id="cb2-9"><a href="#cb2-9"></a><span class="op">--</span> wkt srid wkb_geometry</span>
406+
<span id="cb2-10"><a href="#cb2-10"></a><span class="op">--</span> POINT(<span class="dv">3</span> <span class="dv">0</span>) <span class="dv">4326</span> AQEAACDmEAAAAAAAAAAACEAAAAAAAAAAAA<span class="op">==</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
403407
</div>
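The `AQEAACDm…` value in the sample output looks like the base64 rendering of the EWKB bytes stored in the BINARY column. As a sanity check on what st_astext and st_srid recover, here is a minimal Python sketch that decodes that string by hand. `parse_ewkb_point` is a hypothetical helper that only understands 2D points (plain WKB or EWKB, either byte order) and is no substitute for the ST functions.

```python
import base64
import struct
from typing import Optional, Tuple

EWKB_SRID_FLAG = 0x20000000  # PostGIS-style EWKB: an SRID follows the type
EWKB_ZM_FLAGS = 0xC0000000   # EWKB Z (0x80000000) and M (0x40000000) flags


def _num(v: float) -> str:
    # print integral coordinates as "3" rather than "3.0", others at full precision
    return str(int(v)) if v.is_integer() else repr(v)


def parse_ewkb_point(data: bytes) -> Tuple[str, Optional[int]]:
    """Hypothetical helper: decode a 2D (E)WKB point into (WKT, SRID or None)."""
    # byte 0 is the byte order: 1 = little endian (NDR), 0 = big endian (XDR)
    order = "<" if data[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(order + "I", data, 1)
    offset = 5
    srid = None
    if geom_type & EWKB_SRID_FLAG:
        # EWKB: a uint32 SRID sits between the type and the coordinates
        (srid,) = struct.unpack_from(order + "I", data, offset)
        offset += 4
    # reject Z/M variants and anything that is not a plain Point (type code 1)
    if (geom_type & EWKB_ZM_FLAGS) or ((geom_type & 0x0FFFFFFF) != 1):
        raise ValueError("this sketch only handles 2D points")
    x, y = struct.unpack_from(order + "dd", data, offset)
    return f"POINT({_num(x)} {_num(y)})", srid


# the wkb_geometry value as displayed in the query output
ewkb = base64.b64decode("AQEAACDmEAAAAAAAAAAACEAAAAAAAAAAAA==")
print(parse_ewkb_point(ewkb))  # ('POINT(3 0)', 4326)
```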
404408
<p>Other examples of Delta Lake tables with WKB columns are the <a href="https://marketplace.databricks.com/provider/dd56dcf4-cb70-449e-abad-c8038c0de3d9/CARTO">Overture Maps datasets</a> prepared by CARTO, available via the Databricks Marketplace. If you haven’t already, follow that link to add any of them to your catalog at no cost. The query below uses <a href="https://dbc-63c92876-2d84.cloud.databricks.com/marketplace/consumer/listings/2b2d3511-55cf-493c-8224-f5c5103a8d74?o=3737817604111714">Divisions</a> (borders of countries and other administrative divisions):</p>
405409
<div class="callout callout-style-default callout-caution callout-titled">
@@ -415,7 +419,7 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
415419
<p>The CARTO/Overture Maps tables are stored in <code>us-west-2</code> at the time of writing, so if you are <em>not</em> using Databricks Free Edition and your workspace is in a different region, you will pay egress charges based on the amount of data you read.</p>
416420
</div>
417421
</div>
418-
<div id="cell-4" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;4abc1430-5118-49dc-9bb0-3c48c05d3b1f&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}">
422+
<div id="cell-4" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;4abc1430-5118-49dc-9bb0-3c48c05d3b1f&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
419423
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a><span class="op">%</span>sql</span>
420424
<span id="cb3-2"><a href="#cb3-2"></a>select</span>
421425
<span id="cb3-3"><a href="#cb3-3"></a> names:primary <span class="im">as</span> name,</span>
@@ -435,42 +439,29 @@ <h1 class="title"><span class="chapter-title">Storing spatial data in Delta Lake
435439
<p>Another pattern would be to make use of spatial indexing such as <a href="https://docs.databricks.com/aws/en/sql/language-manual/sql-ref-h3-geospatial-functions">H3</a>.</p>
436440
<section id="example-usage-with-st-functions" class="level2">
437441
<h2 class="anchored" data-anchor-id="example-usage-with-st-functions">Example usage with ST functions</h2>
438-
<div id="cell-7" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;d23d8da8-5ff6-49e4-9914-85fdb3e3951b&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{&quot;0&quot;:{&quot;dataGridStateBlob&quot;:&quot;{&quot;version&quot;:1,&quot;tableState&quot;:{&quot;columnPinning&quot;:{&quot;left&quot;:[&quot;#row_number#&quot;],&quot;right&quot;:[]},&quot;columnSizing&quot;:{},&quot;columnVisibility&quot;:{}},&quot;settings&quot;:{&quot;columns&quot;:{}},&quot;syncTimestamp&quot;:1754201398911}&quot;,&quot;filterBlob&quot;:null,&quot;queryPlanFiltersBlob&quot;:null,&quot;tableResultIndex&quot;:0}},&quot;title&quot;:&quot;&quot;}}">
442+
<div id="cell-7" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;d23d8da8-5ff6-49e4-9914-85fdb3e3951b&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{&quot;0&quot;:{&quot;dataGridStateBlob&quot;:&quot;{&quot;version&quot;:1,&quot;tableState&quot;:{&quot;columnPinning&quot;:{&quot;left&quot;:[&quot;#row_number#&quot;],&quot;right&quot;:[]},&quot;columnSizing&quot;:{},&quot;columnVisibility&quot;:{}},&quot;settings&quot;:{&quot;columns&quot;:{}},&quot;syncTimestamp&quot;:1754201398911}&quot;,&quot;filterBlob&quot;:null,&quot;queryPlanFiltersBlob&quot;:null,&quot;tableResultIndex&quot;:0}},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
439443
<div class="sourceCode cell-code" id="cb4"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb4-1"><a href="#cb4-1"></a><span class="op">%</span>sql</span>
440-
<span id="cb4-2"><a href="#cb4-2"></a><span class="op">--</span> The distance between the UK <span class="kw">and</span> France</span>
441-
<span id="cb4-3"><a href="#cb4-3"></a><span class="cf">with</span> countries <span class="im">as</span> (</span>
442-
<span id="cb4-4"><a href="#cb4-4"></a> select</span>
443-
<span id="cb4-5"><a href="#cb4-5"></a> country,</span>
444-
<span id="cb4-6"><a href="#cb4-6"></a> st_geomfromwkb(geometry) wkb_geometry</span>
445-
<span id="cb4-7"><a href="#cb4-7"></a> <span class="im">from</span></span>
446-
<span id="cb4-8"><a href="#cb4-8"></a> carto_overture_maps_divisions.carto.division_area</span>
447-
<span id="cb4-9"><a href="#cb4-9"></a> where</span>
448-
<span id="cb4-10"><a href="#cb4-10"></a> subtype <span class="op">=</span> <span class="st">'country'</span></span>
449-
<span id="cb4-11"><a href="#cb4-11"></a> <span class="kw">and</span> <span class="kw">class</span> <span class="op">=</span> <span class="st">'land'</span></span>
450-
<span id="cb4-12"><a href="#cb4-12"></a>),</span>
451-
<span id="cb4-13"><a href="#cb4-13"></a>uk <span class="im">as</span> (</span>
452-
<span id="cb4-14"><a href="#cb4-14"></a> select</span>
453-
<span id="cb4-15"><a href="#cb4-15"></a> wkb_geometry</span>
454-
<span id="cb4-16"><a href="#cb4-16"></a> <span class="im">from</span></span>
455-
<span id="cb4-17"><a href="#cb4-17"></a> countries</span>
456-
<span id="cb4-18"><a href="#cb4-18"></a> where</span>
457-
<span id="cb4-19"><a href="#cb4-19"></a> country <span class="op">=</span> <span class="st">'GB'</span></span>
458-
<span id="cb4-20"><a href="#cb4-20"></a>),</span>
459-
<span id="cb4-21"><a href="#cb4-21"></a>fr <span class="im">as</span> (</span>
460-
<span id="cb4-22"><a href="#cb4-22"></a> select</span>
461-
<span id="cb4-23"><a href="#cb4-23"></a> wkb_geometry</span>
462-
<span id="cb4-24"><a href="#cb4-24"></a> <span class="im">from</span></span>
463-
<span id="cb4-25"><a href="#cb4-25"></a> countries</span>
464-
<span id="cb4-26"><a href="#cb4-26"></a> where</span>
465-
<span id="cb4-27"><a href="#cb4-27"></a> country <span class="op">=</span> <span class="st">'FR'</span></span>
466-
<span id="cb4-28"><a href="#cb4-28"></a>)</span>
467-
<span id="cb4-29"><a href="#cb4-29"></a>select</span>
468-
<span id="cb4-30"><a href="#cb4-30"></a> st_distancespheroid(uk.wkb_geometry, fr.wkb_geometry) <span class="op">/</span> <span class="fl">1e3</span> distance_km</span>
469-
<span id="cb4-31"><a href="#cb4-31"></a><span class="im">from</span></span>
470-
<span id="cb4-32"><a href="#cb4-32"></a> uk,</span>
471-
<span id="cb4-33"><a href="#cb4-33"></a> fr</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
444+
<span id="cb4-2"><a href="#cb4-2"></a><span class="cf">with</span> countries <span class="im">as</span> (</span>
445+
<span id="cb4-3"><a href="#cb4-3"></a> select</span>
446+
<span id="cb4-4"><a href="#cb4-4"></a> country,</span>
447+
<span id="cb4-5"><a href="#cb4-5"></a> st_geogfromwkb(geometry) geography</span>
448+
<span id="cb4-6"><a href="#cb4-6"></a> <span class="im">from</span></span>
449+
<span id="cb4-7"><a href="#cb4-7"></a> carto_overture_maps_divisions.carto.division_area</span>
450+
<span id="cb4-8"><a href="#cb4-8"></a> where</span>
451+
<span id="cb4-9"><a href="#cb4-9"></a> subtype <span class="op">=</span> <span class="st">'country'</span></span>
452+
<span id="cb4-10"><a href="#cb4-10"></a> <span class="kw">and</span> <span class="kw">class</span> <span class="op">=</span> <span class="st">'land'</span></span>
453+
<span id="cb4-11"><a href="#cb4-11"></a> <span class="kw">and</span> country <span class="kw">in</span> (<span class="st">'GB'</span>, <span class="st">'FR'</span>)</span>
454+
<span id="cb4-12"><a href="#cb4-12"></a>)</span>
455+
<span id="cb4-13"><a href="#cb4-13"></a>select</span>
456+
<span id="cb4-14"><a href="#cb4-14"></a> country,</span>
457+
<span id="cb4-15"><a href="#cb4-15"></a> st_area(geography) <span class="op">/</span> <span class="fl">1e6</span> area_km2s</span>
458+
<span id="cb4-16"><a href="#cb4-16"></a><span class="im">from</span></span>
459+
<span id="cb4-17"><a href="#cb4-17"></a> countries</span>
460+
<span id="cb4-18"><a href="#cb4-18"></a><span class="op">--</span> Returns:</span>
461+
<span id="cb4-19"><a href="#cb4-19"></a><span class="op">--</span> country area_km2s</span>
462+
<span id="cb4-20"><a href="#cb4-20"></a><span class="op">--</span> FR <span class="fl">549231.6644010496</span></span>
463+
<span id="cb4-21"><a href="#cb4-21"></a><span class="op">--</span> GB <span class="fl">244408.1099778328</span></span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
472464
</div>
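The `/ 1e6` in the query converts square metres to square kilometres: for a GEOGRAPHY, st_area measures area on the Earth's surface rather than in squared degrees (presumably on the WGS84 ellipsoid; check the st_area documentation for the exact method). To illustrate why lon/lat coordinates need this treatment, here is a minimal plain-Python sketch using a common spherical line-integral approximation. It assumes a spherical Earth with the mean radius, so it will not reproduce the ellipsoidal numbers above exactly; `spherical_ring_area_m2` is a hypothetical helper.

```python
import math
from typing import Sequence, Tuple

# assumption: mean Earth radius in metres; a sphere, not the WGS84 ellipsoid
EARTH_RADIUS_M = 6_371_008.8


def spherical_ring_area_m2(ring: Sequence[Tuple[float, float]]) -> float:
    """Approximate area (m²) enclosed by a closed (lon, lat) ring in degrees.

    Sums (lon2 - lon1) * (2 + sin(lat1) + sin(lat2)) over the edges and
    scales by R² / 2; the absolute value makes it orientation-independent.
    """
    total = 0.0
    for (lon1, lat1), (lon2, lat2) in zip(ring, ring[1:]):
        total += math.radians(lon2 - lon1) * (
            2 + math.sin(math.radians(lat1)) + math.sin(math.radians(lat2))
        )
    return abs(total) * EARTH_RADIUS_M ** 2 / 2


# a 1° x 1° box at the equator, closed by repeating the first vertex
box = [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
area_m2 = spherical_ring_area_m2(box)
print(round(area_m2 / 1e6))  # 12364 (km², same "/ 1e6" conversion as the query)
```

The same 1° x 1° box at 60°N covers roughly half that area, which is why an area measured in squared degrees is not a meaningful quantity.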
473-
<p>Now the question is, if the <a href="https://en.wikipedia.org/wiki/Strait_of_Dover">Strait of Dover</a> is about 32 kms narrow, why did we get more above? For this, we’d need to find out where exactly the shortest line is between the two countries. There is no st_shortestline yet in Databricks ST functions as of 2025-08-03, but we can use a Spark UDF with DuckDB spatial to fill this gap, see here TODO: .</p>
474465

475466

476467
</section>

docs/search.json

Lines changed: 2 additions & 2 deletions
@@ -113,7 +113,7 @@
113113
"href": "delta/WKB.html",
114114
"title": "Storing spatial data in Delta Lake as WKB",
115115
"section": "",
116-
"text": "Example usage with ST functions\nYou can store geometry or geography data in a Delta Lake table in a BINARY column as Well-known binary (WKB or EWKB). This is a more compact representation than well-known text (WKT), and widely supported incl. in the Geoparquet specification. On the other hand, unlike the newer GEOMETRY and GEOGRAPHY types, there is no higher level semantic support possible. Also, you need to use the conversion function st_geomfromwkb or st_geomfromwekb before any other ST function.\nAnother example of Delta Lake tables with WKB columns are the Overture Maps datasets prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any them (at no cost) to your catalog, if you haven’t yet. For example, for the below query, use Divisions (borders of countries and other administrative divisions):\nThese CARTO tables also show one pattern to organize and cluster tables with geometries: they include the bounding box columns __carto_xmin, __carto_xmax, __carto_ymin, ___carto_ymax and are clustered by these colums.\nAnother pattern would be to make use of spatial indexing such as H3.\n%sql\n-- The distance between the UK and France\nwith countries as (\n select\n country,\n st_geomfromwkb(geometry) wkb_geometry\n from\n carto_overture_maps_divisions.carto.division_area\n where\n subtype = 'country'\n and class = 'land'\n),\nuk as (\n select\n wkb_geometry\n from\n countries\n where\n country = 'GB'\n),\nfr as (\n select\n wkb_geometry\n from\n countries\n where\n country = 'FR'\n)\nselect\n st_distancespheroid(uk.wkb_geometry, fr.wkb_geometry) / 1e3 distance_km\nfrom\n uk,\n fr\nNow the question is, if the Strait of Dover is about 32 kms narrow, why did we get more above? For this, we’d need to find out where exactly the shortest line is between the two countries. 
There is no st_shortestline yet in Databricks ST functions as of 2025-08-03, but we can use a Spark UDF with DuckDB spatial to fill this gap, see here TODO: .",
116+
"text": "Example usage with ST functions\nYou can store geometry or geography data in a Delta Lake table in a BINARY column as Well-known binary (WKB or EWKB). This is a more compact representation than well-known text (WKT), and widely supported incl. in the Geoparquet specification. On the other hand, unlike the newer GEOMETRY and GEOGRAPHY types, there is no higher level semantic support possible. Also, you need to use the conversion function st_geomfromwkb or st_geomfromewkb before any other ST function.\nAnother example of Delta Lake tables with WKB columns are the Overture Maps datasets prepared by CARTO, available via the Databricks Marketplace. Follow the previous link to add any of them (at no cost) to your catalog, if you haven’t yet. For example, for the below query, use Divisions (borders of countries and other administrative divisions):\nThese CARTO tables also show one pattern to organize and cluster tables with geometries: they include the bounding box columns __carto_xmin, __carto_xmax, __carto_ymin, __carto_ymax and are clustered by these columns.\nAnother pattern would be to make use of spatial indexing such as H3.\n%sql\nwith countries as (\n select\n country,\n st_geogfromwkb(geometry) geography\n from\n carto_overture_maps_divisions.carto.division_area\n where\n subtype = 'country'\n and class = 'land'\n and country in ('GB', 'FR')\n)\nselect\n country,\n st_area(geography) / 1e6 area_km2s\nfrom\n countries\n-- Returns:\n-- country area_km2s\n-- FR 549231.6644010496\n-- GB 244408.1099778328",
117117
"crumbs": [
118118
"Geospatial Delta Lake",
119119
"<span class='chapter-number'>6</span>  <span class='chapter-title'>Storing spatial data in Delta Lake as WKB</span>"
@@ -432,7 +432,7 @@
432432
"href": "stfunctions/duckdb_udf.html",
433433
"title": "Define a Spark UDF using a DuckDB Spatial function",
434434
"section": "",
435-
"text": "Setup\nDatabricks SQL now contains lots of ST Functions. However, at some point you might just need a geospatial function not (yet) available natively in Databricks, but maybe available in DuckDB Spatial. For example, as of Aug 2025, the st_shortestline function.\nThen we can register the DuckDB function as a Spark UDF as follows:\n%pip install duckdb lonboard shapely --quiet\n\nimport duckdb\nimport pandas as pd\nfrom pyspark.sql import functions as F\nfrom pyspark.sql.types import BinaryType",
435+
"text": "Setup\nDatabricks SQL now contains lots of ST Functions. However, at some point you might just need a geospatial function not (yet) available natively in Databricks, but maybe available in DuckDB Spatial. For example, as of Aug 2025, the st_shortestline function.\nThen we can register the DuckDB function as a Spark UDF as follows:\n%pip install duckdb lonboard shapely --quiet\n\nimport duckdb\nimport pandas as pd\nfrom pyspark.sql import functions as F\nfrom pyspark.sql.types import BinaryType",
436436
"crumbs": [
437437
"Spatial functions",
438438
"<span class='chapter-number'>15</span>  <span class='chapter-title'>Define a Spark UDF using a DuckDB Spatial function</span>"
