Commit 36c97b3

committed
qr
1 parent a01134f commit 36c97b3

2 files changed: +23 -21 lines changed

docs/search.json

Lines changed: 1 addition & 1 deletion
@@ -300,7 +300,7 @@
300300
"href": "viz/maplibregljs.html",
301301
"title": "Scalable visualization of large Delta Lake tables with GEOMETRY columms with DuckDB Spatial MVT and MaplibreGL JS",
302302
"section": "",
303-
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = '2025-10-22.0'\nCOUNTRY_CODE = 'NL'\n\ncountry_bbox = spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n).where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\").select(\"bbox.*\").toPandas().iloc[0]\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox['xmax']}\n and bbox.xmax > {country_bbox['xmin']}\n and bbox.ymin < {country_bbox['ymax']}\n and bbox.ymax > {country_bbox['ymin']}\n \"\"\"\n).withColumn(\n \"geometry\", F.expr(\"st_geomfromwkb(geometry)\")\n).write.mode(\n \"overwrite\"\n).saveAsTable(\n f\"workspace.default.building_geom\"\n)\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a table with all 2.5B 
buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code below the video, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)\n```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson\nimport os\nimport duckdb import flask\nfrom databricks import sql # type: ignore\nMAX_FEATURES_PER_TILE = 30_000",
303+
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = \"2025-10-22.0\"\nCOUNTRY_CODE = \"NL\"\n\ncountry_bbox = (\n spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n )\n .where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\")\n .select(\"bbox.*\")\n .toPandas()\n .iloc[0]\n)\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox[\"xmax\"]}\n and bbox.xmax > {country_bbox[\"xmin\"]}\n and bbox.ymin < {country_bbox[\"ymax\"]}\n and bbox.ymax > {country_bbox[\"ymin\"]}\n \"\"\"\n).withColumn(\"geometry\", F.expr(\"st_geomfromwkb(geometry)\")).write.mode(\n \"overwrite\"\n).saveAsTable(\"workspace.default.building_geom\")\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a 
table with all 2.5B buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code below the video, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)\n```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson\nimport os\nimport duckdb import flask\nfrom databricks import sql # type: ignore\nMAX_FEATURES_PER_TILE = 30_000",
304304
"crumbs": [
305305
"Visualization",
306306
"<span class='chapter-number'>10</span>  <span class='chapter-title'>Scalable visualization of large Delta Lake tables with GEOMETRY columms with DuckDB Spatial MVT and MaplibreGL JS</span>"

docs/viz/maplibregljs.html

Lines changed: 22 additions & 20 deletions
@@ -433,38 +433,40 @@ <h1 class="title"><span class="chapter-title">Scalable visualization of large De
433433
<section id="create-a-sample-table" class="level2">
434434
<h2 class="anchored" data-anchor-id="create-a-sample-table">Create a sample table</h2>
435435
<p>We create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of <em>Databricks Free Edition</em> as I did.</p>
436-
<div id="cell-3" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;f176cd45-e97f-4312-acb8-7d0b0e506c16&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
437-
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a>OVERTUREMAPS_RELEASE <span class="op">=</span> <span class="st">'2025-10-22.0'</span></span>
438-
<span id="cb2-2"><a href="#cb2-2"></a>COUNTRY_CODE <span class="op">=</span> <span class="st">'NL'</span></span>
436+
<div id="cell-3" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;f176cd45-e97f-4312-acb8-7d0b0e506c16&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}">
437+
<div class="sourceCode cell-code" id="cb2"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb2-1"><a href="#cb2-1"></a>OVERTUREMAPS_RELEASE <span class="op">=</span> <span class="st">"2025-10-22.0"</span></span>
438+
<span id="cb2-2"><a href="#cb2-2"></a>COUNTRY_CODE <span class="op">=</span> <span class="st">"NL"</span></span>
439439
<span id="cb2-3"><a href="#cb2-3"></a></span>
440-
<span id="cb2-4"><a href="#cb2-4"></a>country_bbox <span class="op">=</span> spark.read.parquet(</span>
441-
<span id="cb2-5"><a href="#cb2-5"></a> <span class="ss">f"s3://overturemaps-us-west-2/release/</span><span class="sc">{</span>OVERTUREMAPS_RELEASE<span class="sc">}</span><span class="ss">/theme=divisions/type=division_area"</span></span>
442-
<span id="cb2-6"><a href="#cb2-6"></a>).where(<span class="ss">f"subtype = 'country' and class = 'land' and country = '</span><span class="sc">{</span>COUNTRY_CODE<span class="sc">}</span><span class="ss">'"</span>).select(<span class="st">"bbox.*"</span>).toPandas().iloc[<span class="dv">0</span>]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
440+
<span id="cb2-4"><a href="#cb2-4"></a>country_bbox <span class="op">=</span> (</span>
441+
<span id="cb2-5"><a href="#cb2-5"></a> spark.read.parquet(</span>
442+
<span id="cb2-6"><a href="#cb2-6"></a> <span class="ss">f"s3://overturemaps-us-west-2/release/</span><span class="sc">{</span>OVERTUREMAPS_RELEASE<span class="sc">}</span><span class="ss">/theme=divisions/type=division_area"</span></span>
443+
<span id="cb2-7"><a href="#cb2-7"></a> )</span>
444+
<span id="cb2-8"><a href="#cb2-8"></a> .where(<span class="ss">f"subtype = 'country' and class = 'land' and country = '</span><span class="sc">{</span>COUNTRY_CODE<span class="sc">}</span><span class="ss">'"</span>)</span>
445+
<span id="cb2-9"><a href="#cb2-9"></a> .select(<span class="st">"bbox.*"</span>)</span>
446+
<span id="cb2-10"><a href="#cb2-10"></a> .toPandas()</span>
447+
<span id="cb2-11"><a href="#cb2-11"></a> .iloc[<span class="dv">0</span>]</span>
448+
<span id="cb2-12"><a href="#cb2-12"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
443449
</div>
444-
<div id="cell-4" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;594b160d-8386-448b-b7a0-21e079c6aab7&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}" data-execution_count="0">
450+
<div id="cell-4" class="cell" data-quarto-private-1="{&quot;key&quot;:&quot;application/vnd.databricks.v1+cell&quot;,&quot;value&quot;:{&quot;cellMetadata&quot;:{&quot;byteLimit&quot;:2048000,&quot;implicitDf&quot;:true,&quot;rowLimit&quot;:10000},&quot;inputWidgets&quot;:{},&quot;nuid&quot;:&quot;594b160d-8386-448b-b7a0-21e079c6aab7&quot;,&quot;showTitle&quot;:false,&quot;tableResultSettingsMap&quot;:{},&quot;title&quot;:&quot;&quot;}}">
445451
<div class="sourceCode cell-code" id="cb3"><pre class="sourceCode numberSource python number-lines code-with-copy"><code class="sourceCode python"><span id="cb3-1"><a href="#cb3-1"></a><span class="im">from</span> pyspark.sql <span class="im">import</span> functions <span class="im">as</span> F</span>
446452
<span id="cb3-2"><a href="#cb3-2"></a></span>
447453
<span id="cb3-3"><a href="#cb3-3"></a>spark.read.parquet(</span>
448454
<span id="cb3-4"><a href="#cb3-4"></a> <span class="ss">f"s3://overturemaps-us-west-2/release/</span><span class="sc">{</span>OVERTUREMAPS_RELEASE<span class="sc">}</span><span class="ss">/theme=buildings/type=building"</span></span>
449455
<span id="cb3-5"><a href="#cb3-5"></a>).where(</span>
450-
<span id="cb3-6"><a href="#cb3-6"></a> <span class="ss">f"""bbox.xmin &lt; </span><span class="sc">{</span>country_bbox[<span class="st">'xmax'</span>]<span class="sc">}</span></span>
451-
<span id="cb3-7"><a href="#cb3-7"></a><span class="ss"> and bbox.xmax &gt; </span><span class="sc">{</span>country_bbox[<span class="st">'xmin'</span>]<span class="sc">}</span></span>
452-
<span id="cb3-8"><a href="#cb3-8"></a><span class="ss"> and bbox.ymin &lt; </span><span class="sc">{</span>country_bbox[<span class="st">'ymax'</span>]<span class="sc">}</span></span>
453-
<span id="cb3-9"><a href="#cb3-9"></a><span class="ss"> and bbox.ymax &gt; </span><span class="sc">{</span>country_bbox[<span class="st">'ymin'</span>]<span class="sc">}</span></span>
456+
<span id="cb3-6"><a href="#cb3-6"></a> <span class="ss">f"""bbox.xmin &lt; </span><span class="sc">{</span>country_bbox[<span class="st">"xmax"</span>]<span class="sc">}</span></span>
457+
<span id="cb3-7"><a href="#cb3-7"></a><span class="ss"> and bbox.xmax &gt; </span><span class="sc">{</span>country_bbox[<span class="st">"xmin"</span>]<span class="sc">}</span></span>
458+
<span id="cb3-8"><a href="#cb3-8"></a><span class="ss"> and bbox.ymin &lt; </span><span class="sc">{</span>country_bbox[<span class="st">"ymax"</span>]<span class="sc">}</span></span>
459+
<span id="cb3-9"><a href="#cb3-9"></a><span class="ss"> and bbox.ymax &gt; </span><span class="sc">{</span>country_bbox[<span class="st">"ymin"</span>]<span class="sc">}</span></span>
454460
<span id="cb3-10"><a href="#cb3-10"></a><span class="ss"> """</span></span>
455-
<span id="cb3-11"><a href="#cb3-11"></a>).withColumn(</span>
456-
<span id="cb3-12"><a href="#cb3-12"></a> <span class="st">"geometry"</span>, F.expr(<span class="st">"st_geomfromwkb(geometry)"</span>)</span>
457-
<span id="cb3-13"><a href="#cb3-13"></a>).write.mode(</span>
458-
<span id="cb3-14"><a href="#cb3-14"></a> <span class="st">"overwrite"</span></span>
459-
<span id="cb3-15"><a href="#cb3-15"></a>).saveAsTable(</span>
460-
<span id="cb3-16"><a href="#cb3-16"></a> <span class="ss">f"workspace.default.building_geom"</span></span>
461-
<span id="cb3-17"><a href="#cb3-17"></a>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
461+
<span id="cb3-11"><a href="#cb3-11"></a>).withColumn(<span class="st">"geometry"</span>, F.expr(<span class="st">"st_geomfromwkb(geometry)"</span>)).write.mode(</span>
462+
<span id="cb3-12"><a href="#cb3-12"></a> <span class="st">"overwrite"</span></span>
463+
<span id="cb3-13"><a href="#cb3-13"></a>).saveAsTable(<span class="st">"workspace.default.building_geom"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
462464
</div>
463465
<p>Now we can build on Maxxen’s <a href="https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955">gist</a>, with the following adjustments:</p>
464466
<ul>
465467
<li>We keep DuckDB doing the MVT generation incl.&nbsp;the preprocessing of calculating the <a href="https://duckdb.org/docs/stable/core_extensions/spatial/functions#st_tileenvelope">ST_TileEnvelope</a> for the tiles needed for the current viewport, but of course we need Databricks SQL to actually spatial filter our Delta Table (DuckDB delta_scan currently <a href="https://github.com/duckdb/duckdb-delta/issues/248">does not read</a> GEOMETRY data types.)
466468
<ul>
467-
<li>An alternative approach could be to <a href="../stfunctions/duckdb_udf">wrap the used DuckDB functions into Spark UDF’s</a>, if we wanted to move some compute from your browser to DBSQL.</li>
469+
<li>An alternative approach could be to <a href="../stfunctions/duckdb_udf.html">wrap the used DuckDB functions into Spark UDF’s</a>, if we wanted to move some compute from your browser to DBSQL.</li>
468470
</ul></li>
469471
<li>For DBSQL we use the <a href="https://docs.databricks.com/aws/en/dev-tools/python-sql-connector">Python <code>databricks-sql-connector</code></a>, authenticating with a <a href="https://docs.databricks.com/aws/en/dev-tools/python-sql-connector#databricks-personal-access-token-authentication">Personal Access Token</a> – for serious work, you’d want to use OAuth instead.</li>
470472
<li><strong>Graceful feature limit.</strong> What to do if a tile has too many features? A common solution would be to define a minimum zoom level, but this would make it very cumbersome to move around the map, so we define a <code>MAX_FEATURES_PER_TYLE</code> instead. If this is reached, we gracefully fail and only show the tile boundaries – the user would only need to further zoom in to reveal all the features within that viewport.</li>
@@ -493,7 +495,7 @@ <h2 class="anchored" data-anchor-id="create-a-sample-table">Create a sample tabl
493495
</div>
494496
</div>
495497
<div class="callout-body-container callout-body">
496-
<p>What if you find this approach still too “slow”, from the end-user standpoint? Then you can use <a href="./duckdb_udf">PMTiles</a>. The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.</p>
498+
<p>What if you find this approach still too “slow”, from the end-user standpoint? Then you can use <a href="../viz/PMTiles.html">PMTiles</a>. The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.</p>
497499
</div>
498500
</div>
499501
<p>```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson</p>
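Note on the "graceful feature limit" described in the list item above: the full Flask app is not part of this diff, so the following is only a minimal sketch of the idea, not the author's implementation. The function names `tile_envelope` and `features_or_boundary` are hypothetical, the strict `>` comparison is an assumption, and the envelope is computed in plain Python (standard XYZ tiling in Web Mercator) rather than via DuckDB's `ST_TileEnvelope`. `MAX_FEATURES_PER_TILE = 30_000` is the value visible in the `search.json` text.

```python
# Hypothetical sketch: the names below are illustrative and do not come from the commit.

# Half the width of the Web Mercator (EPSG:3857) world square, in metres.
WEB_MERCATOR_EXTENT = 20037508.342789244

# Value visible in the diff's search.json text.
MAX_FEATURES_PER_TILE = 30_000


def tile_envelope(z, x, y):
    """Return (xmin, ymin, xmax, ymax) of XYZ tile z/x/y in EPSG:3857.

    Plain-Python stand-in for what ST_TileEnvelope provides in DuckDB Spatial.
    """
    size = 2 * WEB_MERCATOR_EXTENT / (2 ** z)  # tile edge length at zoom z
    xmin = -WEB_MERCATOR_EXTENT + x * size
    ymax = WEB_MERCATOR_EXTENT - y * size      # y counts down from the top edge
    return (xmin, ymax - size, xmin + size, ymax)


def features_or_boundary(z, x, y, features, limit=MAX_FEATURES_PER_TILE):
    """Return the tile's features, or only its rectangle if it is too busy.

    Assumption: a tile counts as "too busy" when it holds more than `limit`
    features; the real app may use a different comparison.
    """
    if len(features) > limit:
        return {"kind": "boundary", "geometries": [tile_envelope(z, x, y)]}
    return {"kind": "features", "geometries": list(features)}
```

With this shape, a busy tile degrades to a single rectangle instead of failing the request, and zooming in one level quarters the tile area, so the feature count per tile tends to drop below the limit again.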
