docs/search.json
Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -300,7 +300,7 @@
300
300
"href": "viz/maplibregljs.html",
301
301
"title": "Scalable visualization of large Delta Lake tables with GEOMETRY columms with DuckDB Spatial MVT and MaplibreGL JS",
302
302
"section": "",
303
-
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = '2025-10-22.0'\nCOUNTRY_CODE = 'NL'\n\ncountry_bbox = spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n).where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\").select(\"bbox.*\").toPandas().iloc[0]\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox['xmax']}\n and bbox.xmax > {country_bbox['xmin']}\n and bbox.ymin < {country_bbox['ymax']}\n and bbox.ymax > {country_bbox['ymin']}\n \"\"\"\n).withColumn(\n \"geometry\", F.expr(\"st_geomfromwkb(geometry)\")\n).write.mode(\n \"overwrite\"\n).saveAsTable(\n f\"workspace.default.building_geom\"\n)\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a table with all 2.5B 
buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code below the video, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)\n```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson\nimport os\nimport duckdb import flask\nfrom databricks import sql # type: ignore\nMAX_FEATURES_PER_TILE = 30_000",
303
+
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = \"2025-10-22.0\"\nCOUNTRY_CODE = \"NL\"\n\ncountry_bbox = (\n spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n )\n .where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\")\n .select(\"bbox.*\")\n .toPandas()\n .iloc[0]\n)\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox[\"xmax\"]}\n and bbox.xmax > {country_bbox[\"xmin\"]}\n and bbox.ymin < {country_bbox[\"ymax\"]}\n and bbox.ymax > {country_bbox[\"ymin\"]}\n \"\"\"\n).withColumn(\"geometry\", F.expr(\"st_geomfromwkb(geometry)\")).write.mode(\n \"overwrite\"\n).saveAsTable(\"workspace.default.building_geom\")\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a 
table with all 2.5B buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code below the video, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)\n```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson\nimport os\nimport duckdb import flask\nfrom databricks import sql # type: ignore\nMAX_FEATURES_PER_TILE = 30_000",
304
304
"crumbs": [
305
305
"Visualization",
306
306
"<span class='chapter-number'>10</span> <span class='chapter-title'>Scalable visualization of large Delta Lake tables with GEOMETRY columms with DuckDB Spatial MVT and MaplibreGL JS</span>"
<h2 class="anchored" data-anchor-id="create-a-sample-table">Create a sample table</h2>
435
435
<p>We create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of <em>Databricks Free Edition</em> as I did.</p>
<span id="cb2-6"><a href="#cb2-6"></a>).where(<span class="ss">f"subtype = 'country' and class = 'land' and country = '</span><span class="sc">{</span>COUNTRY_CODE<span class="sc">}</span><span class="ss">'"</span>).select(<span class="st">"bbox.*"</span>).toPandas().iloc[<span class="dv">0</span>]</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
<span id="cb2-8"><a href="#cb2-8"></a> .where(<span class="ss">f"subtype = 'country' and class = 'land' and country = '</span><span class="sc">{</span>COUNTRY_CODE<span class="sc">}</span><span class="ss">'"</span>)</span>
<span id="cb3-7"><a href="#cb3-7"></a><span class="ss"> and bbox.xmax > </span><span class="sc">{</span>country_bbox[<span class="st">'xmin'</span>]<span class="sc">}</span></span>
452
-
<span id="cb3-8"><a href="#cb3-8"></a><span class="ss"> and bbox.ymin < </span><span class="sc">{</span>country_bbox[<span class="st">'ymax'</span>]<span class="sc">}</span></span>
453
-
<span id="cb3-9"><a href="#cb3-9"></a><span class="ss"> and bbox.ymax > </span><span class="sc">{</span>country_bbox[<span class="st">'ymin'</span>]<span class="sc">}</span></span>
<span id="cb3-7"><a href="#cb3-7"></a><span class="ss"> and bbox.xmax > </span><span class="sc">{</span>country_bbox[<span class="st">"xmin"</span>]<span class="sc">}</span></span>
458
+
<span id="cb3-8"><a href="#cb3-8"></a><span class="ss"> and bbox.ymin < </span><span class="sc">{</span>country_bbox[<span class="st">"ymax"</span>]<span class="sc">}</span></span>
459
+
<span id="cb3-9"><a href="#cb3-9"></a><span class="ss"> and bbox.ymax > </span><span class="sc">{</span>country_bbox[<span class="st">"ymin"</span>]<span class="sc">}</span></span>
<span id="cb3-13"><a href="#cb3-13"></a>).saveAsTable(<span class="st">"workspace.default.building_geom"</span>)</span></code><button title="Copy to Clipboard" class="code-copy-button"><i class="bi"></i></button></pre></div>
462
464
</div>
463
465
<p>Now we can build on Maxxen’s <a href="https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955">gist</a>, with the following adjustments:</p>
464
466
<ul>
465
467
<li>We keep DuckDB doing the MVT generation incl. the preprocessing of calculating the <a href="https://duckdb.org/docs/stable/core_extensions/spatial/functions#st_tileenvelope">ST_TileEnvelope</a> for the tiles needed for the current viewport, but of course we need Databricks SQL to actually spatial filter our Delta Table (DuckDB delta_scan currently <a href="https://github.com/duckdb/duckdb-delta/issues/248">does not read</a> GEOMETRY data types.)
466
468
<ul>
467
-
<li>An alternative approach could be to <a href="../stfunctions/duckdb_udf">wrap the used DuckDB functions into Spark UDF’s</a>, if we wanted to move some compute from your browser to DBSQL.</li>
469
+
<li>An alternative approach could be to <a href="../stfunctions/duckdb_udf.html">wrap the used DuckDB functions into Spark UDF’s</a>, if we wanted to move some compute from your browser to DBSQL.</li>
468
470
</ul></li>
469
471
<li>For DBSQL we use the <a href="https://docs.databricks.com/aws/en/dev-tools/python-sql-connector">Python <code>databricks-sql-connector</code></a>, authenticating with a <a href="https://docs.databricks.com/aws/en/dev-tools/python-sql-connector#databricks-personal-access-token-authentication">Personal Access Token</a> – for serious work, you’d want to use OAuth instead.</li>
470
472
<li><strong>Graceful feature limit.</strong> What to do if a tile has too many features? A common solution would be to define a minimum zoom level, but this would make it very cumbersome to move around the map, so we define a <code>MAX_FEATURES_PER_TYLE</code> instead. If this is reached, we gracefully fail and only show the tile boundaries – the user would only need to further zoom in to reveal all the features within that viewport.</li>
@@ -493,7 +495,7 @@ <h2 class="anchored" data-anchor-id="create-a-sample-table">Create a sample tabl
493
495
</div>
494
496
</div>
495
497
<div class="callout-body-container callout-body">
496
-
<p>What if you find this approach still too “slow”, from the end-user standpoint? Then you can use <a href="./duckdb_udf">PMTiles</a>. The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.</p>
498
+
<p>What if you find this approach still too “slow”, from the end-user standpoint? Then you can use <a href="../viz/PMTiles.html">PMTiles</a>. The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.</p>
497
499
</div>
498
500
</div>
499
501
<p>```python # Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson</p>
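Note on the "graceful feature limit" bullet in the changed page: the decision it describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from the app. `tile_envelope_3857` is a pure-Python stand-in for the tile-envelope step (the page uses DuckDB's `ST_TileEnvelope`; the sketch assumes the usual XYZ tiling of the Web Mercator square), and `plan_tile` is a hypothetical helper. The `>=` comparison follows the wording "if this is reached"; the real app may draw that line differently. The constant name follows the quoted code (`MAX_FEATURES_PER_TILE`), whereas the bullet text spells it `MAX_FEATURES_PER_TYLE`.

```python
from typing import NamedTuple

# Value taken from the quoted app code; everything else here is illustrative.
MAX_FEATURES_PER_TILE = 30_000

# Half the side length of the Web Mercator (EPSG:3857) world square, in metres.
WEB_MERCATOR_EXTENT = 20037508.342789244


class Envelope(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def tile_envelope_3857(z: int, x: int, y: int) -> Envelope:
    """Bounds of XYZ tile (z, x, y) in Web Mercator metres (stand-in, see note)."""
    if z < 0:
        raise ValueError("zoom level must be non-negative")
    n = 2 ** z  # number of tiles per axis at this zoom level
    if not (0 <= x < n and 0 <= y < n):
        raise ValueError(f"tile index out of range for zoom {z}")
    size = 2 * WEB_MERCATOR_EXTENT / n  # tile side length in metres
    xmin = -WEB_MERCATOR_EXTENT + x * size
    ymax = WEB_MERCATOR_EXTENT - y * size  # y grows southwards in XYZ tiling
    return Envelope(xmin, ymax - size, xmin + size, ymax)


def plan_tile(z: int, x: int, y: int, feature_count: int,
              limit: int = MAX_FEATURES_PER_TILE) -> dict:
    """Hypothetical helper: draw features, or only the tile outline when too busy."""
    mode = "outline" if feature_count >= limit else "features"
    return {"tile": (z, x, y), "mode": mode, "envelope": tile_envelope_3857(z, x, y)}
```

With this sketch, `plan_tile(14, 8400, 5400, 45_000)["mode"]` selects the outline, and zooming in until the per-tile count drops below the limit switches back to features, which matches the behaviour the bullet describes.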