
Commit a0bcea2

committed
qr
1 parent 5c1d190 commit a0bcea2


3 files changed

+292
-288
lines changed


docs/search.json

Lines changed: 2 additions & 2 deletions
@@ -311,7 +311,7 @@
"href": "viz/maplibregljs.html",
"title": "Scalable visualization of large Delta Lake tables with GEOMETRY columns with DuckDB Spatial MVT and MaplibreGL JS",
"section": "",
314-
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = \"2025-10-22.0\"\nCOUNTRY_CODE = \"NL\"\n\ncountry_bbox = (\n spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n )\n .where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\")\n .select(\"bbox.*\")\n .toPandas()\n .iloc[0]\n)\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox[\"xmax\"]}\n and bbox.xmax > {country_bbox[\"xmin\"]}\n and bbox.ymin < {country_bbox[\"ymax\"]}\n and bbox.ymax > {country_bbox[\"ymin\"]}\n \"\"\"\n).withColumn(\"geometry\", F.expr(\"st_geomfromwkb(geometry)\")).write.mode(\n \"overwrite\"\n).saveAsTable(\"workspace.default.building_geom\")\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a 
table with all 2.5B buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code below the video, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)",
314+
"text": "Create a sample table\nWe have MVT support in DuckDB Spatial since version 1.4. This means that we can feed MVT to e.g. MaplibreGL JS, as shown by DuckDB Spatial author Max Gabrielsson here in an example Flask app.\n(Another nice tool to consume DuckDB MVT’s would be Martin, this is tracked in this issue.)\nBut how do we efficiently generate MVT’s from a Delta Lake table containing a GEOMETRY, given the tile indices?\nThe key thing to consider is that now Databricks has very efficient spatial join filtering via e.g. ST_Intersect, especially if what you are filtering for is a constant. So the following query can be sub-second for e.g. a billion polygons such as Overture Maps buildings (note that we are not using any spatial grid or bounding box filters anymore):\nWe create here a sample table of buildings in the Netherlands – the same worked for me also for all 2.5B Overture Maps buildings of the world, but if you tried to persist that table, you’d probably run against the daily usage limit of Databricks Free Edition as I did.\nOVERTUREMAPS_RELEASE = \"2025-10-22.0\"\nCOUNTRY_CODE = \"NL\"\n\ncountry_bbox = (\n spark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=divisions/type=division_area\"\n )\n .where(f\"subtype = 'country' and class = 'land' and country = '{COUNTRY_CODE}'\")\n .select(\"bbox.*\")\n .toPandas()\n .iloc[0]\n)\nfrom pyspark.sql import functions as F\n\nspark.read.parquet(\n f\"s3://overturemaps-us-west-2/release/{OVERTUREMAPS_RELEASE}/theme=buildings/type=building\"\n).where(\n f\"\"\"bbox.xmin < {country_bbox[\"xmax\"]}\n and bbox.xmax > {country_bbox[\"xmin\"]}\n and bbox.ymin < {country_bbox[\"ymax\"]}\n and bbox.ymax > {country_bbox[\"ymin\"]}\n \"\"\"\n).withColumn(\"geometry\", F.expr(\"st_geomfromwkb(geometry)\")).write.mode(\n \"overwrite\"\n).saveAsTable(\"workspace.default.building_geom\")\nNow we can build on Maxxen’s gist, with the following adjustments:\nIn the below video (showing a 
table with all 2.5B buildings worldwide, not just one country), you can see the tiling at work – note how 1) the graceful feature limit means that “too busy” tiles are just shown as rectangles, and 2) zoom-and-pan pauses the feature layer but after a short timeout, the tiles are drawn, with sub-second latency per tile.\nFind the full code here, which you can run locally as a Flask app (you could also embed it within a Databricks App if preferred, but the local app is of course a bit more cost-effective).\n(click on the above image to play the video.)",
"crumbs": [
"Visualization",
"<span class='chapter-number'>10</span>  <span class='chapter-title'>Scalable visualization of large Delta Lake tables with GEOMETRY columns with DuckDB Spatial MVT and MaplibreGL JS</span>"
@@ -322,7 +322,7 @@
"href": "viz/maplibregljs.html#create-a-sample-table",
"title": "Scalable visualization of large Delta Lake tables with GEOMETRY columns with DuckDB Spatial MVT and MaplibreGL JS",
"section": "",
325-
"text": "We keep DuckDB doing the MVT generation incl. the preprocessing of calculating the ST_TileEnvelope for the tiles needed for the current viewport, but of course we need Databricks SQL to actually spatial filter our Delta Table (DuckDB delta_scan currently does not read GEOMETRY data types.)\n\nAn alternative approach could be to wrap the used DuckDB functions into Spark UDF’s, if we wanted to move some compute from your browser to DBSQL.\n\nFor DBSQL we use the Python databricks-sql-connector, authenticating with a Personal Access Token – for serious work, you’d want to use OAuth instead.\nGraceful feature limit. What to do if a tile has too many features? A common solution would be to define a minimum zoom level, but this would make it very cumbersome to move around the map, so we define a MAX_FEATURES_PER_TYLE instead. If this is reached, we gracefully fail and only show the tile boundaries – the user would only need to further zoom in to reveal all the features within that viewport.\nMVT expects SRID 3857, while our table is probably in another SRID, so we need to use some st_transform there and back.\nTile throttling. we also added JS code under // === Tile throttling logic === to take a 2 second pause starting any zoom and move interaction, in order to avoid overloading the warehouse with tile requests and therefore avoid tile queueing.\n\nNote that in the current implementation this means that during zooming and moving the map, the feature layer is temporarily not visible – this probably could be improved. For example, without tile throttling, the objects would remain visible during zoom/pan, but we would need to wait much longer for the results after a big move.\n\n\n\n\n\n\n\nWatch the video\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nWhat if you find this approach still too “slow”, from the end-user standpoint? Then you can use PMTiles. 
The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.\n\n\n# Based on https://gist.github.com/Maxxen/37e4a9f8595ea5e6a20c0c8fbbefe955 by Max Gabrielsson\n\nimport os\n\nimport duckdb\nimport flask\n\nfrom databricks import sql # type: ignore\n\nMAX_FEATURES_PER_TILE = 30_000\n\n# Initialize Flask app\napp = flask.Flask(__name__)\n\nconfig = {\"allow_unsigned_extensions\": \"true\"}\nduckdb_con = duckdb.connect(config=config)\n\nduckdb_con.execute(\"INSTALL spatial\")\n\nduckdb_con.execute(\"load spatial\")\n\n\ndbx_con = sql.connect(\n server_hostname=os.getenv(\"DATABRICKS_SERVER_HOSTNAME\"),\n http_path=os.getenv(\"DATABRICKS_HTTP_PATH\"),\n access_token=os.getenv(\"DATABRICKS_TOKEN\"),\n)\n\n\n# Tile endpoint to serve vector tiles\n@app.route(\"/tiles/&lt;int:z&gt;/&lt;int:x&gt;/&lt;int:y&gt;.pbf\")\ndef get_tile(z, x, y):\n # Query to get the tile data from DuckDB\n # - Note that the geometry is assumed to be projected to `EPSG:3857` (Web Mercator)\n\n # Use con.cursor() to avoid threading issues with Flask\n with duckdb_con.cursor() as local_con:\n tileenv = local_con.execute(\n \"\"\"\n select st_astext(st_transform(\n st_tileenvelope($1, $2, $3),\n 'EPSG:3857',\n 'OGC:CRS84'\n ))\n \"\"\",\n [z, x, y],\n ).fetchone()\n\n query = f\"\"\"\n select\n st_aswkb(geometry) as geometry\n from\n `workspace`.`default`.`building_geom`\n where st_intersects(geometry, st_geomfromtext('{tileenv[0]}'))\n limit {MAX_FEATURES_PER_TILE}\"\"\"\n\n with dbx_con.cursor() as cursor:\n cursor.execute(query)\n da = cursor.fetchall_arrow() # noqa: F841\n\n # Use con.cursor() to avoid threading issues with Flask\n with duckdb_con.cursor() as local_con:\n tile_blob = None\n tile_count = local_con.execute(\n \"\"\"\n select count(*) cnt from da\n \"\"\"\n ).fetchone()[0]\n if tile_count == MAX_FEATURES_PER_TILE:\n # If we hit the limit, return an empty tile to avoid incomplete 
data\n tile_blob = local_con.execute(\n \"\"\"\n select ST_AsMVT({\n \"geometry\": ST_AsMVTGeom(\n ST_TileEnvelope($1, $2, $3),\n ST_Extent(ST_TileEnvelope($1, $2, $3))\n )\n }) \n \"\"\",\n [z, x, y],\n ).fetchone()\n else:\n tile_blob = local_con.execute(\n \"\"\"\n select ST_AsMVT({\n \"geometry\": ST_AsMVTGeom(\n st_transform(st_geomfromwkb(geometry), 'OGC:CRS84', 'EPSG:3857'),\n ST_Extent(ST_TileEnvelope($1, $2, $3))\n )\n }) from da\n \"\"\",\n [z, x, y],\n ).fetchone()\n\n # Send the tile data as a response\n tile = tile_blob[0] if tile_blob and tile_blob[0] else b\"\"\n return flask.Response(tile, mimetype=\"application/x-protobuf\")\n\n\n# HTML content for the index page\nINDEX_HTML = \"\"\"\n&lt;!DOCTYPE html&gt;\n&lt;html&gt;\n&lt;head&gt;\n &lt;meta charset=\"utf-8\"&gt;\n &lt;title&gt;Vector Tile Viewer&lt;/title&gt;\n &lt;meta name=\"viewport\" content=\"initial-scale=1,maximum-scale=1,user-scalable=no\"&gt;\n &lt;script src='https://unpkg.com/maplibre-gl@3.6.2/dist/maplibre-gl.js'&gt;&lt;/script&gt;\n &lt;link href='https://unpkg.com/maplibre-gl@3.6.2/dist/maplibre-gl.css' rel='stylesheet' /&gt;\n &lt;style&gt;\n body { margin: 0; padding: 0; }\n #map { position: absolute; top: 0; bottom: 0; width: 100%; }\n &lt;/style&gt;\n&lt;/head&gt;\n&lt;body&gt;\n&lt;div id=\"map\"&gt;&lt;/div&gt;\n&lt;script&gt;\n const map = new maplibregl.Map({\n container: 'map',\n style: {\n version: 8,\n sources: {\n 'buildings': {\n type: 'vector',\n tiles: [`${window.location.origin}/tiles/{z}/{x}/{y}.pbf`]\n },\n // Also use a public open source basemap\n 'osm': {\n type: 'raster',\n tiles: [\n 'https://a.tile.openstreetmap.org/{z}/{x}/{y}.png',\n 'https://b.tile.openstreetmap.org/{z}/{x}/{y}.png',\n 'https://c.tile.openstreetmap.org/{z}/{x}/{y}.png'\n ],\n tileSize: 256\n }\n },\n layers: [\n {\n id: 'background',\n type: 'background',\n paint: { 'background-color': '#a0c8f0' }\n },\n {\n id: 'osm',\n type: 'raster',\n source: 'osm'\n },\n {\n id: 'buildings-fill',\n 
type: 'fill',\n source: 'buildings',\n 'source-layer': 'layer',\n paint: {\n 'fill-color': 'blue',\n 'fill-opacity': 0.6,\n 'fill-outline-color': '#ffffff'\n }\n },\n {\n id: 'buildings-stroke',\n type: 'line',\n source: 'buildings',\n 'source-layer': 'layer',\n paint: {\n 'line-color': 'black',\n 'line-width': 0.5\n }\n }\n ]\n },\n // Zoom in on amf\n center: [5.38327, 52.15660],\n zoom: 12,\n prefetchZoomDelta: 0, // disables zoom-level prefetch\n refreshExpiredTiles: false, // don’t re-request tiles that have expired\n\n });\n\n map.addControl(new maplibregl.NavigationControl());\n\n // Add click handler to show feature properties\n map.on('click', 'buildings-fill', (e) =&gt; {\n const coordinates = e.lngLat;\n const properties = e.features[0].properties;\n\n let popupContent = '&lt;h3&gt;Building Properties&lt;/h3&gt;';\n for (const [key, value] of Object.entries(properties)) {\n popupContent += `&lt;p&gt;&lt;strong&gt;${key}:&lt;/strong&gt; ${value}&lt;/p&gt;`;\n }\n\n new maplibregl.Popup()\n .setLngLat(coordinates)\n .setHTML(popupContent)\n .addTo(map);\n });\n\n // Change cursor on hover\n map.on('mouseenter', 'buildings-fill', () =&gt; {\n map.getCanvas().style.cursor = 'pointer';\n });\n\n map.on('mouseleave', 'buildings-fill', () =&gt; {\n map.getCanvas().style.cursor = '';\n });\n\n\n// ---- Throttle building tile loading ----\nlet reloadTimeout;\n\nfunction removeBuildingLayers() {\n if (map.getLayer('buildings-fill')) map.removeLayer('buildings-fill');\n if (map.getLayer('buildings-stroke')) map.removeLayer('buildings-stroke');\n if (map.getSource('buildings')) map.removeSource('buildings');\n}\n\nfunction addBuildingLayers() {\n if (map.getSource('buildings')) return;\n\n map.addSource('buildings', {\n type: 'vector',\n tiles: [`${window.location.origin}/tiles/{z}/{x}/{y}.pbf`]\n });\n\n map.addLayer({\n id: 'buildings-fill',\n type: 'fill',\n source: 'buildings',\n 'source-layer': 'layer',\n paint: {\n 'fill-color': 'blue',\n 'fill-opacity': 
0.6,\n 'fill-outline-color': '#ffffff'\n }\n });\n\n map.addLayer({\n id: 'buildings-stroke',\n type: 'line',\n source: 'buildings',\n 'source-layer': 'layer',\n paint: {\n 'line-color': 'black',\n 'line-width': 0.5\n }\n });\n}\n\n// When user starts moving or zooming\nfunction onInteractionStart() {\n clearTimeout(reloadTimeout);\n removeBuildingLayers();\n}\n\n// When user stops moving or zooming\nfunction onInteractionEnd() {\n clearTimeout(reloadTimeout);\n reloadTimeout = setTimeout(() =&gt; {\n addBuildingLayers();\n }, 2000);\n}\n\n// Bind to move & zoom events\nmap.on('movestart', onInteractionStart);\nmap.on('moveend', onInteractionEnd);\nmap.on('zoomstart', onInteractionStart);\nmap.on('zoomend', onInteractionEnd);\n\n&lt;/script&gt;\n&lt;/body&gt;\n&lt;/html&gt;\n\"\"\"\n\n\n# Serve the static HTML file for the index page\n@app.route(\"/\")\ndef index():\n return flask.Response(INDEX_HTML, mimetype=\"text/html\")\n\n\nif __name__ == \"__main__\":\n # Start on localhost\n app.run(debug=True)",
325+
"text": "We keep DuckDB doing the MVT generation incl. the preprocessing of calculating the ST_TileEnvelope for the tiles needed for the current viewport, but of course we need Databricks SQL to actually spatial filter our Delta Table (DuckDB delta_scan currently does not read GEOMETRY data types.)\n\nAn alternative approach could be to wrap the used DuckDB functions into Spark UDF’s, if we wanted to move some compute from your browser to DBSQL.\n\nFor DBSQL we use the Python databricks-sql-connector, authenticating with a Personal Access Token – for serious work, you’d want to use OAuth instead.\nGraceful feature limit. What to do if a tile has too many features? A common solution would be to define a minimum zoom level, but this would make it very cumbersome to move around the map, so we define a MAX_FEATURES_PER_TYLE instead. If this is reached, we gracefully fail and only show the tile boundaries – the user would only need to further zoom in to reveal all the features within that viewport.\nMVT expects SRID 3857, while our table is probably in another SRID, so we need to use some st_transform there and back.\nTile throttling. we also added JS code under // === Tile throttling logic === to take a 2 second pause starting any zoom and move interaction, in order to avoid overloading the warehouse with tile requests and therefore avoid tile queueing.\n\nNote that in the current implementation this means that during zooming and moving the map, the feature layer is temporarily not visible – this probably could be improved. For example, without tile throttling, the objects would remain visible during zoom/pan, but we would need to wait much longer for the results after a big move.\n\n\n\n\n\n\n\nWatch the video\n\n\n\n\n\n\n\n\n\nNote\n\n\n\nWhat if you find this approach still too “slow”, from the end-user standpoint? Then you can use PMTiles. 
The difference is that with the MVT approach, you directly read the Delta Lake table, and the PMTile you would need to generate which means extra compute and time.",
"crumbs": [
"Visualization",
"<span class='chapter-number'>10</span>  <span class='chapter-title'>Scalable visualization of large Delta Lake tables with GEOMETRY columns with DuckDB Spatial MVT and MaplibreGL JS</span>"
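
Note on the indexed text in this diff: it describes computing each tile's envelope in DuckDB (`st_tileenvelope`) and transforming it from `EPSG:3857` to `OGC:CRS84` before using it as the constant in the Databricks SQL `st_intersects` filter. As a rough illustration only, and not the code from this commit, the same longitude/latitude bounds can be approximated in plain Python with the standard XYZ (slippy map) tile formulas. The helper names `tile_bounds_wgs84` and `tile_envelope_wkt` are hypothetical and do not appear in the original app.

```python
import math


def tile_bounds_wgs84(z: int, x: int, y: int) -> tuple[float, float, float, float]:
    """Return (west, south, east, north) in degrees for XYZ tile z/x/y.

    Assumption: standard Web Mercator XYZ tiling with y counted from the top,
    which is the scheme MapLibre uses for {z}/{x}/{y} tile URLs.
    """
    n = 2 ** z  # number of tiles along each axis at this zoom level
    west = x / n * 360.0 - 180.0
    east = (x + 1) / n * 360.0 - 180.0
    # Inverse Mercator projection for the top and bottom tile edges
    north = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * y / n))))
    south = math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * (y + 1) / n))))
    return west, south, east, north


def tile_envelope_wkt(z: int, x: int, y: int) -> str:
    """Build a WKT polygon for the tile bounds (hypothetical helper)."""
    w, s, e, n = tile_bounds_wgs84(z, x, y)
    return f"POLYGON(({w} {s}, {w} {n}, {e} {n}, {e} {s}, {w} {s}))"


if __name__ == "__main__":
    # Tile covering the map centre used in the indexed example, at zoom 12
    print(tile_envelope_wkt(12, 2109, 1347))
```

A string like this could stand in for the `tileenv[0]` value that the indexed app interpolates into `st_geomfromtext(...)`, though the original keeps this step in DuckDB, which avoids maintaining the projection math by hand.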
