Add `--source_parallelism` flag to run multiple input sources concurrently by yuiseki · Pull Request #1568 · onthegomap/planetiler

yuiseki · 2026-05-29T08:26:21Z

Summary

When a custom schema declares many small input sources whose per-feature work is light, Planetiler.run processes them sequentially and cannot saturate process_threads. On one such build, the pool runs at avg 2.6 of 31 threads per source (~8% CPU utilization).

This PR adds an opt-in --source_parallelism=N flag (default 1). With N>1, up to N source stages run concurrently against a shared executor.
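
To make the intended behavior concrete, here is a minimal sketch of "at most N stages in flight". It is not the code in this PR: the class `SourceParallelismSketch`, the method `runStages`, and the dedicated fixed-size pool are all hypothetical, and the real change dispatches onto Planetiler's existing executor rather than creating its own.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical sketch, not code from this PR: run stages with at most N in flight. */
final class SourceParallelismSketch {

  /**
   * Runs every stage, at most {@code parallelism} at a time, and returns the
   * highest number of stages that were observed running simultaneously.
   */
  static int runStages(List<Runnable> stages, int parallelism) {
    AtomicInteger running = new AtomicInteger();
    AtomicInteger peak = new AtomicInteger();
    List<Runnable> tracked = new ArrayList<>();
    for (Runnable stage : stages) {
      tracked.add(() -> {
        // record how many stages are active while this one runs
        peak.accumulateAndGet(running.incrementAndGet(), Math::max);
        try {
          stage.run();
        } finally {
          running.decrementAndGet();
        }
      });
    }

    if (parallelism <= 1 || stages.size() <= 1) {
      // default path: strictly sequential, in declaration order
      tracked.forEach(Runnable::run);
      return peak.get();
    }

    // the pool size is the concurrency cap: never more than N stages at once
    ExecutorService pool =
        Executors.newFixedThreadPool(Math.min(parallelism, stages.size()));
    try {
      List<Future<?>> futures = new ArrayList<>();
      for (Runnable stage : tracked) {
        futures.add(pool.submit(stage));
      }
      for (Future<?> future : futures) {
        future.get(); // surfaces a stage failure, if any
      }
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("interrupted while waiting for stages", e);
    } catch (ExecutionException e) {
      throw new IllegalStateException("stage failed", e.getCause());
    } finally {
      pool.shutdown();
    }
    return peak.get();
  }
}
```

With `parallelism` of 1 the sketch never touches a pool, which mirrors the claim that the default keeps the existing sequential path.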

Impact

Purely additive. Default preserves the existing sequential behavior bit-for-bit. Output .pmtiles is MD5-identical across N=1, N=4, and N=8 on a real multi-source build I use locally.

Performance

NVMe, 32-core host, JDK 21:

| `--source_parallelism` | wall-clock | speedup |
|---|---|---|
| 1 (default) | 10m 43s | 1.00x |
| 4 | 4m 01s | 2.67x |
| 8 | 3m 52s | 2.77x |

HDD, same workload: N=4 is 2.04x faster (6m 54s vs 14m 04s at N=1); N=8 regresses to 9m 15s under disk contention.

Notes

Per-stage thread CPU breakdown in stats.json gets mixed when stages overlap, since Timers.currentStage assumes LIFO nesting. Wall-clock per stage, output archive, and progress logs are unaffected. Happy to follow up with a fix in a separate PR if maintainers want it bundled.
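
A toy model of why a LIFO assumption misattributes work once stages overlap. This is not the `Timers` class: `LifoStageSketch` and its methods are hypothetical, and the only assumption carried over is that the "current stage" is tracked as a stack.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a stack-based "current stage" tracker. Hypothetical; not Planetiler's Timers. */
final class LifoStageSketch {
  private final Deque<String> active = new ArrayDeque<>();

  void start(String stage) {
    active.push(stage);
  }

  /** Assumes the stage that finishes is the one started most recently. */
  void finish() {
    active.pop();
  }

  /** Stage that CPU time would be attributed to right now, or null if none. */
  String current() {
    return active.peek();
  }

  /** Nested stages: a starts, b starts, b finishes. */
  static String nested() {
    LifoStageSketch timers = new LifoStageSketch();
    timers.start("a");
    timers.start("b");
    timers.finish(); // b finishes and b is popped: correct
    return timers.current();
  }

  /** Overlapping stages: a starts, b starts, a finishes first. */
  static String overlapping() {
    LifoStageSketch timers = new LifoStageSketch();
    timers.start("a");
    timers.start("b");
    timers.finish(); // a finishes, but the pop removes b
    return timers.current(); // reports a, although only b is still running
  }
}
```

In the nested case the tracker is right; in the overlapping case it keeps charging work to a stage that has already finished, which is the kind of mixing described above.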

Defaulting N>1 and auto-tuning are out of scope here; open to either as follow-ups.

AI assistance

Per CONTRIBUTING.md#ai-assisted-contributions: drafted with Claude Code. I reviewed every line, ran spotless:check and planetiler-core tests on JDK 21, and confirmed byte-identical output on a real build.


github-actions · 2026-05-29T08:32:26Z

This Branch aa8c283 Base f91cc19

0:01:11 DEB [archive] - Tile stats:
0:01:11 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (162k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:88k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:86k)
3. 10/308/381 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
4. 10/308/380 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
5. 14/4941/6092 (121k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:69k)
6. 14/4941/6093 (118k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (poi:62k)
7. 14/4946/6113 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.48389/-71.31226 (building:59k)
8. 14/4946/6112 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.50035/-71.31226 (building:67k)
9. 14/4940/6092 (102k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
10. 14/4942/6091 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
0:01:11 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   802   287   396   490   670  1.6k    2k  6.9k  6.2k  5.6k  4.4k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   487   487   487   773   862  1.1k  1.8k  3.3k  6.2k  3.9k    2k   966    1k  6.2k
            landuse    0     0     0     0   549   695  1.6k  6.9k   18k   44k   58k   49k   38k   19k   12k   58k
     transportation    0     0     0     0   355    1k  1.5k  4.6k  6.4k   21k   15k   17k   67k   38k   38k   67k
           waterway    0     0     0     0   112   119     0     0     0  3.3k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k    4k  9.7k   19k   13k  8.2k  3.7k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   293   360  1.1k  1.9k  5.8k  4.8k    4k  3.5k   18k   18k
          landcover    0     0     0     0     0     0     0  9.6k   29k   86k   72k   82k   53k   30k   26k   86k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.4k  2.8k  1.4k  1.4k   869  4.4k
         water_name    0     0     0     0     0     0     0     0     0   528   503   475   494  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   289   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k    2k    3k  3.3k  2.8k  3.3k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   589   586   88k   88k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k  6.4k   21k   41k   85k  203k  185k  135k  114k  120k  255k  255k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k    5k   14k   29k   61k  149k  138k   99k   84k   85k  162k  162k
0:01:11 DEB [archive] -    Max tile: 255k (gzipped: 162k)
0:01:11 DEB [archive] -    Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:11 DEB [archive] -     # tiles: 4,115,030
0:01:11 DEB [archive] -  # features: 5,779,817
0:01:11 INF [archive] - Finished in 20s cpu:1m13s avg:3.7
0:01:11 INF [archive] -   read    1x(3% 0.6s wait:18s done:1s)
0:01:11 INF [archive] -   encode  4x(54% 11s wait:2s done:1s)
0:01:11 INF [archive] -   write   1x(18% 4s wait:14s done:1s)
0:01:11 INF [archive] - Finished in 1m12s cpu:3m37s gc:1s avg:3
0:01:11 INF [archive] - FINISHED!
0:01:11 INF [archive] - 
0:01:11 INF [archive] - ----------------------------------------
0:01:11 INF [archive] - data errors:
0:01:11 INF [archive] - 	render_snap_fix_input	16,800
0:01:11 INF [archive] - 	osm_multipolygon_missing_way	377
0:01:11 INF [archive] - 	osm_boundary_missing_way	55
0:01:11 INF [archive] - 	merge_snap_fix_input	9
0:01:11 INF [archive] - 	osm_multipolygon_duplicate_member	4
0:01:11 INF [archive] - 	omt_fix_water_before_ne_intersect	2
0:01:11 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:11 INF [archive] - 	render_snap_fix_input2	1
0:01:11 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:11 INF [archive] - ----------------------------------------
0:01:11 INF [archive] - 	overall          1m12s cpu:3m37s gc:1s avg:3
0:01:11 INF [archive] - 	lake_centerlines 3s cpu:6s avg:2
0:01:11 INF [archive] - 	  read     1x(17% 0.5s done:3s)
0:01:11 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:11 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:11 INF [archive] - 	water_polygons   16s cpu:40s avg:2.5
0:01:11 INF [archive] - 	  read     1x(43% 7s done:8s)
0:01:11 INF [archive] - 	  process  4x(21% 3s wait:5s done:6s)
0:01:11 INF [archive] - 	  write    1x(3% 0.5s wait:10s done:6s)
0:01:11 INF [archive] - 	natural_earth    11s cpu:19s avg:1.6
0:01:11 INF [archive] - 	  read     1x(55% 6s done:5s)
0:01:11 INF [archive] - 	  process  4x(7% 0.8s wait:6s done:5s)
0:01:11 INF [archive] - 	  write    1x(0% 0s wait:6s done:5s)
0:01:11 INF [archive] - 	osm_pass1        2s cpu:7s avg:3.3
0:01:11 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:11 INF [archive] - 	  parse    4x(37% 0.8s)
0:01:11 INF [archive] - 	  process  1x(65% 1s)
0:01:11 INF [archive] - 	osm_pass2        17s cpu:1m6s avg:3.9
0:01:11 INF [archive] - 	  read     1x(0% 0s wait:10s done:7s)
0:01:11 INF [archive] - 	  process  4x(69% 11s)
0:01:11 INF [archive] - 	  write    1x(3% 0.6s wait:16s)
0:01:11 INF [archive] - 	ne_lakes         0s cpu:0s avg:0
0:01:11 INF [archive] - 	boundaries       0s cpu:0s avg:0
0:01:11 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:11 INF [archive] - 	sort             1s cpu:4s avg:2.4
0:01:11 INF [archive] - 	  worker  1x(53% 0.8s)
0:01:11 INF [archive] - 	archive          20s cpu:1m13s avg:3.7
0:01:11 INF [archive] - 	  read    1x(3% 0.6s wait:18s done:1s)
0:01:11 INF [archive] - 	  encode  4x(54% 11s wait:2s done:1s)
0:01:11 INF [archive] - 	  write   1x(18% 4s wait:14s done:1s)
0:01:11 INF [archive] - ----------------------------------------
0:01:11 INF [archive] - 	archive	109MB
0:01:11 INF [archive] - 	features	298MB
-rw-r--r-- 1 runner runner 87M May 29 08:41 run.jar

0:01:04 DEB [archive] - Tile stats:
0:01:04 DEB [archive] - Biggest tiles (gzipped)
1. 14/4942/6092 (162k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.40015 (poi:88k)
2. 9/154/190 (149k) https://onthegomap.github.io/planetiler-demo/#9.5/41.77078/-71.36719 (landcover:86k)
3. 10/308/381 (138k) https://onthegomap.github.io/planetiler-demo/#10.5/41.63994/-71.54297 (landcover:72k)
4. 10/308/380 (137k) https://onthegomap.github.io/planetiler-demo/#10.5/41.90214/-71.54297 (landcover:66k)
5. 14/4941/6092 (121k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.42212 (poi:69k)
6. 14/4941/6093 (118k) https://onthegomap.github.io/planetiler-demo/#14.5/41.81227/-71.42212 (poi:62k)
7. 14/4946/6113 (112k) https://onthegomap.github.io/planetiler-demo/#14.5/41.48389/-71.31226 (building:59k)
8. 14/4946/6112 (111k) https://onthegomap.github.io/planetiler-demo/#14.5/41.50035/-71.31226 (building:67k)
9. 14/4940/6092 (102k) https://onthegomap.github.io/planetiler-demo/#14.5/41.82864/-71.44409 (building:92k)
10. 14/4942/6091 (101k) https://onthegomap.github.io/planetiler-demo/#14.5/41.84501/-71.40015 (building:79k)
0:01:04 DEB [archive] - Max tile sizes
                      z0    z1    z2    z3    z4    z5    z6    z7    z8    z9   z10   z11   z12   z13   z14   all
           boundary  151   336   409   544   802   287   396   490   670  1.6k    2k  6.9k  6.2k  5.6k  4.4k  6.9k
              water 7.7k  3.7k  8.6k  5.5k  2.6k  5.1k   15k   18k   16k   26k   15k   13k   17k   15k   12k   26k
              place    0     0   487   487   487   773   862  1.1k  1.8k  3.3k  6.2k  3.9k    2k   966    1k  6.2k
            landuse    0     0     0     0   549   695  1.6k  6.9k   18k   44k   58k   49k   38k   19k   12k   58k
     transportation    0     0     0     0   355    1k  1.5k  4.6k  6.4k   21k   15k   17k   67k   38k   38k   67k
           waterway    0     0     0     0   112   119     0     0     0  3.3k  2.4k  2.1k  2.1k  4.9k  2.4k  4.9k
               park    0     0     0     0     0     0  1.1k    4k  9.7k   19k   13k  8.2k  3.7k  3.4k  4.4k   19k
transportation_name    0     0     0     0     0     0   293   360  1.1k  1.9k  5.8k  4.8k    4k  3.5k   18k   18k
          landcover    0     0     0     0     0     0     0  9.6k   29k   86k   72k   82k   53k   30k   26k   86k
      mountain_peak    0     0     0     0     0     0     0  1.1k  1.8k  3.4k  4.4k  2.8k  1.4k  1.4k   869  4.4k
         water_name    0     0     0     0     0     0     0     0     0   528   503   475   494  1.2k  1.5k  1.5k
    aerodrome_label    0     0     0     0     0     0     0     0     0     0   666   289   273   221   221   666
            aeroway    0     0     0     0     0     0     0     0     0     0  1.6k    2k    3k  3.3k  2.8k  3.3k
                poi    0     0     0     0     0     0     0     0     0     0     0     0   589   586   88k   88k
           building    0     0     0     0     0     0     0     0     0     0     0     0     0   59k   92k   92k
        housenumber    0     0     0     0     0     0     0     0     0     0     0     0     0     0   35k   35k
          full tile 7.9k    4k  9.5k  6.5k  3.7k  6.4k   21k   41k   85k  203k  185k  135k  114k  120k  255k  255k
            gzipped 6.2k  3.5k  7.1k  5.2k  3.1k    5k   14k   29k   61k  149k  138k   99k   84k   85k  162k  162k
0:01:04 DEB [archive] -    Max tile: 255k (gzipped: 162k)
0:01:04 DEB [archive] -    Avg tile: 5.5k (gzipped: 4.1k) using weighted average based on OSM traffic
0:01:04 DEB [archive] -     # tiles: 4,115,030
0:01:04 DEB [archive] -  # features: 5,779,817
0:01:04 INF [archive] - Finished in 19s cpu:1m11s avg:3.7
0:01:04 INF [archive] -   read    1x(3% 0.6s wait:18s done:1s)
0:01:04 INF [archive] -   encode  4x(55% 10s wait:2s done:1s)
0:01:04 INF [archive] -   write   1x(19% 4s wait:14s)
0:01:04 INF [archive] - Finished in 1m5s cpu:3m25s gc:1s avg:3.2
0:01:04 INF [archive] - FINISHED!
0:01:04 INF [archive] - 
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - data errors:
0:01:04 INF [archive] - 	render_snap_fix_input	16,800
0:01:04 INF [archive] - 	osm_multipolygon_missing_way	377
0:01:04 INF [archive] - 	osm_boundary_missing_way	55
0:01:04 INF [archive] - 	merge_snap_fix_input	9
0:01:04 INF [archive] - 	osm_multipolygon_duplicate_member	4
0:01:04 INF [archive] - 	omt_fix_water_before_ne_intersect	2
0:01:04 INF [archive] - 	feature_polygon_osm_invalid_multipolygon_empty_after_fix	2
0:01:04 INF [archive] - 	render_snap_fix_input2	1
0:01:04 INF [archive] - 	omt_park_area_osm_invalid_multipolygon_empty_after_fix	1
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	overall          1m5s cpu:3m25s gc:1s avg:3.2
0:01:04 INF [archive] - 	lake_centerlines 2s cpu:5s avg:2.3
0:01:04 INF [archive] - 	  read     1x(23% 0.5s done:2s)
0:01:04 INF [archive] - 	  process  4x(0% 0s done:2s)
0:01:04 INF [archive] - 	  write    1x(0% 0s done:2s)
0:01:04 INF [archive] - 	water_polygons   16s cpu:40s avg:2.5
0:01:04 INF [archive] - 	  read     1x(43% 7s done:8s)
0:01:04 INF [archive] - 	  process  4x(21% 3s wait:5s done:6s)
0:01:04 INF [archive] - 	  write    1x(3% 0.4s wait:10s done:6s)
0:01:04 INF [archive] - 	natural_earth    6s cpu:13s avg:2
0:01:04 INF [archive] - 	  read     1x(95% 6s)
0:01:04 INF [archive] - 	  process  4x(13% 0.8s wait:6s)
0:01:04 INF [archive] - 	  write    1x(0% 0s wait:6s)
0:01:04 INF [archive] - 	osm_pass1        2s cpu:8s avg:3.3
0:01:04 INF [archive] - 	  read     1x(2% 0s wait:2s)
0:01:04 INF [archive] - 	  parse    4x(35% 0.8s wait:1s)
0:01:04 INF [archive] - 	  process  1x(68% 2s)
0:01:04 INF [archive] - 	osm_pass2        16s cpu:1m3s avg:4
0:01:04 INF [archive] - 	  read     1x(0% 0s wait:10s done:6s)
0:01:04 INF [archive] - 	  process  4x(68% 11s)
0:01:04 INF [archive] - 	  write    1x(3% 0.5s wait:15s)
0:01:04 INF [archive] - 	ne_lakes         0s cpu:0s avg:12.4
0:01:04 INF [archive] - 	boundaries       0s cpu:0s avg:0
0:01:04 INF [archive] - 	agg_stop         0s cpu:0s avg:0
0:01:04 INF [archive] - 	sort             1s cpu:4s avg:2.5
0:01:04 INF [archive] - 	  worker  1x(52% 0.8s)
0:01:04 INF [archive] - 	archive          19s cpu:1m11s avg:3.7
0:01:04 INF [archive] - 	  read    1x(3% 0.6s wait:18s done:1s)
0:01:04 INF [archive] - 	  encode  4x(55% 10s wait:2s done:1s)
0:01:04 INF [archive] - 	  write   1x(19% 4s wait:14s)
0:01:04 INF [archive] - ----------------------------------------
0:01:04 INF [archive] - 	archive	109MB
0:01:04 INF [archive] - 	features	298MB
-rw-r--r-- 1 runner runner 87M May 29 08:42 run.jar

Full logs: https://github.com/onthegomap/planetiler/actions/runs/26627315329

- BLOCKER (java:S2095): use try-with-resources for ExecutorService
- CODE_SMELL (java:S6885): replace Math.max(1, Math.min(...)) with explicit if-dispatch + plain Math.min, so the sequential vs parallel branch is easier to read

No behavior change.
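
For readers unfamiliar with the two Sonar rules, a rough sketch of what such a cleanup can look like. The names `ExecutorCleanupSketch`, `workerCount`, and `runAll` are hypothetical, and because the arguments to `Math.min(...)` are elided in the commit message, the sketch assumes they are the requested parallelism and the number of sources. `ExecutorService` is only `AutoCloseable` on Java 19 and later.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Hypothetical illustration of the two cleanups; not the code in this commit. */
final class ExecutorCleanupSketch {

  /** Same result as Math.max(1, Math.min(requested, sourceCount)), with the branch spelled out. */
  static int workerCount(int requested, int sourceCount) {
    if (requested <= 1 || sourceCount <= 1) {
      return 1; // sequential branch
    }
    return Math.min(requested, sourceCount); // parallel branch
  }

  static void runAll(List<Runnable> tasks, int requested) {
    int workers = workerCount(requested, tasks.size());
    if (workers == 1) {
      tasks.forEach(Runnable::run);
      return;
    }
    // close() (Java 19+) shuts the pool down and waits for submitted tasks,
    // so the pool cannot leak even if submission throws
    try (ExecutorService pool = Executors.newFixedThreadPool(workers)) {
      for (Runnable task : tasks) {
        pool.submit(task);
      }
    }
  }
}
```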

sonarqubecloud · 2026-05-29T08:45:15Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
67.9% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

msbarry · 2026-06-03T10:44:50Z

Thanks for taking the time to make this change! Would it be possible to share full logs from the planetiler run before and after this change with your input data? In general, planetiler tries to keep parallelism minimal so that the thread count spreads work across all available cores instead of just loading it up with threads, so I want to see exactly what is limiting the parallelism before deviating from that goal.

yuiseki · 2026-06-03T20:09:17Z

Thanks for the thoughtful question, and the point about keeping parallelism minimal makes sense to me. I put together a fully reproducible benchmark on open data so you can see exactly what limits the parallelism without needing my private input.

What limits the parallelism

Each source stage is fed by a single shapefile reader thread (shapefiles are not splittable, so reading a source is inherently single-threaded). When a schema has many small sources whose per-feature work is light, that single reader cannot produce features fast enough to keep process_threads busy, so most of the pool sits idle for that stage. Because sources run one after another, the entire source-reading phase stays at low utilization no matter how high process_threads is.

In the run below, the 140 source stages average 3.5 active threads out of 31, and the source-reading phase as a whole runs at about 4.7 threads (15% of 31 cores). The total CPU time is essentially identical across N=1/4/8 (~12m), so this is not extra work; it is the same work spread over idle cores. --source_parallelism overlaps several single-threaded readers so the pool fills up.
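
As a back-of-the-envelope check on that reasoning, here is an idealized model: if one reader keeps about 3.5 threads busy, then N overlapping readers keep roughly N times that busy, capped by the pool size. The class and method names are hypothetical, and the model ignores disk and memory contention, so it is an upper bound rather than a prediction.

```java
/** Idealized utilization model; hypothetical names, inputs taken from the figures quoted above. */
final class PoolUtilizationSketch {

  /** Busy threads if each concurrent source keeps busyPerSource threads occupied, capped by the pool. */
  static double expectedBusyThreads(double busyPerSource, int concurrentSources, int poolSize) {
    return Math.min(poolSize, busyPerSource * concurrentSources);
  }

  /** Fraction of the pool that is busy. */
  static double utilization(double busyThreads, int poolSize) {
    return busyThreads / poolSize;
  }

  public static void main(String[] args) {
    int pool = 31;
    for (int n : new int[] {1, 4, 8, 16}) {
      double busy = expectedBusyThreads(3.5, n, pool);
      System.out.printf("N=%d -> %.1f busy threads (%.0f%% of %d)%n",
          n, busy, 100 * utilization(busy, pool), pool);
    }
  }
}
```

The measured runs do not follow this line past N=4 (N=8 is slower than N=4 in the table below), which is consistent with contention the model leaves out.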

Reproducible open-data benchmark

7 Geofabrik free shapefile extracts, 20 layers each = 140 shapefile sources, 57,206,689 features, z11-14. 32-core host, JDK, NVMe, warm page cache.

Regions: ireland-and-northern-ireland, connecticut, iceland, new-hampshire, rhode-island, luxembourg, vermont (all from https://download.geofabrik.de). Geometries normalized with ogr2ogr so geotools accepts a handful of degenerate OSM polygons. Schema, download script, and the full before/after logs are in this gist: https://gist.github.com/yuiseki/8679a16d3dc946a13d13408687cec900

java -jar planetiler.jar generate-custom --schema=schema.yml --output=out.pmtiles [--source_parallelism=N]

| `--source_parallelism` | total wall | source-read phase | overall avg threads (of 31) | speedup |
|---|---|---|---|---|
| 1 (default) | 1m45s | 94s | 6.9 | 1.00x |
| 4 | 34s | 23s | 21.4 | 3.06x |
| 8 | 42s | 31s | 18.1 | 2.49x |

sort (~4s) and archive (~6s) are unchanged between runs since they are already parallel, so the whole difference is in the source-reading phase. N=4 is the sweet spot here and N=8 regresses, which is why the flag is opt-in with a default of 1 rather than auto-tuned.

Output is byte-identical across all three (md5 27c5acd626bcd30e90676c0838fe298e, same 57,206,689 features), so nothing about the result changes, only how long it takes to produce.
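
For anyone who wants to repeat the comparison from Java rather than the shell, a minimal sketch of an MD5 check follows. It is one way to script it, not necessarily how the checksum above was produced; `Md5CompareSketch` and its methods are hypothetical names, and only standard JDK classes are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for comparing two output archives by MD5. */
final class Md5CompareSketch {

  private static MessageDigest newMd5() {
    try {
      return MessageDigest.getInstance("MD5");
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException("MD5 not available", e);
    }
  }

  private static String hex(byte[] digest) {
    StringBuilder sb = new StringBuilder(digest.length * 2);
    for (byte b : digest) {
      sb.append(String.format("%02x", b & 0xff));
    }
    return sb.toString();
  }

  static String md5Hex(byte[] data) {
    return hex(newMd5().digest(data));
  }

  /** Streams the file in chunks so large archives are not loaded into memory at once. */
  static String md5Hex(Path file) {
    MessageDigest md = newMd5();
    byte[] buffer = new byte[1 << 16];
    try (InputStream in = Files.newInputStream(file)) {
      int n;
      while ((n = in.read(buffer)) != -1) {
        md.update(buffer, 0, n);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return hex(md.digest());
  }

  static boolean sameContent(Path a, Path b) {
    return md5Hex(a).equals(md5Hex(b));
  }
}
```

Calling `sameContent` on the archives from two runs with different `--source_parallelism` values should return true if the outputs are byte-identical.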

On the cosmetic caveat

As noted in the PR description, per-stage CPU numbers in the logs get mixed once stages overlap, because Timers.currentStage assumes LIFO nesting. The per-stage cpu:/avg: fields under N>1 are therefore inflated and should be ignored; total wall, the overall summary line, output archive, and progress logs are all correct. Happy to fix that in a separate PR if you want it bundled.

Full before/after logs, schema, and the download script are all in the gist: https://gist.github.com/yuiseki/8679a16d3dc946a13d13408687cec900

Add --source_parallelism flag to run multiple input sources concurrently (commit 5d58380)

yuiseki mentioned this pull request May 29, 2026

2026-06-02T12:30/12:55+09:00 🖐Smart Maps Meetup Weekly UNopenGIS/7#908

Closed

yuiseki wants to merge 2 commits into onthegomap:main from yuiseki:add-source-parallelism-flag
