New feature: Check location of OSM objects against list of regions #2333

joto · 2025-05-29T20:38:04Z

This PR introduces a new feature: Locators. A locator is initialized with one or more regions; each region has a name and a polygon or bounding box. The geometry of an OSM object can then be checked against this region list to find out which region(s) it is located in. This check is much faster than doing the same thing inside the database after import.

Locators can be used for all sorts of interesting features:

* Read a larger OSM file but import only the data inside some area.
* Annotate each OSM object with the country (or other region) it is in. This can then, for instance, be used to show special highway shields for each country.
* Use the information about which region the data is in for further processing, for instance setting default values for the speed limit or using special language transliteration rules based on country.

Locators are created in Lua with `define_locator()`. Bounding boxes can be added with `add_bbox()`. Polygons can be added from the database by calling `add_from_db()` and specifying an SQL query. The query can return any number of rows, each defining a region with its name and (multi)polygon as columns.

A locator can then be queried using `all_intersecting()`, which returns a list of the names of all regions that intersect the specified OSM object geometry. Alternatively, `first_intersecting()` returns only a single region, for those cases where there can be no overlapping data or where the details of objects straddling region boundaries don't matter.
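To make the two query styles concrete, here is a minimal conceptual sketch in Python. It is not the osm2pgsql implementation or its Lua API: the class `BBoxLocator`, its point-based signatures, and the linear scan are assumptions made for illustration only. It handles bounding boxes and point lookups, whereas the real locator also supports polygons and uses a spatial index.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Region:
    """A named bounding box (hypothetical helper type for this sketch)."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x: float, y: float) -> bool:
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


class BBoxLocator:
    """Toy locator: bounding boxes only, linear scan instead of an R-tree."""

    def __init__(self) -> None:
        self._regions: List[Region] = []

    def add_bbox(self, name: str, min_x: float, min_y: float,
                 max_x: float, max_y: float) -> None:
        self._regions.append(Region(name, min_x, min_y, max_x, max_y))

    def all_intersecting(self, x: float, y: float) -> List[str]:
        # Names of every region containing the point.
        return [r.name for r in self._regions if r.contains(x, y)]

    def first_intersecting(self, x: float, y: float) -> Optional[str]:
        # Stop at the first hit; None if the point is in no region.
        for r in self._regions:
            if r.contains(x, y):
                return r.name
        return None


locator = BBoxLocator()
locator.add_bbox("west", 0.0, 0.0, 10.0, 10.0)
locator.add_bbox("east", 5.0, 5.0, 15.0, 15.0)
print(locator.all_intersecting(7.0, 7.0))
print(locator.first_intersecting(20.0, 20.0))
```

In this sketch the result order happens to follow insertion order; callers should not rely on any particular order from a real spatial index.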

Several example config files are provided in the flex-config/locator directory.

joto · 2025-05-29T20:40:30Z

@giggls Have a look at this. Might be interesting for you to simplify/speed up l10n processing.

lonvia · 2025-05-30T09:45:25Z

src/locator.cpp

+        m_data.emplace_back(region.box(), n++);
+    }
+
+    m_rtree.insert(m_data.cbegin(), m_data.cend());


I don't think m_rtree gets cleared ever. So this function may produce duplicates in the index if someone calls locator.add() some time during the import in one of the callbacks.

You are right. Fixed.
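The thread does not show how this was fixed. As a rough illustration of the problem and one possible remedy, the following Python sketch uses a plain list as a stand-in for the R-tree; the class and method names are hypothetical. Rebuilding without clearing first would leave stale entries from the previous build, so the sketch empties the index before re-inserting.

```python
class RegionIndex:
    """Toy stand-in for a locator whose spatial index is rebuilt on demand."""

    def __init__(self):
        self.regions = []  # list of (name, bbox) tuples
        self.index = []    # stand-in for the R-tree: (bbox, position) entries

    def add(self, name, bbox):
        self.regions.append((name, bbox))

    def build_index(self):
        # Clear first so repeated builds do not accumulate duplicate entries.
        self.index.clear()
        self.index.extend(
            (bbox, n) for n, (_, bbox) in enumerate(self.regions)
        )


idx = RegionIndex()
idx.add("a", (0, 0, 1, 1))
idx.build_index()
idx.add("b", (2, 2, 3, 3))  # region added later, e.g. from a callback
idx.build_index()
print(len(idx.index))  # one entry per region, no duplicates
```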

lonvia · 2025-05-30T09:49:47Z

tests/bdd/flex/locator.feature

+            NAME[]
+            """
+
+    Scenario: Define a locator without name is okay


Title not adapted.

lonvia · 2025-05-30T09:52:09Z

tests/bdd/steps/steps_execute.py

+    if '-S' in context.osm2pgsql_params:
+        context.osm2pgsql_params[context.osm2pgsql_params.index('-S') + 1] = str(outfile)
+    else:
+        context.osm2pgsql_params.extend(('-S', str(outfile)))


Please 'fix' the function below (setup_style_file()) in the same way.

Okay, fixed.
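Since the same replace-or-append logic is wanted in two places, it could be factored into a small helper. The sketch below is only a suggestion: `set_option` is a hypothetical name, not a function from the test suite, and it assumes the flag, when present, is always followed by a value.

```python
def set_option(params, flag, value):
    """Set `flag` to `value` in a command-line parameter list, in place.

    If the flag is already present, its value (the next element) is
    overwritten; otherwise the flag and value are appended.
    """
    if flag in params:
        params[params.index(flag) + 1] = str(value)
    else:
        params.extend((flag, str(value)))
    return params


params = ["-O", "flex"]
set_option(params, "-S", "first.lua")
set_option(params, "-S", "second.lua")
print(params)
```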

mboeringa · 2025-05-30T13:49:53Z

I think it might also be relevant for this long running openstreetmap-carto issue:
gravitystorm/openstreetmap-carto#2208 (comment)

mboeringa · 2025-05-30T14:11:22Z

> This check is much faster than it would be to do this inside the database after import.

Just curious: what makes this processing by osm2pgsql so much faster than doing this in the database after the import?

And how about memory usage of osm2pgsql when doing this type of processing? Any concerns to be aware of? (I guess not, as you already showed a kind of worst-case scenario with country-size locator regions and the massive building dataset of OpenStreetMap in the example configs.)

joto · 2025-05-30T14:21:15Z

> Just curious: what makes this processing by osm2pgsql so much faster than doing this in the database after the import?

I think this is mostly because everything happens in memory. Once the Locator is set up, the checks run completely in memory. The OSM data is loaded anyway as part of the osm2pgsql processing, so it is available for the check. Doing this in the database means getting all the data from disk, doing all the checks and then doing lots of database updates involving more IO including all the clever things the database has to do to keep consistent and all that. There is just so much more work involved.

Memory usage is negligible.

mboeringa · 2025-05-30T17:08:08Z

> A locator can then be queried using all_intersecting() returning a list of names of all regions that intersect the specified OSM object geometry.

This raises one more question: I can imagine it being very useful to have the 'all_intersecting()' function return geometries in a specific sorted order, e.g. from largest in terms of area to smallest, to be able to use that in subsequent processing in Lua. Think of a locator built on all OpenStreetMap boundary relations of all levels, where you would like to see the largest-to-smallest set of intersecting boundary relations for a specific OSM object.

Does the current implementation implement any type of sort, or is the returned set's order completely arbitrary?

joto · 2025-05-30T17:51:20Z

> Does the current implementation implement any type of sort, or is the returned set's order completely arbitrary?

It is completely arbitrary. It would be much more expensive to do this in any sorted way.

For the use case you describe you'd do it differently anyways: You'll only use the smallest divisions and use that data for the rest of the information. You don't have to check a point against all states of the USA and against the USA boundary, you check it against all states and if it is in a state it must also be in the USA. If there is any area left that is in the USA but in no state, you'll have to also check against that. It needs a bit of pre-processing but that saves you a lot of checks later.
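The approach described above can be sketched in Python as follows. Everything here is made up for illustration: the region names, the bounding boxes, and the `PARENT` table are hypothetical, and boxes stand in for real polygons. The point is checked only against the smallest divisions (plus a pre-computed "remainder" region for the area inside the country but outside every state), and the enclosing country comes from a lookup rather than a second geometry check.

```python
# Smallest divisions only: (min_x, min_y, max_x, max_y), hypothetical values.
LEAF_REGIONS = {
    "state_a": (0.0, 0.0, 10.0, 10.0),
    "state_b": (10.0, 0.0, 20.0, 10.0),
    # Pre-processed leftover: inside the country but in no state.
    "country_x_remainder": (0.0, 10.0, 20.0, 12.0),
}

# Pre-computed hierarchy: each leaf region maps to its enclosing country.
PARENT = {
    "state_a": "country_x",
    "state_b": "country_x",
    "country_x_remainder": "country_x",
}


def locate(x, y):
    """Return (leaf_region, country) for a point, or (None, None)."""
    for name, (min_x, min_y, max_x, max_y) in LEAF_REGIONS.items():
        if min_x <= x <= max_x and min_y <= y <= max_y:
            # One geometry check; the country follows from the table.
            return name, PARENT[name]
    return None, None


print(locate(3.0, 4.0))
print(locate(5.0, 11.0))
print(locate(50.0, 50.0))
```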

Make box_t type available to boost as box type

60e7aba

This will be useful later when we index geometries.

lonvia reviewed May 30, 2025

joto force-pushed the locator branch from ff73430 to 24ce38f Compare May 30, 2025 12:47

lonvia merged commit 95508e2 into osm2pgsql-dev:master May 30, 2025
45 of 46 checks passed

joto deleted the locator branch May 30, 2025 20:07
