New feature: Check location of OSM objects against list of regions #2333

Merged (2 commits) on May 30, 2025

joto (Collaborator) commented May 29, 2025

This PR introduces a new feature: Locators. A locator is initialized with one or more regions; each region has a name and a polygon or bounding box. The geometry of an OSM object can then be checked against this region list to find out which region(s) it is located in. This check is much faster than doing the same thing inside the database after import.

Locators can be used for all sorts of interesting features:

  • Read a larger OSM file but import only the data inside some area.
  • Annotate each OSM object with the country (or other region) it is in. This can then, for instance, be used to show special highway shields for each country.
  • Use the information about which region the data is in for further processing, for instance setting default values for the speed limit or using special language transliteration rules based on country.

Locators are created in Lua with `define_locator()`. Bounding boxes can be added with `add_bbox()`. Polygons can be added from the database by calling `add_from_db()` and specifying an SQL query, which can return any number of rows, each defining a region with the name and the (multi)polygon as columns.

A locator can then be queried using `all_intersecting()`, which returns a list of the names of all regions that intersect the specified OSM object geometry. Alternatively, `first_intersecting()` returns only a single region, for cases where regions cannot overlap or where the details of objects straddling region boundaries don't matter.

Several example config files are provided in the flex-config/locator directory.
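
To make the query semantics described above concrete, here is a minimal Python sketch of a locator restricted to bounding boxes and point lookups. It is a conceptual model only, not osm2pgsql's Lua API or its C++ implementation: the `Region` and `Locator` classes, the method signatures, and the sample coordinates are all assumptions made for illustration, and it uses a linear scan instead of a spatial index.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Region:
    """A named bounding box (hypothetical model; polygons are left out)."""
    name: str
    min_x: float
    min_y: float
    max_x: float
    max_y: float

    def contains(self, x, y):
        return self.min_x <= x <= self.max_x and self.min_y <= y <= self.max_y


@dataclass
class Locator:
    """Holds a list of regions and answers point-in-region queries."""
    regions: list = field(default_factory=list)

    def add_bbox(self, name, min_x, min_y, max_x, max_y):
        self.regions.append(Region(name, min_x, min_y, max_x, max_y))

    def all_intersecting(self, x, y):
        # Names of every region containing the point; callers should not
        # rely on the order of the result.
        return [r.name for r in self.regions if r.contains(x, y)]

    def first_intersecting(self, x, y):
        # Stops at the first hit; only meaningful when regions do not
        # overlap or when any one match is good enough.
        for r in self.regions:
            if r.contains(x, y):
                return r.name
        return None


locator = Locator()
locator.add_bbox("west", 0.0, 0.0, 10.0, 10.0)
locator.add_bbox("east", 8.0, 0.0, 20.0, 10.0)
```

A point in the overlap of the two boxes is reported by `all_intersecting()` under both names, while `first_intersecting()` returns just one of them, which is why it only fits data without overlaps or cases where boundary details don't matter.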

This will be useful later when we index geometries.
joto (Collaborator, Author) commented May 29, 2025

@giggls Have a look at this. Might be interesting for you to simplify/speed up l10n processing.

        m_data.emplace_back(region.box(), n++);
    }

    m_rtree.insert(m_data.cbegin(), m_data.cend());
Review comment (Collaborator):

I don't think m_rtree gets cleared ever. So this function may produce duplicates in the index if someone calls locator.add() some time during the import in one of the callbacks.

Reply (Collaborator, Author):

You are right. Fixed.
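
The failure mode raised in this review comment can be reproduced with a small Python model. This is a hypothetical stand-in, not the osm2pgsql C++ code: `RegionIndex`, `build_index()`, and the plain list used in place of the R-tree are assumptions, and clearing the index before re-inserting is one plausible fix, not necessarily the one applied in this PR.

```python
class RegionIndex:
    """Toy model: a list of (bbox, id) entries stands in for the R-tree."""

    def __init__(self, clear_before_build):
        self.regions = []   # region bounding boxes, in insertion order
        self.index = []     # stand-in for the spatial index
        self.clear_before_build = clear_before_build

    def add(self, bbox):
        self.regions.append(bbox)

    def build_index(self):
        if self.clear_before_build:
            # Assumed fix: drop stale entries before re-inserting everything.
            self.index.clear()
        data = [(bbox, n) for n, bbox in enumerate(self.regions)]
        self.index.extend(data)


def index_size_after_late_add(clear_before_build):
    """Build, add one more region (e.g. from a callback), then rebuild."""
    idx = RegionIndex(clear_before_build)
    idx.add((0, 0, 1, 1))
    idx.build_index()
    idx.add((2, 2, 3, 3))
    idx.build_index()
    return len(idx.index)
```

Without the clearing step the first region is indexed twice, so two regions produce three index entries; with it, the index always matches the region list.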

NAME[]
"""

Scenario: Define a locator without name is okay
Review comment (Collaborator):

Title not adapted.

Reply (Collaborator, Author):

Fixed.

    if '-S' in context.osm2pgsql_params:
        context.osm2pgsql_params[context.osm2pgsql_params.index('-S') + 1] = str(outfile)
    else:
        context.osm2pgsql_params.extend(('-S', str(outfile)))
Review comment (Collaborator):

Please 'fix' the function below (setup_style_file()) in the same way.

Reply (Collaborator, Author):

Okay, fixed.
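
Since the same replace-or-append logic is wanted in more than one place, it could be factored into a helper along these lines. This is only a sketch: `set_option()` is a hypothetical name that does not appear in the PR, and the handling of a flag with no value after it is an added assumption.

```python
def set_option(params, flag, value):
    """Set `flag` to `value` in a flat argument list, modifying it in place.

    If the flag is already present, its value is overwritten; otherwise the
    flag and value are appended. Returns the same list for convenience.
    """
    if flag in params:
        pos = params.index(flag)
        if pos + 1 < len(params):
            params[pos + 1] = str(value)
        else:
            # Flag was the last element and had no value yet.
            params.append(str(value))
    else:
        params.extend((flag, str(value)))
    return params
```

Called as `set_option(context.osm2pgsql_params, '-S', outfile)`, it would behave like the snippet under review while avoiding a second `-S` entry when one is already set.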

mboeringa commented:

I think it might also be relevant for this long running openstreetmap-carto issue:
gravitystorm/openstreetmap-carto#2208 (comment)

mboeringa commented:

> This check is much faster than it would be to do this inside the database after import.

Just curious: what makes this processing by osm2pgsql so much faster than doing this in the database after the import?

And how about the memory usage of osm2pgsql when doing this type of processing? Are there any concerns to be aware of? (I guess not, as you already showed a kind of worst-case scenario with country-size locator regions and the massive building dataset of OpenStreetMap in the example configs.)

joto (Collaborator, Author) commented May 30, 2025

> Just curious: what makes this processing by osm2pgsql so much faster than doing this in the database after the import?

I think this is mostly because everything happens in memory. Once the Locator is set up, the checks run completely in memory. The OSM data is loaded anyway as part of the osm2pgsql processing, so it is available for the check. Doing this in the database means getting all the data from disk, doing all the checks and then doing lots of database updates involving more IO including all the clever things the database has to do to keep consistent and all that. There is just so much more work involved.

Memory usage is negligible.

lonvia merged commit 95508e2 into osm2pgsql-dev:master on May 30, 2025
45 of 46 checks passed
mboeringa commented:

> A locator can then be queried using all_intersecting() returning a list of names of all regions that intersect the specified OSM object geometry.

This raises one more question: I can imagine it being very useful to have the `all_intersecting()` function return the regions in a specific sorted order, e.g. from largest to smallest in terms of area, so that the order can be used in subsequent processing in Lua. Think of a locator built on all OpenStreetMap boundary relations of all levels, where you would like to see the largest-to-smallest set of intersecting boundary relations for a specific OSM object.

Does the current implementation implement any type of sort, or is the returned set's order completely arbitrary?

joto (Collaborator, Author) commented May 30, 2025

> Does the current implementation implement any type of sort, or is the returned set's order completely arbitrary?

It is completely arbitrary. It would be much more expensive to do this in any sorted way.

For the use case you describe you'd do it differently anyways: You'll only use the smallest divisions and use that data for the rest of the information. You don't have to check a point against all states of the USA and against the USA boundary, you check it against all states and if it is in a state it must also be in the USA. If there is any area left that is in the USA but in no state, you'll have to also check against that. It needs a bit of pre-processing but that saves you a lot of checks later.
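
The pre-processing idea described above could look roughly like the following Python sketch. Everything in it is made up for illustration: the region names, the bounding boxes, the `PARENT` table, and `locate()` are assumptions, and real boundaries would be polygons checked through a locator, not the simple box test used here.

```python
# Hypothetical leaf regions as (min_x, min_y, max_x, max_y) boxes.
LEAF_REGIONS = {
    "state_a": (0.0, 0.0, 5.0, 5.0),
    "state_b": (5.0, 0.0, 10.0, 5.0),
    # Area inside the country that belongs to no state, produced by
    # pre-processing (country minus the union of its states).
    "country_remainder": (0.0, 5.0, 10.0, 6.0),
}

# Parent lookup derived once up front; not checked geometrically.
PARENT = {
    "state_a": "country",
    "state_b": "country",
    "country_remainder": "country",
}


def locate(x, y):
    """Return the containing regions, largest first, or [] if none match."""
    for name, (min_x, min_y, max_x, max_y) in LEAF_REGIONS.items():
        # Half-open intervals so that adjacent leaf regions never overlap.
        if min_x <= x < max_x and min_y <= y < max_y:
            chain = [name]
            while chain[-1] in PARENT:
                chain.append(PARENT[chain[-1]])
            return list(reversed(chain))
    return []
```

Only the leaf regions are tested against the geometry; the larger regions come from the lookup table, which also gives the largest-to-smallest order without the locator having to sort anything.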

joto deleted the locator branch May 30, 2025 20:07