
Add pbf module to load graphs from OSM PBF files#1338

Open
gboeing wants to merge 4 commits into
mainfrom
pbf


gboeing commented Aug 14, 2025
Owner

This PR proposes a new module to read data from PBF files. It uses the osmium library to read PBF data and provides a familiar graph_from_pbf function for users. This is much faster than using the Overpass API and allows users to download data extracts locally for loading.

Initially I felt we shouldn't provide any data filtering and should just load whatever data is in the user's PBF file. But we could provide some simple built-in filtering, with the expectation that users who need richer filtering pre-filter their PBF file as they wish.

Here's a simple usage example, using Geofabrik's DC extract:

import osmnx as ox
filepath = "data/dc.osm.pbf"

# load PBF ways into graph with no filtering
G = ox.pbf.graph_from_pbf(filepath)
print(len(G))

# with simple filtering: only ways with highway=primary
tags = {"highway": ["primary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

Or you can load the state of Oregon's highway network in ~8 seconds:

filepath = "./data/oregon-latest.osm.pbf"
tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link"]}
G = ox.pbf.graph_from_pbf(filepath, tags)

Or you can load all of Australia's highway network in ~45 seconds:

filepath = "./data/australia-latest.osm.pbf"
tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link"]}
G = ox.pbf.graph_from_pbf(filepath, tags)

More simple filtering examples:

filepath = "data/dc.osm.pbf"
tags = ["highway"]
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"], "highway": ["primary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"], "highway": ["primary", "secondary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"railway": ["subway"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = ["railway", "highway"]
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link", "primary", "secondary"]}
G = ox.pbf.graph_from_pbf(filepath, tags)
print(len(G))

Gp = ox.projection.project_graph(G)
gdf_nodes, gdf_edges = ox.convert.graph_to_gdfs(Gp)
fig, ax = ox.plot.plot_graph(Gp, node_size=2)

I've tested projection, plotting, converting, etc., and all seem to work fine with the graphs created from the PBF files.
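For readers wondering how the tags argument in the examples above might be interpreted, here is a minimal sketch. It is an assumption inferred from the examples, not the PR's actual code: `way_matches_tags` is a hypothetical helper, a list is read as "the way has any of these keys", a dict as "the way has one of these key/value pairs", and multiple keys are combined as a union.

```python
from collections.abc import Mapping


def way_matches_tags(way_tags, tags=None):
    """Return True if a way's tags pass the filter (hypothetical helper)."""
    if tags is None:
        # no filter: keep every way in the file
        return True
    if isinstance(tags, Mapping):
        # dict form: keep the way if any listed key has one of the listed values
        return any(way_tags.get(key) in values for key, values in tags.items())
    # list form: keep the way if it carries any of the listed keys
    return any(key in way_tags for key in tags)


# a subway line passes a combined railway/highway filter under the union assumption
print(way_matches_tags({"railway": "subway"}, {"railway": ["subway"], "highway": ["primary"]}))
```

Whether the real implementation treats multiple keys as a union or an intersection is not stated in this thread, so that choice would need checking against the module itself.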

You can also create a graph from Overpass with OSMnx, save to an OSM XML file, convert that to a PBF file, then load it back into OSMnx (as proof of concept only... this isn't a good workflow):

from pathlib import Path
import osmium
fp_xml = Path("./data/graph.osm")
fp_pbf = Path("./data/graph.pbf")
ox.settings.all_oneway = True
G = ox.graph.graph_from_address("Piedmont, California, USA", dist=300, network_type="drive", simplify=False)
ox.io.save_graph_xml(G, fp_xml)
if fp_pbf.is_file():
    fp_pbf.unlink()
with osmium.SimpleWriter(fp_pbf) as writer:
    for obj in osmium.FileProcessor(fp_xml):
        writer.add(obj)
filepath = "./data/graph.pbf"
G = ox.pbf.graph_from_pbf(filepath)
len(G)

Any comments, feedback, or testing would be much appreciated.

codecov Bot commented Aug 15, 2025

Codecov Report

❌ Patch coverage is 17.33333% with 62 lines in your changes missing coverage. Please review.
✅ Project coverage is 96.28%. Comparing base (4afd7dd) to head (26f8fd7).
⚠️ Report is 48 commits behind head on main.

Files with missing lines Patch % Lines
osmnx/pbf.py 16.21% 62 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1338      +/-   ##
==========================================
- Coverage   98.57%   96.28%   -2.29%     
==========================================
  Files          25       26       +1     
  Lines        2591     2666      +75     
==========================================
+ Hits         2554     2567      +13     
- Misses         37       99      +62     


@shiqin-liu


This is impressive progress, thanks for working on this feature!

I ran the feature locally, loading a sample Chicago metro OSM .pbf file, which took about 2.8 minutes. I also tested projection and routing, which work as expected!

import osmnx as ox
import time
ox.__version__
'2.1.0.dev0'

start_time = time.time()
filepath='inputdata/osm_Chicago.pbf'
G = ox.pbf.graph_from_pbf(filepath, simplify=True)
print(f"Total processing time: {time.time() - start_time:.2f} seconds")

Total processing time: 170.85 seconds

Quick thoughts/questions on data filtering: the global-indicators software relies on the OSMnx default walk network type filter to retrieve the pedestrian network, e.g. https://github.com/healthysustainablecities/global-indicators/blob/1ad567bfe93c56a0c93f6512a1c215f11f773dc7/process/subprocesses/_03_create_network_resources.py#L112 . Many users might appreciate a default filter for convenience. With the current setup, what would be the best way to get the OSMnx walk network from a .pbf file? Does the current tags parameter act like a custom filter, so that we could pass a customized walk filter?

carlhiggs commented Aug 19, 2025

Thanks again for your work on this @gboeing --- I agree with @shiqin-liu that capacity to use a custom filter definition (or preset typology) would be really useful. In the code @shiqin-liu linked to, we actually use a custom filter (a couple of lines above the network type), that modifies the definition to remove the cycling restriction (defined here):

'["highway"]["area"!~"yes"]["highway"!~"motor|proposed|construction|abandoned|platform|raceway"]["foot"!~"no"]["service"!~"private"]["access"!~"private"]'

So, basically, having options like currently exist for graph_from_polygon, but optionally using a pbf as the source would be amazing.
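To make the request concrete, here is a rough sketch of how the custom filter quoted above could be expressed as a plain Python predicate over a way's tags once they are available as a dict. `EXCLUDE` and `is_walkable` are hypothetical names, not part of OSMnx or this branch. The sketch assumes Overpass semantics as I understand them: regex clauses are unanchored, and a negated clause also passes when the key is absent.

```python
import re

# Patterns copied from the Overpass filter quoted above. Overpass regex
# clauses are unanchored, so re.search is used rather than re.fullmatch.
EXCLUDE = {
    "area": "yes",
    "highway": "motor|proposed|construction|abandoned|platform|raceway",
    "foot": "no",
    "service": "private",
    "access": "private",
}


def is_walkable(way_tags):
    """Hypothetical predicate mirroring the custom walk filter."""
    if "highway" not in way_tags:
        return False
    # a negated clause passes when the key is absent or the value does not match
    return not any(
        key in way_tags and re.search(pattern, way_tags[key])
        for key, pattern in EXCLUDE.items()
    )


print(is_walkable({"highway": "footway"}))
print(is_walkable({"highway": "motorway_link"}))
```

A predicate like this sidesteps Overpass syntax entirely, at the cost of maintaining the same rule in two forms.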

I installed this branch using uv, but it didn't install the osmium module by itself; I had to install osmium separately with uv pip install. Not sure if that's relevant (it might be an issue on my side), but I'm mentioning it in case osmium needs to be flagged as a dependency.

D:\projects\repos\osmnx-pbf

>.venv\Scripts\activate.bat

(osmnx-pbf) [---] Tue 19/08/2025 15:19:59.72
D:\projects\repos\osmnx-pbf

>python
Python 3.12.3 (tags/v3.12.3:f6650f9, Apr  9 2024, 14:05:25) [MSC v.1938 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import osmnx as ox
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\__init__.py", line 26, in <module>
    from . import pbf as pbf
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 13, in <module>
    import osmium
ModuleNotFoundError: No module named 'osmium'
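The traceback shows that importing osmnx fails outright because pbf.py imports osmium at module level. One common pattern for optional dependencies, sketched here as an assumption rather than as what the branch does, is to defer the import until the feature is used and raise a clearer error. `require_optional` is a hypothetical helper name.

```python
import importlib


def require_optional(module_name, hint):
    """Import an optional dependency, or raise an ImportError explaining how to get it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as e:
        msg = f"{module_name!r} is required for this feature; {hint}"
        raise ImportError(msg) from e


# a module that is present is returned as usual
json_module = require_optional("json", "it ships with Python")
print(json_module.dumps({"ok": True}))
```

Calling `require_optional("osmium", "install it with pip install osmium")` inside the function that reads the file, rather than at the top of the module, would let `import osmnx` succeed on installations without osmium.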

I also had an issue when I tried using this with our example Las Palmas pbf from the global-indicators project:

>>> import osmnx as ox
>>> filepath = 'data/example_las_palmas_2023_osm_20230221.pbf'
>>> G = ox.pbf.graph_from_pbf(filepath)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 189, in graph_from_pbf
    G = graph._create_graph(response_json, bidirectional)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\graph.py", line 663, in _create_graph
    G = distance.add_edge_lengths(G)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\distance.py", line 229, in add_edge_lengths
    raise ValueError(msg)
ValueError: Some edges missing nodes, possibly due to input data clipping issue.
>>>

I then tried an online Las Palmas extract from OpenStreetMap.fr. The example in our study (3.1MB) is a bespoke clipped version, so I would have expected a more or less official extract to work, but it failed in a similar way:

>>> filepath = 'data/las_palmas.osm.pbf'
>>> G = ox.pbf.graph_from_pbf(filepath)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\pbf.py", line 189, in graph_from_pbf
    G = graph._create_graph(response_json, bidirectional)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\graph.py", line 663, in _create_graph
    G = distance.add_edge_lengths(G)
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "D:\projects\repos\osmnx-pbf\.venv\Lib\site-packages\osmnx\distance.py", line 229, in add_edge_lengths
    raise ValueError(msg)
ValueError: Some edges missing nodes, possibly due to input data clipping issue.
>>> G
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
NameError: name 'G' is not defined

So using clipped pbfs might cause some issues, even when sourced from online re-publishers of OSM extracts. It's good that it reports the issue. However, rather than failing, it would be convenient if it could still return the graph object and just warn the user about potential 'edge cases' where missing nodes on edges near the study region boundary could not be resolved, possibly due to data clipping.
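The warn-instead-of-fail behaviour suggested above could look roughly like the following. This is a sketch under stated assumptions, not OSMnx code: `edges_with_known_nodes` is a hypothetical helper, `nodes` is assumed to map node IDs to coordinates, and `ways` is assumed to map way IDs to ordered lists of node IDs.

```python
import warnings


def edges_with_known_nodes(nodes, ways):
    """Build (u, v) edges, skipping any segment whose endpoint is missing from `nodes`."""
    edges = []
    n_dropped = 0
    for refs in ways.values():
        # consecutive node references along a way form its segments
        for u, v in zip(refs[:-1], refs[1:]):
            if u in nodes and v in nodes:
                edges.append((u, v))
            else:
                n_dropped += 1
    if n_dropped:
        warnings.warn(
            f"Dropped {n_dropped} segment(s) referencing nodes missing from the file, "
            "possibly due to input data clipping.",
            stacklevel=2,
        )
    return edges


nodes = {1: (28.10, -15.40), 2: (28.10, -15.41), 4: (28.10, -15.43)}
print(edges_with_known_nodes(nodes, {10: [1, 2, 3, 4]}))
```

Dropping segments rather than single node references avoids inventing a straight edge across the gap left by a missing node.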

I'll aim to look into this more tomorrow (e.g. whether it's possible to operationalise a custom pedestrian definition using the current functionality), but wanted to share this early feedback.

Thanks again for exploring this possibility --- it will be a very useful functionality!

Base automatically changed from v2.1 to main December 12, 2025 21:52

mvexel commented May 4, 2026

I stumbled upon this while I was looking into availability of the public Overpass server instances and reading #1377.

Is loading from PBF still a path that you are pursuing? If yes, I may be able to contribute to the filtering logic. I have worked with Overpass in Python a fair amount and am also the maintainer of https://codeberg.org/mvexel/overpass-api-python-wrapper.

As a first step, supporting the existing osmnx presets like drive, walk, bike could be useful?

gboeing commented May 4, 2026
Owner Author

@mvexel yes, this PR fell by the wayside a few months back, but it includes a simple working version of PBF loading functionality already.

The filtering logic presents an interesting problem... osmium makes it a bit tricky to do normal OSMnx-style network_type filters, but I'm open to clever ideas here.

However, I think the normal custom_filter function argument logic will be more or less impossible, because it would require parsing arbitrary Overpass filter code.
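To illustrate the scale of the problem, the sketch below parses only a very restricted subset of Overpass tag filters: chained clauses with double-quoted keys and values. It is an illustration under that assumption, with hypothetical names (`CLAUSE`, `parse_simple_filter`). It does not handle unions, unquoted keys, case-insensitive flags, or anything else in the full language, which is where the real difficulty lies.

```python
import re

# one clause: ["key"], ["key"="value"], ["key"!="value"], ["key"~"regex"], ["key"!~"regex"]
CLAUSE = re.compile(r'\["([^"]+)"(?:(!=|!~|=|~)"([^"]*)")?\]')


def parse_simple_filter(custom_filter):
    """Parse chained, fully quoted tag clauses into (key, op, value) tuples."""
    clauses = []
    pos = 0
    while pos < len(custom_filter):
        match = CLAUSE.match(custom_filter, pos)
        if match is None:
            # anything outside the restricted subset is rejected, not guessed at
            msg = f"unsupported filter syntax at position {pos}"
            raise ValueError(msg)
        key, op, value = match.groups()
        clauses.append((key, op or "exists", value))
        pos = match.end()
    return clauses


print(parse_simple_filter('["highway"]["foot"!~"no"]'))
```

Rejecting unsupported input explicitly seems safer than silently ignoring clauses, since a dropped clause would change which ways end up in the graph.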

mvexel commented May 4, 2026

I was thinking in the direction of abstracting the filter definitions from the PBF / Overpass readers. There would be a _filters.py that would have the canonical declarations of the filter presets (highway k/v to exclude, etc.), and _overpass.py and pbf.py would transform those declarations into their respective syntax.

If this sounds like something you would consider, I can work on a minimal example to start.
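A very small sketch of that idea, to show its shape only: one syntax-neutral declaration, with one function per backend turning it into that backend's form. Every name here (`FilterPreset`, `to_overpass`, `to_predicate`) is hypothetical, and the declaration is simplified to exact values with no regex metacharacters. The existing OSMnx filters use unanchored regex fragments such as "motor", so a real version would need a richer declaration.

```python
from dataclasses import dataclass, field


@dataclass
class FilterPreset:
    """Hypothetical canonical declaration: a required key plus key/values to exclude."""

    required_key: str
    exclude: dict[str, list[str]] = field(default_factory=dict)


def to_overpass(preset):
    """Render the declaration as an Overpass tag-filter string."""
    parts = [f'["{preset.required_key}"]']
    for key, values in preset.exclude.items():
        alternatives = "|".join(values)
        # anchored so that only exact values are excluded
        parts.append(f'["{key}"!~"^({alternatives})$"]')
    return "".join(parts)


def to_predicate(preset):
    """Render the same declaration as a function over a way's tag dict."""

    def matches(way_tags):
        if preset.required_key not in way_tags:
            return False
        return all(way_tags.get(k) not in vals for k, vals in preset.exclude.items())

    return matches


demo = FilterPreset("highway", {"foot": ["no"], "access": ["private"]})
print(to_overpass(demo))
print(to_predicate(demo)({"highway": "residential", "foot": "no"}))
```

Keeping the declaration free of either backend's syntax is what would let the Overpass and PBF readers stay in agreement as presets change.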

gboeing commented May 5, 2026
Owner Author

Sounds like a potentially good approach. Sure, I'd be happy to review a PR.

mvexel mentioned this pull request May 17, 2026
3 tasks

mvexel commented May 17, 2026

I had some free time to finish this up, sorry for the delay! See #1381. I tried to keep it as small and contained as possible, but the filter logic was a little more involved than I initially thought. Looking forward to discussing at #1381.

b-a0 commented Jun 24, 2026

This is great! I've installed it in my conda environment by adding the following to the environment.yml:

...
  - pip
  - pip:
    - osmnx @ git+https://github.com/gboeing/osmnx.git@pbf
    - osmium

I could then load the .pbf file for the Netherlands from Geofabrik (~1 GB) and extract and plot highways:

Loading took 88 seconds:

tags = {"highway": ["motorway", "motorway_link", "trunk", "trunk_link", "primary", "secondary"]}
G = graph_from_pbf(filepath, tags)

Reprojecting and plotting took 8 seconds:

Gp = ox.projection.project_graph(G)
gdf_nodes, gdf_edges = ox.convert.graph_to_gdfs(Gp)
fig, ax = ox.plot.plot_graph(Gp, node_size=2)
[Image: plot of the resulting graph]

I hope this can be merged soon, so that it's easier to install.

If there is any specific functionality I can test, let me know, I'll do my best to help!

mszell commented Jun 24, 2026
Contributor

Hi, just adding that I am also very much interested in this, as the Overpass API is under heavy strain and fails too often, so this is a good and much-needed approach.

> If there is any specific functionality I can test, let me know, I'll do my best to help!

Same!

gboeing commented Jun 27, 2026
Owner Author

Thanks all for the feedback. Yes, I'm happy to pick this back up in the coming days. The main issue is designing a good system for filtering. Overpass and PBF need very different filtering logic. Either they're used differently by users, or we need to do major work abstracting things.
