Commit d97ef36

Features/202601 (#3)
* progress on FASTEST vs SHORTEST, and stubs
* include stubs
* docs
* ensure build_rtree_index called
* nicer api for adding edges, tests
* extra test
* added splitting
* merging ubodt into matcher and removing from python api
* finalise not build_rtree_index
* tests etc
* reverse tolerance fixes
* fixing some stuff
* some fixes and apportioning time
* remove stuff we don't need
* tweaks, docs
* pytest deps
* try that
* try that
* try that
* remove catch2, try fix digest
* try this
* try that
* try that
* try that
* try that
* remove auto-stub-generation as it's annoying with delvewheel on windows
* oops
* try with tests
1 parent 221a188 commit d97ef36

52 files changed

Lines changed: 3579 additions & 19598 deletions

.gitignore

Lines changed: 1 addition & 0 deletions
@@ -99,6 +99,7 @@ sync.sh
 
 
 cache
+.cache
 *.whl
 dist
 wheelhouse/

.vscode/settings.json

Lines changed: 2 additions & 0 deletions
@@ -120,6 +120,8 @@
 "setuptools",
 "skbuild",
 "softprops",
+"subtrajectories",
+"traj",
 "ubodt",
 "vcpkg",
 "venv"

CMakeLists.txt

Lines changed: 4 additions & 2 deletions
@@ -25,9 +25,11 @@ set(CMAKE_POSITION_INDEPENDENT_CODE ON)
 
 project(fastmm)
 
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_STANDARD_REQUIRED ON)
 
 if (MSVC)
-    add_compile_options(/EHsc)
+    add_compile_options(/EHsc /std:c++17)
 endif()
 
 set(CMAKE_BUILD_TYPE "Release")
@@ -40,7 +42,7 @@ if (MSVC)
 else()
     set(CMAKE_CXX_FLAGS_RELEASE "${CMAKE_CXX_FLAGS_RELEASE} -O3")
 endif()
-set(CMAKE_CXX_STANDARD 14)
+set(CMAKE_CXX_STANDARD 17)
 set(CMAKE_RUNTIME_OUTPUT_DIRECTORY "${CMAKE_CURRENT_SOURCE_DIR}/build")
 
 list(APPEND CMAKE_MODULE_PATH "${CMAKE_SOURCE_DIR}/cmake")

MANIFEST.in

Lines changed: 1 addition & 1 deletion
@@ -1,4 +1,4 @@
 include README.md
 include LICENSE.TXT
 include pyproject.toml
-recursive-include python/fastmm *.py *.pyd *.dll *.so *.dylib
+recursive-include python/fastmm *.py *.pyi *.pyd *.dll *.so *.dylib

README.md

Lines changed: 167 additions & 28 deletions
@@ -1,56 +1,195 @@
 # fastmm
 
-fastmm is a fast (C++) map-matching library for python with no dependencies, and the ability to interpolate time on the match, not just position.
+fastmm is a fast (C++) map-matching library for Python with no dependencies. It can interpolate time along the match (not just position), and it matches as much of the GPS trace as possible (rather than failing if a single point is wonky).
 
 It's based on a desire to map match a lot of vehicle trace data quickly, without the infrastructure to spin up OSRM / Valhalla. (And this is probably faster as there's no IPC ... ?)
 
 It is based on <https://github.com/cyang-kth/fmm> but updated to:
 
+- Include Python helper classes for automatic trajectory splitting and time interpolation for the match.
 - Remove GDAL/OGR dependencies - networks are created programmatically from Python
-- Include Python helper classes for automatic trajectory splitting and time interpolation
 - Be buildable on Windows/Linux/Mac with modern tooling
 - Focus on Python packaging with distributable wheels
 - Remove STMatch - we'll focus on FMM for now
+- Automated Windows, Linux, and macOS wheel builds
 
-**Status:**
-
-- [ ] Tested ... = )
-- [ ] MapMatcher helper class with auto-splitting and time interpolation
-- [x] FASTMM algorithm working
-- [x] Python API for network creation and matching
-- [x] Windows, linux, and macOS wheel builds
+## TODO
 
+- Currently, if a point from the GPS trace fails (too far away etc.), it's excluded from the match. It should be added to the match with start == end and an appropriate error code.
+- For reverse_tolerance, why not, if the movement is < reverse tolerance, just assume they're still at the same place (and it's GPS jitter)?
+- Test the time apportioning.
+- If not found in UBODT, instead of bailing, do a normal Dijkstra lookup.
+- max_distance_between_candidates is not a hard limit in UBODT ... I think. Test this, and if needed, add an extra check.
+- Specify versions for build libs (e.g. cibuildwheel).
 
 ## Installation
 
 ```bash
 pip install fastmm
 ```
 
-## TODO
+## Quick Start (Recommended)
+
+The simplest way to use fastmm is with the high-level `FastMapMatch` class:
+
+```python
+from fastmm import FastMapMatch, MatchErrorCode, Network, Trajectory, TransitionMode, FastMapMatchConfig
+
+# Create and populate network
+network = Network()
+network.add_edge(1, source=1, target=2, geom=[(0, 0), (100, 0)])
+network.add_edge(2, source=2, target=3, geom=[(100, 0), (200, 0)])
+network.finalize()
+
+# Create matcher with automatic UBODT caching (SHORTEST mode - distance-based)
+matcher = FastMapMatch(
+    network, TransitionMode.SHORTEST, max_distance_between_candidates=300.0, cache_dir="./cache"
+)
+
+# Match a trajectory (automatic splitting)
+trajectory = Trajectory.from_xy_tuples(1, [(10, 0), (50, 0), (150, 0)])
+result = matcher.match(
+    trajectory,
+    max_candidates=8,
+    candidate_search_radius=50,
+    gps_error=50
+)
+
+# Process successful sub-trajectories
+for sub in result.subtrajectories:
+    if sub.error_code == MatchErrorCode.SUCCESS:
+        print(f"Matched points {sub.start_index} to {sub.end_index}")
+        for segment in sub.segments:
+            print(f" Segment from {segment.p0} to {segment.p1}")
+            for edge in segment.edges:
+                print(f" Edge {edge.edge_id} with {len(edge.points)} points")
+```
 
-- Bring in extra python code.
-- Get test working in python.
-- If not found in UBODT, instead of bailing, do a normal Dijkstra lookup.
-- Need to check reverse tolerance - on our edges, they're all directed, so we probably shouldn't allow reversing. This causes errors when we're parsing - if you reverse on the same edge, the geometry gets flipped (I think - line = ALGORITHM::cutoffseg_unique(e0.geom, start_offset, end_offset); goes backward?), which then messes with our python post-processing of associating time as the segment start/stop are now the edge stop/start, not the other way round. We could add a reversed flag to the edge? That would help. For now, just don't have a reverse tolerance.
-- Could move the journey splitting (e.g. when unmatched candidate or points too far apart) into the C++ code here. Would be more optimal as a) C++, and b) don't need to repeat candidate lookup etc.
-- Improve serialization of UBODT to be cross-platform.
-- Specify versions for build libs (e.g. cibuildwheel).
+For time-based routing, you simply add a speed to every edge and use `TransitionMode.FASTEST`. Otherwise it's the same.
+
+## Automatic Trajectory Splitting
+
+For trajectories that might have gaps or failures, `match()` automatically filters out troublesome sections and matches everything it can. That is:
+- It ignores points with no nearby road candidates (e.g., in tunnels, off-network) - it just returns the map-matched sections either side of the erroneous point. (You can choose to merge them yourself later if you want.)
+- If there's a break in the matching due to e.g. a very long distance between two points (data issues, teleportation etc.) then, again, it returns the map-matched sections either side of this gap.
+
+## Interpolating time
+
+If your trajectory has timestamps, you often want the resulting match to include time as well, i.e. to show at what speed/time your vehicle moved along the matched geometry. This library returns this. If your network doesn't have speeds, it apportions the time between two GPS points linearly along the matched geometry. If you have speeds, it uses them, e.g. if the matched segment includes a 100 km/h edge and a 50 km/h edge of equal length, it apportions less time to the faster edge than to the slower edge.
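A rough sketch of the apportioning described above, in plain Python. This is not fastmm's implementation: `apportion_time` and its `(length, speed)` tuples are hypothetical names used for illustration only, and the fallback to length weighting when a speed is missing is an assumption based on the paragraph above.

```python
def apportion_time(t0, t1, edges):
    """Split the interval [t0, t1] across consecutive edges.

    edges: list of (length, speed) tuples; speed is None when unknown.
    Returns one (enter_time, exit_time) pair per edge.
    """
    # With a speed on every edge, weight each edge by its traversal
    # time (length / speed); otherwise weight by length alone, which
    # is linear interpolation along the geometry.
    if all(speed for _, speed in edges):
        weights = [length / speed for length, speed in edges]
    else:
        weights = [length for length, _ in edges]

    total = sum(weights)
    spans = []
    cursor = t0
    for weight in weights:
        share = (t1 - t0) * weight / total if total > 0 else 0.0
        spans.append((cursor, cursor + share))
        cursor += share
    return spans


# Two equal-length edges (100 km/h then 50 km/h), 30 s between GPS points.
with_speed = apportion_time(0.0, 30.0, [(500.0, 100.0), (500.0, 50.0)])
no_speed = apportion_time(0.0, 30.0, [(500.0, None), (500.0, None)])
```

With speeds, the faster edge receives one third of the 30 s and the slower edge two thirds; without speeds, each edge receives half.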
+
+## Understanding Delta Parameters
+
+The `delta` parameter (called `max_distance_between_candidates` or `max_time_between_candidates` in `MapMatcher`) controls the maximum routing cost for precomputed paths in the UBODT table:
+
+### SHORTEST Mode (Distance-Based)
+- **Units**: Same as your network (typically meters)
+- **Meaning**: Maximum road network distance between GPS points
+- **Recommendation**: 2-3x your expected maximum distance between consecutive GPS points
+- **Example**: If GPS points are ~100m apart, use delta=300m
+
+### FASTEST Mode (Time-Based)
+- **Units**: Seconds
+- **Meaning**: Maximum travel time between GPS points
+- **Recommendation**: 2-3x your expected maximum travel time between GPS points
+- **Example**: For 200m spacing at 50km/h expected speed: 200m ÷ (50,000m/3600s) ≈ 14.4s → use delta=40s
+
+**Trade-offs**:
+- **Larger delta**: Better matching quality (more routing options), but larger file size and slower generation
+- **Smaller delta**: Faster generation and smaller files, but may fail to find paths between distant GPS points
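The rule of thumb above can be written as a small helper. `suggest_delta` is a hypothetical function, not part of fastmm; it only reproduces the arithmetic in the two examples, assuming spacing in meters and speed in km/h.

```python
def suggest_delta(spacing_m, speed_kmh=None, factor=3.0):
    """Suggest a delta from the expected GPS point spacing.

    Without a speed, returns a distance in meters (SHORTEST mode).
    With a speed, returns a travel time in seconds (FASTEST mode).
    """
    if speed_kmh is None:
        return factor * spacing_m
    speed_ms = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
    return factor * spacing_m / speed_ms


shortest_delta = suggest_delta(100.0)       # 300.0 m
fastest_delta = suggest_delta(200.0, 50.0)  # about 43.2 s
```

The README's `delta=40s` for the FASTEST example sits inside the 2-3x range (28.8 s to 43.2 s) that this arithmetic gives.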
+
+## Understanding Reverse Tolerance
+
+The `reverse_tolerance` parameter handles GPS measurement noise that causes slight backward movement on the **same edge**. It exists even though we're already operating on *directed* edges: it accounts for GPS error that makes the position appear to go backward. Without it, routing would still work, but the point would end up matched to the other side of the road, so the route would go to the end of the road and then back down to the current position (on the opposite edge). For a stationary vehicle with a jumping GPS, this can look like the vehicle is traveling up and down the street a lot.
+
+### How It Works
+
+**Edge Traversal:** The graph uses **directed edges**. Dijkstra routing always respects edge direction (source → target). For OSM data with bidirectional roads, you should have two edges (one per direction).
+
+**Same-Edge Positioning:** When two consecutive GPS points match to the **same edge** with the second point having a lower offset than the first (backward movement), `reverse_tolerance` controls whether this is allowed:
+
+```python
+# Example: GPS noise causes apparent backward movement
+# GPS Point 1 → Edge 1 at offset=80m (80% along A→B)
+# GPS Point 2 → Edge 1 at offset=50m (50% along A→B)
+
+# Without reverse_tolerance (0.0):
+# - Transition has infinite cost → rejected
+# - Algorithm may match Point 2 to opposite-direction edge (creating fake U-turn)
+# - Or Point 2 gets skipped in split mode
+
+# With reverse_tolerance=40 (40m in these units):
+# - Backward movement = 80m - 50m = 30m
+# - 30m < 40m ✅ Allowed with cost=0
+```
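The same-edge rule in the example can be sketched as a standalone function. This is an illustration of the behaviour described above, not fastmm's code: `same_edge_cost` is a hypothetical name, and both the strict `<` comparison at the boundary and the forward cost being the offset difference are assumptions.

```python
import math


def same_edge_cost(offset_prev, offset_curr, reverse_tolerance):
    """Routing cost between two candidates on the same directed edge."""
    if offset_curr >= offset_prev:
        # Forward movement: cost is the distance travelled along the edge.
        return offset_curr - offset_prev
    backward = offset_prev - offset_curr
    if backward < reverse_tolerance:
        # Small backward jump, treated as GPS noise: allowed at zero cost.
        return 0.0
    # Backward beyond the tolerance: transition is rejected.
    return math.inf


allowed = same_edge_cost(80.0, 50.0, reverse_tolerance=40.0)
rejected = same_edge_cost(80.0, 50.0, reverse_tolerance=0.0)
forward = same_edge_cost(50.0, 80.0, reverse_tolerance=0.0)
```

Here `allowed` is zero because the 30m backward step is inside the 40m tolerance, while `rejected` is infinite because no backward movement is tolerated.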
+
+### The Reversed Flag
+
+When backward movement is allowed (within tolerance), the `reversed` flag indicates this occurred. **The geometry is automatically corrected** to always go forward (from lower to higher offset), so you don't need to handle backward linestrings:
+
+```python
+for segment in result.segments:
+    for edge in segment.edges:
+        if edge.reversed:
+            # Geometry has been auto-corrected to go forward
+            # But you know GPS moved backward on this edge
+            # May want to flag this for quality control
+            print(f"Edge {edge.edge_id} had backward GPS movement (now corrected)")
+
+        # All edges have forward geometry regardless of reversed flag
+        # edge.points always go from lower to higher offset
+        for point in edge.points:
+            print(f" Offset: {point.edge_offset}, Position: ({point.x}, {point.y})")
+```
+
+**What the `reversed` flag means:**
+- `reversed=True`: GPS moved backward (offset1 > offset2), geometry was auto-corrected to go forward
+- `reversed=False`: GPS moved forward normally (offset1 <= offset2)
+
+**No special handling needed** - the geometry is always correct. Use the flag for:
+- Quality control (detecting erratic GPS behavior)
+- Statistics (counting backward movements)
+- Debugging (understanding matching behavior)
+
+### Recommendations
+
+Start with `reverse_tolerance=0`. If you're seeing a lot of jumping around on stationary (ish) vehicles, either do some pre-filtering, or try e.g. `reverse_tolerance=20` (20m).
+
+## Routing Modes: SHORTEST vs FASTEST
+
+FastMM supports two routing modes that affect how map matching selects the most likely path.
+
+In both cases, the emission probability is balanced against the transition probability. The emission probability is the likelihood that a point belongs to a given candidate edge, based simply on how far the point is from the edge - the closer, the higher the probability. This can be controlled with the `gps_error` parameter - make it larger and the emission probability stays higher even when the point is further away.
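For intuition, here is one form the emission probability could take. The README does not give the exact formula fastmm uses, so the Gaussian shape below (with `gps_error` as the standard deviation) is an assumption, and `emission_probability` is a hypothetical name.

```python
import math


def emission_probability(distance, gps_error):
    """Assumed unnormalised Gaussian: exp(-0.5 * (distance / gps_error)^2)."""
    return math.exp(-0.5 * (distance / gps_error) ** 2)


near = emission_probability(10.0, gps_error=50.0)
far = emission_probability(100.0, gps_error=50.0)
far_loose = emission_probability(100.0, gps_error=200.0)
```

Under this assumed form, `near` exceeds `far`, and raising `gps_error` from 50 to 200 lifts the probability of the distant candidate, which matches the behaviour described above.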
 
-### Custom costs
+### SHORTEST Mode (Distance-based)
 
-To implement, just update the NetworkGraph construction, and set g[e].length = edge.cost (where you read edge in from the read_ogr_file etc.). Then change the transition probability to
+Uses distance as the routing metric. This is the default mode and matches trajectories based on spatial proximity. The transition probability is:
 
 ```
-// double tp = TransitionGraph::calculate_transition_probability(shortest_path_distance, euclidean_distance);
-double tp = exp(-0.01 * shortest_path_distance); //
+tp = min(euclidean_dist, path_dist) / max(euclidean_dist, path_dist)
 ```
 
-Seems to work:
+This compares the straight-line distance between GPS points to the network path distance. The probability is higher when the path closely follows the GPS trajectory.
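The SHORTEST formula is easy to try out directly. The function below is a sketch of the expression shown above, not fastmm's code; returning 1.0 when both distances are zero is an assumption made only to avoid dividing by zero.

```python
def shortest_transition_probability(euclidean_dist, path_dist):
    """tp = min(euclidean, path) / max(euclidean, path)."""
    longest = max(euclidean_dist, path_dist)
    if longest == 0:
        # Both points coincide; treated here as a perfect match (assumption).
        return 1.0
    return min(euclidean_dist, path_dist) / longest


direct = shortest_transition_probability(100.0, 100.0)  # path follows the GPS
detour = shortest_transition_probability(100.0, 400.0)  # long way round
```

A path four times longer than the straight-line distance scores 0.25, so detours are penalised in proportion to how much longer they are.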
+
+If you find your routes are sticking to the nearest edge, regardless of the feasibility of the route, this is because your `gps_error` is likely too large (?). Likewise the converse.
+
+### FASTEST Mode (Time-based)
+
+Uses travel time as the routing metric. Requires speed values on all edges. The transition probability is:
+
+```
+expected_time = euclidean_dist / reference_speed
+actual_time = path_time (sum of segment_length / segment_speed)
+tp = min(expected_time, actual_time) / max(expected_time, actual_time)
+```
+
+Similarly to the above, this gives a higher probability when the travel time is the same as, or faster than, the expected travel time (which is the Euclidean distance divided by the reference speed). If you're finding your routes are sticking to the nearest edge, regardless of the feasibility of the route, either decrease `gps_error` as above, or decrease (?) the reference speed.
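A worked instance of the three lines above, again as a sketch rather than fastmm's implementation. `fastest_transition_probability` and its `(length, speed)` segment tuples are hypothetical, and units are assumed to be meters and meters per second.

```python
def fastest_transition_probability(euclidean_dist, reference_speed, segments):
    """Evaluate the FASTEST-mode expression for one candidate path.

    segments: list of (length, speed) tuples along the path.
    """
    expected_time = euclidean_dist / reference_speed
    actual_time = sum(length / speed for length, speed in segments)
    slowest = max(expected_time, actual_time)
    if slowest == 0:
        # No movement at all; treated here as a perfect match (assumption).
        return 1.0
    return min(expected_time, actual_time) / slowest


# 300 m apart at a 15 m/s reference speed: 20 s expected.
# Path: 200 m at 10 m/s plus 200 m at 20 m/s: 30 s actual.
tp = fastest_transition_probability(300.0, 15.0, [(200.0, 10.0), (200.0, 20.0)])
```

In this example the path takes 30 s against an expected 20 s, so the expression evaluates to 2/3.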
+
+## Developing
+
+You can create stubs with
+
+```
+python .\generate_stubs_for_wheel.py .\python\fastmm\ .\python\fastmm\
+```
 
-- set cost to 1 and it minimizes the number of edges
-- set cost to edge length, and gives similar result (as both the methods minimise distance - only differences is the previous method is using euclidean distance between candidates maybe, not matched points? would only matter with dense points).
-- can prevent some edges being used by manually bumping up their cost (tested on one road by cost *= 100 for those edges - worked, it avoided those edges).
-- Untested:
-  - time based: if we have speed on all edges, set cost = distance / speed.
-  - use road hierarchy or similar - prioritise main highways. More useful for addinsight.
+For now, this is better than doing it in a CI/CD pipeline as Windows is painful.

example_complete.py

Lines changed: 0 additions & 105 deletions
This file was deleted.
