|
1 | 1 | # fastmm |
2 | 2 |
|
3 | | -fastmm is a fast (C++) map-matching library for python with no dependencies, and the ability to interpolate time on the match, not just position. |
| 3 | +fastmm is a fast (C++) map-matching library for Python with no dependencies. It can interpolate time along the match (not just position), and it matches as much of the GPS trace as possible (rather than failing if a single point is wonky). |
4 | 4 |
|
5 | 5 | It's based on a desire to map match a lot of vehicle trace data quickly, without the infrastructure to spin up OSRM / Valhalla. (And this is probably faster as there's no IPC ... ?) |
6 | 6 |
|
7 | 7 | It is based on <https://github.com/cyang-kth/fmm> but updated to: |
8 | 8 |
|
| 9 | +- Include Python helper classes for automatic trajectory splitting and time interpolation for the match. |
9 | 10 | - Remove GDAL/OGR dependencies - networks are created programmatically from Python |
10 | | -- Include Python helper classes for automatic trajectory splitting and time interpolation |
11 | 11 | - Be buildable on Windows/Linux/Mac with modern tooling |
12 | 12 | - Focus on Python packaging with distributable wheels |
13 | 13 | - Remove STMatch - we'll focus on FMM for now |
| 14 | +- Automated Windows, Linux, and macOS wheel builds |
14 | 15 |
|
15 | | -**Status:** |
16 | | - |
17 | | -- [ ] Tested ... = ) |
18 | | -- [ ] MapMatcher helper class with auto-splitting and time interpolation |
19 | | -- [x] FASTMM algorithm working |
20 | | -- [x] Python API for network creation and matching |
21 | | -- [x] Windows, linux, and macOS wheel builds |
| 16 | +## TODO |
22 | 17 |
|
| 18 | +- Currently, if a point from the GPS trace fails (too far etc.) it's excluded from the match. Should add it to the match, just with start == end and an appropriate error code. |
| 19 | +- For `reverse_tolerance`, why not, if the movement is < reverse tolerance, just assume they're still at the same place (and it's GPS jitter)? |
| 20 | +- Test the time apportioning. |
| 21 | +- If not found in UBODT, instead of bailing, do a normal Dijkstra lookup. |
| 22 | +- max_distance_between_candidates is not a hard limit in UBODT ... I think. Test this, and if needed, add an extra check. |
| 23 | +- Specify versions for build libs (e.g. cibuildwheel). |
23 | 24 |
|
24 | 25 | ## Installation |
25 | 26 |
|
26 | 27 | ```bash |
27 | 28 | pip install fastmm |
28 | 29 | ``` |
29 | 30 |
|
30 | | -## TODO |
| 31 | +## Quick Start (Recommended) |
| 32 | + |
| 33 | +The simplest way to use fastmm is with the high-level `FastMapMatch` class: |
| 34 | + |
| 35 | +```python |
| 36 | +from fastmm import FastMapMatch, MatchErrorCode, Network, Trajectory, TransitionMode, FastMapMatchConfig |
| 37 | + |
| 38 | +# Create and populate network |
| 39 | +network = Network() |
| 40 | +network.add_edge(1, source=1, target=2, geom=[(0, 0), (100, 0)]) |
| 41 | +network.add_edge(2, source=2, target=3, geom=[(100, 0), (200, 0)]) |
| 42 | +network.finalize() |
| 43 | + |
| 44 | +# Create matcher with automatic UBODT caching (SHORTEST mode - distance-based) |
| 45 | +matcher = FastMapMatch( |
| 46 | + network, TransitionMode.SHORTEST, max_distance_between_candidates=300.0, cache_dir="./cache" |
| 47 | +) |
| 48 | + |
| 49 | +# Match a trajectory (automatic splitting) |
| 50 | +trajectory = Trajectory.from_xy_tuples(1, [(10, 0), (50, 0), (150, 0)]) |
| 51 | +result = matcher.match( |
| 52 | + trajectory, |
| 53 | + max_candidates=8, |
| 54 | + candidate_search_radius=50, |
| 55 | + gps_error=50 |
| 56 | +) |
| 57 | + |
| 58 | +# Process successful sub-trajectories |
| 59 | +for sub in result.subtrajectories: |
| 60 | + if sub.error_code == MatchErrorCode.SUCCESS: |
| 61 | + print(f"Matched points {sub.start_index} to {sub.end_index}") |
| 62 | + for segment in sub.segments: |
| 63 | + print(f" Segment from {segment.p0} to {segment.p1}") |
| 64 | + for edge in segment.edges: |
| 65 | + print(f" Edge {edge.edge_id} with {len(edge.points)} points") |
| 66 | +``` |
31 | 67 |
|
32 | | -- Bring in extra python code. |
33 | | -- Get test working in python. |
34 | | -- If not found in UBODT, instead of bailing, do a normal djikstra lookup. |
35 | | -- Need to check reverse tolerance - on our edges, they're all directed, so we probably shouldn't allow reversing. This causes errors when we're parsing - if you reverse on the same edge, the geometry gets flipped (I think - line = ALGORITHM::cutoffseg_unique(e0.geom, start_offset, end_offset); goes backward?), which then messes with our python post-processing of associating time as the segment start/stop are now the edge stop/start, not the other way round. We could add a reversed flag to the edge? That would help. For now, just don't have a reverse tolerance. |
36 | | -- Could move the journey splitting (e.g. when unmatched candidate or points too far apart) into the C++ code here. Would be more optimal as a) C++, and b) don't need to repeat candidate lookup etc. |
37 | | -- Improve serialization of UBODT to be cross-platform. |
38 | | -- Specify versions for build libs (e.g. cibuildwheel). |
| 68 | +For time-based routing you simply add a speed on all edges, and use `TransitionMode.FASTEST`. Otherwise it's the same. |
| 69 | + |
| 70 | +## Automatic Trajectory Splitting |
| 71 | + |
| 72 | +For trajectories that might have gaps or failures, `match()` automatically filters out troublesome sections, and matches everything it can. That is: |
| 73 | +- It ignores points with no nearby road candidates (e.g., in tunnels, off-network) - it just returns the map-matched sections either side of the erroneous point. (You can choose to merge them yourself later if you want.) |
| 74 | +- If there's a break in the matching due to e.g. a very long distance between two points (data issues, teleportation etc.) then again, it'll return the map-matched sections either side of this gap. |
| 75 | + |
| 76 | +## Interpolating time |
| 77 | + |
| 78 | +If your trajectory has timestamps, you often want the resulting match to include time as well, i.e. to show when (and so how fast) your vehicle moved along the matched geometry. This library returns this. If your network doesn't have speeds, it apportions the time linearly along the matched geometry between two GPS points. If it does have speeds, it uses them, e.g. if the match segment includes a 100km/hr edge and a 50km/hr edge of equal length, it'll apportion less time to the faster edge than to the slower edge. |
| 79 | + |
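As a rough illustration of the apportioning described above, the following standalone sketch splits the time between two GPS points across the edges of a matched segment. The function `apportion_time` and its `(length, speed)` input are hypothetical and not part of the fastmm API; the library's actual implementation may differ in detail.

```python
def apportion_time(t_start, t_end, edges):
    """Return the timestamp at the end of each edge of a matched segment.

    `edges` is a list of (length, speed) tuples. If every edge has a speed,
    time is split in proportion to length / speed; otherwise (speed is None)
    it is split in proportion to length alone.
    """
    if all(speed is not None for _, speed in edges):
        weights = [length / speed for length, speed in edges]
    else:
        weights = [length for length, _ in edges]
    total = sum(weights)
    if total == 0:
        # Degenerate segment with no length: everything happens at t_end.
        return [t_end for _ in edges]
    elapsed = t_end - t_start
    times, cumulative = [], 0.0
    for weight in weights:
        cumulative += weight
        times.append(t_start + elapsed * cumulative / total)
    return times


# Two equal-length edges (100 km/h, then 50 km/h) with 30 s between GPS points:
# the faster edge gets a third of the time, the slower edge two thirds.
with_speeds = apportion_time(0.0, 30.0, [(500.0, 100.0), (500.0, 50.0)])
# Without speeds the same edges share the time equally.
without_speeds = apportion_time(0.0, 30.0, [(500.0, None), (500.0, None)])
```

Only the ratio of the speeds matters here, so the speed units don't need to match the length units as long as they are consistent across edges.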
| 80 | +## Understanding Delta Parameters |
| 81 | + |
| 82 | +The `delta` parameter (called `max_distance_between_candidates` or `max_time_between_candidates` in `FastMapMatch`) controls the maximum routing cost for precomputed paths in the UBODT (upper-bounded origin-destination table): |
| 83 | + |
| 84 | +### SHORTEST Mode (Distance-Based) |
| 85 | +- **Units**: Same as your network (typically meters) |
| 86 | +- **Meaning**: Maximum road network distance between GPS points |
| 87 | +- **Recommendation**: 2-3x your expected maximum distance between consecutive GPS points |
| 88 | +- **Example**: If GPS points are ~100m apart, use delta=300m |
| 89 | + |
| 90 | +### FASTEST Mode (Time-Based) |
| 91 | +- **Units**: Seconds |
| 92 | +- **Meaning**: Maximum travel time between GPS points |
| 93 | +- **Recommendation**: 2-3x your expected maximum travel time between GPS points |
| 94 | +- **Example**: For 200m spacing at 50km/h expected speed: 200m ÷ (50,000m/3600s) ≈ 14.4s → use delta=40s |
| 95 | + |
| 96 | +**Trade-offs**: |
| 97 | +- **Larger delta**: Better matching quality (more routing options), but larger file size and slower generation |
| 98 | +- **Smaller delta**: Faster generation and smaller files, but may fail to find paths between distant GPS points |
| 99 | + |
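The recommendations above boil down to a small calculation. The sketch below is only a convenience for picking a starting value; `suggest_delta` is a hypothetical helper, not something fastmm provides, and the factor of 3 is simply the upper end of the 2-3x rule of thumb.

```python
def suggest_delta(spacing_m, speed_kmh=None, factor=3.0):
    """Suggest a delta from the expected spacing between GPS points.

    With speed_kmh=None the result is a distance in metres (SHORTEST mode).
    With a speed the result is a travel time in seconds (FASTEST mode).
    """
    if speed_kmh is None:
        return factor * spacing_m
    speed_ms = speed_kmh * 1000.0 / 3600.0  # km/h -> m/s
    return factor * spacing_m / speed_ms


# SHORTEST: points ~100 m apart -> 300 m
shortest_delta = suggest_delta(100.0)
# FASTEST: 200 m spacing at 50 km/h is ~14.4 s of travel, so 3x is ~43 s
# (the 40 s used in the example above also sits inside the 2-3x range).
fastest_delta = suggest_delta(200.0, speed_kmh=50.0)
```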
| 100 | +## Understanding Reverse Tolerance |
| 101 | + |
| 102 | +The `reverse_tolerance` parameter handles GPS measurement noise that causes slight backward movement on the **same edge**. It's needed even though we're already operating on *directed* edges, to absorb GPS error that makes the position appear to go backward. Without it, routing still works, but the second point ends up matched to the other side of the road: the route runs to the end of the road, then back down to the current position on the opposite edge. For a stationary vehicle with a jumping GPS, that can look like the vehicle is traveling up and down the street a lot. |
| 103 | + |
| 104 | +### How It Works |
| 105 | + |
| 106 | +**Edge Traversal:** The graph uses **directed edges**. Dijkstra routing always respects edge direction (source → target). For OSM data with bidirectional roads, you should have two edges (one per direction). |
| 107 | + |
| 108 | +**Same-Edge Positioning:** When two consecutive GPS points match to the **same edge** with the second point having a lower offset than the first (backward movement), `reverse_tolerance` controls whether this is allowed: |
| 109 | + |
| 110 | +```python |
| 111 | +# Example: GPS noise causes apparent backward movement |
| 112 | +# GPS Point 1 → Edge 1 at offset=80m (80% along A→B) |
| 113 | +# GPS Point 2 → Edge 1 at offset=50m (50% along A→B) |
| 114 | + |
| 115 | +# Without reverse_tolerance (0.0): |
| 116 | +# - Transition has infinite cost → rejected |
| 117 | +# - Algorithm may match Point 2 to opposite-direction edge (creating fake U-turn) |
| 118 | +# - Or Point 2 gets skipped in split mode |
| 119 | + |
| 120 | +# With reverse_tolerance=40 (40m in these units): |
| 121 | +# - Backward movement = 80m - 50m = 30m |
| 122 | +# - 30m < 40m ✅ Allowed with cost=0 |
| 123 | +``` |
| 124 | + |
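The rule in the example above can be written as a small function. This is a sketch of the behaviour as described, not fastmm's actual code: `same_edge_cost` is a hypothetical name, the forward cost being the offset difference is an assumption, and so is the strict `<` comparison (the README doesn't say whether movement exactly equal to the tolerance is allowed).

```python
import math


def same_edge_cost(offset_prev, offset_next, reverse_tolerance=0.0):
    """Routing cost between two consecutive candidates on the same directed edge."""
    if offset_next >= offset_prev:
        # Normal forward movement along the edge (assumed cost: distance moved).
        return offset_next - offset_prev
    backward = offset_prev - offset_next
    if backward < reverse_tolerance:
        # Small backward movement is treated as GPS jitter.
        return 0.0
    # Too far backward: the transition is rejected.
    return math.inf


rejected = same_edge_cost(80.0, 50.0, reverse_tolerance=0.0)
allowed = same_edge_cost(80.0, 50.0, reverse_tolerance=40.0)
forward = same_edge_cost(50.0, 80.0)
```

With the numbers from the example, the 30m backward movement is rejected at a tolerance of 0 and accepted with zero cost at a tolerance of 40.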
| 125 | +### The Reversed Flag |
| 126 | + |
| 127 | +When backward movement is allowed (within tolerance), the `reversed` flag indicates this occurred. **The geometry is automatically corrected** to always go forward (from lower to higher offset), so you don't need to handle backward linestrings: |
| 128 | + |
| 129 | +```python |
| 130 | +for segment in result.segments: |
| 131 | + for edge in segment.edges: |
| 132 | + if edge.reversed: |
| 133 | + # Geometry has been auto-corrected to go forward |
| 134 | + # But you know GPS moved backward on this edge |
| 135 | + # May want to flag this for quality control |
| 136 | + print(f"Edge {edge.edge_id} had backward GPS movement (now corrected)") |
| 137 | + |
| 138 | + # All edges have forward geometry regardless of reversed flag |
| 139 | + # edge.points always go from lower to higher offset |
| 140 | + for point in edge.points: |
| 141 | + print(f" Offset: {point.edge_offset}, Position: ({point.x}, {point.y})") |
| 142 | +``` |
| 143 | + |
| 144 | +**What the `reversed` flag means:** |
| 145 | +- `reversed=True`: GPS moved backward (offset1 > offset2), geometry was auto-corrected to go forward |
| 146 | +- `reversed=False`: GPS moved forward normally (offset1 <= offset2) |
| 147 | + |
| 148 | +**No special handling needed** - the geometry is always correct. Use the flag for: |
| 149 | +- Quality control (detecting erratic GPS behavior) |
| 150 | +- Statistics (counting backward movements) |
| 151 | +- Debugging (understanding matching behavior) |
| 152 | + |
| 153 | +### Recommendations |
| 154 | + |
| 155 | +Start with `reverse_tolerance=0`. If you're seeing a lot of jumping around on stationary (ish) vehicles, either do some pre-filtering, or try e.g. `reverse_tolerance=20` (i.e. 20m if your network is in meters). |
| 156 | + |
| 157 | +## Routing Modes: SHORTEST vs FASTEST |
| 158 | + |
| 159 | +FastMM supports two routing modes that affect how map matching selects the most likely path. |
| 160 | + |
| 161 | +In both cases, the emission probability is balanced with the transition probability. The emission probability is the likelihood a candidate is on the given edge based simply on how far away it is from the edge - the closer, the higher the probability. This can be controlled with the `gps_error` parameter - make it larger and the emission probability will stay higher even if the point is further away. |
39 | 162 |
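To get a feel for how `gps_error` shapes the emission probability, here is a minimal sketch. The Gaussian form is an assumption (it is the form commonly used in HMM map matching); this README doesn't state fastmm's exact formula, and `emission_probability` is a hypothetical name.

```python
import math


def emission_probability(distance, gps_error):
    """Unnormalised Gaussian emission probability (assumed form)."""
    a = distance / gps_error
    return math.exp(-0.5 * a * a)


# A candidate 50 units from the GPS point:
tight = emission_probability(50.0, gps_error=25.0)   # drops off quickly
loose = emission_probability(50.0, gps_error=100.0)  # stays high
on_edge = emission_probability(0.0, gps_error=25.0)
```

Under this assumption a larger `gps_error` flattens the curve, so candidates further from the point stay competitive and the transition probability has more say in the final match.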
|
40 | | -### Custom costs |
| 163 | +### SHORTEST Mode (Distance-based) |
41 | 164 |
|
42 | | -To implement, just update the NetworkGraph construction, and set g[e].length = edge.cost (where you read edge in from the read_ogr_file etc.). The change transition probability to |
| 165 | +Uses distance as the routing metric. This is the default mode and matches trajectories based on spatial proximity. The transition probability is: |
43 | 166 |
|
44 | 167 | ``` |
45 | | -// double tp = TransitionGraph::calculate_transition_probability(shortest_path_distance, euclidean_distance); |
46 | | -double tp = exp(-0.01 * shortest_path_distance); // |
| 168 | +tp = min(euclidean_dist, path_dist) / max(euclidean_dist, path_dist) |
47 | 169 | ``` |
48 | 170 |
|
49 | | - Seems to work: |
| 171 | +This compares the straight-line distance between GPS points to the network path distance. Higher probability when the path closely follows the GPS trajectory. |
| 172 | + |
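| 173 | +If you find your routes are sticking to the nearest edge, regardless of the feasibility of the route, your `gps_error` is likely too small (?): a small value makes the emission probability drop off quickly with distance, so the closest edge outweighs the transition probability. Likewise the converse. |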
| 174 | + |
| 175 | +### FASTEST Mode (Time-based) |
| 176 | + |
| 177 | +Uses travel time as the routing metric. Requires speed values on all edges. The transition probability is: |
| 178 | + |
| 179 | +``` |
| 180 | +expected_time = euclidean_dist / reference_speed |
| 181 | +actual_time = path_time (sum of segment_length / segment_speed) |
| 182 | +tp = min(expected_time, actual_time) / max(expected_time, actual_time) |
| 183 | +``` |
| 184 | + |
| 185 | +Similarly to above, this gives a higher probability when the path's travel time is close to the expected travel time (which is the Euclidean distance divided by the reference speed). If you're finding your routes are sticking to the nearest edge, regardless of the feasibility of the route, either adjust `gps_error` as described for SHORTEST mode, or decrease (?) the reference speed. |
| 186 | + |
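Both transition probabilities are the same min/max ratio applied to different quantities. The sketch below transcribes the two formulas into plain Python so you can try numbers; the function names are hypothetical, speeds are assumed to be in distance units per second, and treating two zero values as a probability of 1 is an assumed convention.

```python
def ratio_probability(expected, actual):
    """min/max ratio in [0, 1]; equals 1.0 when the two values agree."""
    largest = max(expected, actual)
    if largest == 0:
        return 1.0  # assumed convention when both values are zero
    return min(expected, actual) / largest


def shortest_tp(euclidean_dist, path_dist):
    """SHORTEST mode: compare straight-line distance to network path distance."""
    return ratio_probability(euclidean_dist, path_dist)


def fastest_tp(euclidean_dist, reference_speed, path_segments):
    """FASTEST mode: compare expected travel time to the path's travel time.

    `path_segments` is a list of (segment_length, segment_speed) tuples.
    """
    expected_time = euclidean_dist / reference_speed
    actual_time = sum(length / speed for length, speed in path_segments)
    return ratio_probability(expected_time, actual_time)


direct = shortest_tp(100.0, 110.0)   # path hugs the GPS trace
detour = shortest_tp(100.0, 400.0)   # long way round
timed = fastest_tp(100.0, 10.0, [(60.0, 10.0), (60.0, 15.0)])
```

Because of the min/max form, the ratio falls off on both sides: a path that is much longer than expected and a path that is much quicker than expected both score below 1.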
| 187 | +## Developing |
| 188 | + |
| 189 | +You can create stubs with: |
| 190 | + |
| 191 | +``` |
| 192 | +python .\generate_stubs_for_wheel.py .\python\fastmm\ .\python\fastmm\ |
| 193 | +``` |
50 | 194 |
|
51 | | -- set cost to 1 and it minimizes the number of edges |
52 | | -- set cost to edge length, and gives similar result (as both the methods minimise distance - only differences is the previous method is using euclidean distance between candidates maybe, not matched points? would only matter with dense points). |
53 | | -- can prevent some edges being used by manually bumping up their cost (tested on one road by cost *= 100 for those edges - worked, it avoided those edges). |
54 | | -- Untested: |
55 | | - - time based: if we have speed on all edges, set cost = distance / speed. |
56 | | - - use road hierachy or similar - prioritise main highways. More useful for addinsight. |
| 195 | +For now, this is better than doing it in a CI/CD pipeline as Windows is painful. |