Description
A customer has described a process for using R5 for what is typically called "selected link analysis" in traffic models, in ways related to the assignment phase of such models.
The process is currently somewhat convoluted, as R5's internal information about paths between specific origins and destinations is not broken out by link. One can export paths between a set of origins and destinations, but there is one full end-to-end path on each line of CSV. These full paths must be post-processed to find relevant links. Additional workarounds are necessary to pre-select origin-destination pairs for which some paths pass through the links of interest.
The customer does not mind scaling origin-destination flows to reflect future scenarios, then assigning those differences proportionately to links in manual post-processing. But to do this assignment step efficiently, they need some indication of how flows are spread over links within the geographic areas of interest. So the simplest change enabling this kind of analysis is to break out the link-level information on separate CSV lines.
Doing this for all origins, destinations, and links would yield O × D × L rows of CSV, which gets very large very fast. In addition, for most use cases that output would need to be immediately filtered down to a small number of links and a subset of origin-destination pairs. The intermediate huge table is not needed, and the filtering would introduce extra manual steps. Instead, all filtering could be performed in a streaming manner within R5 itself.
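To make the streaming idea concrete, here is a minimal sketch in Python (not R5 code). For scale, a purely illustrative case of 10,000 origins, 10,000 destinations, and 50 links per path would produce 5 × 10^9 unfiltered rows. The names `LinkRow`, `expand_paths`, and `filter_links` and the shape of the input tuples are assumptions made for this illustration; they do not correspond to any existing R5 class or CSV format.

```python
from typing import Iterable, Iterator, List, NamedTuple, Set, Tuple


class LinkRow(NamedTuple):
    """One output row: a single link used by one origin-destination path."""
    origin: str
    destination: str
    link: str


# Hypothetical input shape: (origin, destination, [link ids along the path]).
Path = Tuple[str, str, List[str]]


def expand_paths(paths: Iterable[Path]) -> Iterator[LinkRow]:
    """Lazily break each end-to-end path into one row per link."""
    for origin, destination, links in paths:
        for link in links:
            yield LinkRow(origin, destination, link)


def filter_links(rows: Iterable[LinkRow], selected: Set[str]) -> Iterator[LinkRow]:
    """Keep only rows for selected links; the full table is never stored."""
    return (row for row in rows if row.link in selected)


# Made-up example data.
paths = [
    ("A", "B", ["r1", "r2"]),
    ("A", "C", ["r3"]),
    ("B", "C", ["r2", "r4"]),
]
selected_rows = list(filter_links(expand_paths(paths), {"r2"}))
```

Because both steps are generators, only the rows that survive the filter are ever held in memory or written out, which is the property the in-R5 filtering would need.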
Selecting a small set of links by unique ID is not trivial, because network modifications in various scenarios can introduce new links along a road of interest. So it is expected that selection by geographic bounding box/polygon will give a smoother workflow over multiple scenarios.
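The following Python sketch illustrates why geographic selection carries over between scenarios where ID-based selection does not. It is a deliberate simplification: a link counts as selected if any of its vertices falls inside the box, whereas a real implementation would more likely test full geometry intersection against a box or polygon. `BBox`, `link_in_bbox`, and the link IDs and coordinates are all hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class BBox(NamedTuple):
    """Axis-aligned bounding box in lon/lat degrees (hypothetical helper)."""
    min_lon: float
    min_lat: float
    max_lon: float
    max_lat: float


def link_in_bbox(coords: List[Tuple[float, float]], bbox: BBox) -> bool:
    """Return True if any (lon, lat) vertex of the link lies inside the box."""
    return any(
        bbox.min_lon <= lon <= bbox.max_lon and bbox.min_lat <= lat <= bbox.max_lat
        for lon, lat in coords
    )


# Made-up links: one from the baseline network, one added by a scenario
# modification along the same road (with a new ID), one far away.
links: Dict[str, List[Tuple[float, float]]] = {
    "base-link-17": [(-122.68, 45.52), (-122.67, 45.52)],
    "scenario-link-9f3c": [(-122.675, 45.521), (-122.670, 45.521)],
    "base-link-99": [(-122.40, 45.60), (-122.39, 45.60)],
}

area = BBox(min_lon=-122.69, min_lat=45.51, max_lon=-122.66, max_lat=45.53)
selected = sorted(link_id for link_id, coords in links.items() if link_in_bbox(coords, area))
```

The same `area` picks up the scenario-added link without the user having to know its generated ID.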
The customer does not mind summing or otherwise manipulating final values for a number of different links, as long as the output has been filtered down to only the links in the area of interest and is well labeled. If multiple bus routes from GTFS and scenarios all pass along a particular road, it is acceptable to report them separately as just routes, without attempting to use heuristics to sum traffic along each distinct road segment. So the proposed solution is to create CSV output where each line is (origin, destination, route_name, proportion). Note that some of the "iterations" for a particular origin-destination pair may pass outside the selected area. So if the proportion field is not a raw count, it should be a proportion of the total number of iterations for that OD pair, not of the number of iterations passing through the selected area. The lines of output may be filtered down to only the origin-destination pairs where some proportion of the paths pass through the selected area, with the understanding that all other pairs have values of zero in the proportion column.
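As a hedged illustration of the denominator rule, the Python sketch below builds (origin, destination, route_name, proportion) rows from made-up iteration data. The input structure (a list of route names per iteration, keyed by OD pair) and the function name `proportion_rows` are assumptions for this example, not R5's internal representation.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Row = Tuple[str, str, str, float]


def proportion_rows(
    iterations_by_od: Dict[Tuple[str, str], List[List[str]]],
    selected_routes: Set[str],
) -> List[Row]:
    """Build (origin, destination, route_name, proportion) rows.

    The denominator is the total number of iterations for the OD pair,
    including iterations that never enter the selected area.
    """
    rows: List[Row] = []
    for (origin, destination), iterations in iterations_by_od.items():
        total = len(iterations)
        counts: Counter = Counter()
        for routes in iterations:
            # Count each selected route at most once per iteration.
            for route in set(routes) & selected_routes:
                counts[route] += 1
        # OD pairs with no iterations through the area produce no rows.
        for route, count in sorted(counts.items()):
            rows.append((origin, destination, route, count / total))
    return rows


# Made-up data: 4 iterations for A->B, only 2 of which use selected routes.
iterations_by_od = {
    ("A", "B"): [["Route 10"], ["Route 10", "Route 22"], ["Route 5"], ["Route 5"]],
    ("A", "C"): [["Route 5"]],
}
rows = proportion_rows(iterations_by_od, {"Route 10", "Route 22"})
```

Here "Route 10" gets 0.5 (2 of 4 iterations), not 1.0 (2 of the 2 iterations that pass through the area), and the A to C pair is omitted because its implied proportion is zero.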
One problem that arose in the current roundabout implementation of this selected link analysis is that routes introduced by scenario modifications are identified in the CSV output with a random UUID. Although the user-specified name of the modification is assigned to a field of the generated route object, it is currently not possible to get that name into the output CSV without patching the source code (see https://github.com/conveyal/r5/tree/path-route-names). Any new output CSV should probably include the route short name when identifying network links, so that the names of modifications are visible in the output instead of random UUIDs.