Remove stops if the stop or its parent station is not in use by a trip pattern #3588
Summary
This feature removes all unused Stops and cascades the removal to all entities referencing them. This happens after a GTFS or NeTEx feed is read, but before the stops are put into the OTP internal model and the graph is built. By "unused" we mean that neither the Stop nor its parent Station is visited by a trip pattern. So, if Stops A, B, and C are part of the same Station, and one trip pattern stops at A, then all of A, B, and C are kept.
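The rule above can be illustrated with a minimal sketch. This is not the code in this PR: the `Stop` and `Transfer` classes, and the method names, are hypothetical stand-ins for the OTP import model, and transfers are the only referencing entity shown. The idea is to first collect the stations that have at least one visited stop, then keep every stop that is either visited itself or belongs to such a station, and finally drop entities that reference a removed stop.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Illustrative only: these types are hypothetical, not OTP classes.
class UnusedStopFilterSketch {

    static final class Stop {
        final String id;
        final String parentStation; // null when the stop has no parent station
        Stop(String id, String parentStation) {
            this.id = id;
            this.parentStation = parentStation;
        }
    }

    static final class Transfer {
        final String fromStopId;
        final String toStopId;
        Transfer(String fromStopId, String toStopId) {
            this.fromStopId = fromStopId;
            this.toStopId = toStopId;
        }
    }

    // Keep a stop if it is visited by a trip pattern, or if any stop
    // sharing its parent station is visited.
    static List<Stop> retainUsedStops(List<Stop> stops, Set<String> visitedStopIds) {
        Set<String> usedStations = stops.stream()
            .filter(s -> s.parentStation != null && visitedStopIds.contains(s.id))
            .map(s -> s.parentStation)
            .collect(Collectors.toSet());
        return stops.stream()
            .filter(s -> visitedStopIds.contains(s.id)
                || (s.parentStation != null && usedStations.contains(s.parentStation)))
            .collect(Collectors.toList());
    }

    // Cascade: drop transfers where either end refers to a removed stop.
    static List<Transfer> retainTransfers(List<Transfer> transfers, List<Stop> keptStops) {
        Set<String> keptIds = keptStops.stream().map(s -> s.id).collect(Collectors.toSet());
        return transfers.stream()
            .filter(t -> keptIds.contains(t.fromStopId) && keptIds.contains(t.toStopId))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Stop> stops = List.of(
            new Stop("A", "S1"), new Stop("B", "S1"), new Stop("C", "S1"),
            new Stop("D", "S2"), new Stop("E", null));
        // Only stop A is visited by a trip pattern.
        List<Stop> kept = retainUsedStops(stops, Set.of("A"));
        // Prints [A, B, C]: B and C survive because they share station S1 with A.
        System.out.println(kept.stream().map(s -> s.id).collect(Collectors.toList()));
    }
}
```

Keeping whole stations together, rather than only the visited stops, means in-station transfers and pathways between the remaining stops stay consistent.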
Typical use case
As a developer I need to debug a routing request, but the size of the data is overwhelming. Editing NeTEx (or GTFS) files is time-consuming, difficult and error-prone, so it would be nice to have a good way to limit the data loaded. There are tools for chopping down GTFS, but I am not aware of any good tools for NeTEx data sets. It would be nice if OTP could clean up and remove data that cannot be used in a travel search. Then all I have to do as a developer is remove trips from the data set, and OTP will remove all other data that is not in use by the remaining trips.
Solution
To make this easier, we can remove all trips from the feed except the trips involved and then run OTP. Removing trips is fairly easy, even in NeTEx XML files. This PR then provides a feature for filtering down the stops, stations, pathways, transfers, multi-modal stops and group-of-stop-places. To enable the feature, add the following to the otp-config.json file:
{ "RemoveUnusedStops" : true }
A potential problem is that a real-time update may try to use a deleted stop or one of the other deleted entities. This feature is not meant to be used in production.
This also improves the graph build time. I tested this with the Norwegian NeTEx data set, deleting all trips except 2. The graph build time went down from 5.7 minutes to 1.5 minutes. From the log we can see that almost all stops are removed:
106006 of 106164 Stop(s) removed
Issue
No issue exists.
Unit tests
No unit tests exist, but I have tested this while debugging another problem where limiting the number of stops was important.
Documentation
The configuration documentation is updated.
Changelog
No changelog entry is added, since this is not a production feature.