This document explains the parts shared by all the converters when converting a dataset from a given format into an NTFS dataset.
To guarantee uniqueness, every NTFS object identifier must include a unique prefix, which is specified for each data source as an additional parameter to each converter.
Prepending a unique prefix to all the identifiers ensures that they are unique across all the NTFS datasets. Under this assumption, two NTFS datasets can be merged without worrying about conflicting identifiers.
This prefix should be applied to all NTFS identifiers except for the physical mode identifiers, which are standardized, fixed values. These fixed values are described in the NTFS specifications.
To reinforce uniqueness, some objects might have a sub-prefix in addition to their prefix.
The pattern is the following: `<prefix>:<sub_prefix>:<object_id>`.
Adding a sub-prefix makes it possible to merge seasonal datasets, which share a similar referential (e.g. networks, lines, stop areas, stop points) but have different schedules (e.g. trips, dates).
The objects that may be concerned by this sub-prefix are: calendars, trips, trip_properties, frequencies, comments, comment_links, geometries, equipments (see each connector's documentation for details).
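
The identifier pattern described above can be sketched as a small helper. This is only an illustration: the function name `build_id` and its parameters are hypothetical, and the form `<prefix>:<object_id>` for objects without a sub-prefix is an assumption inferred from the pattern with a sub-prefix.

```python
from typing import Optional


def build_id(prefix: str, object_id: str, sub_prefix: Optional[str] = None) -> str:
    """Join prefix, optional sub-prefix, and source identifier with ':'.

    Hypothetical helper; physical mode identifiers are standardized and
    would not go through this function.
    """
    parts = [prefix]
    if sub_prefix:
        parts.append(sub_prefix)
    parts.append(object_id)
    return ":".join(parts)


line_id = build_id("ABC", "line_1")            # referential object, no sub-prefix
trip_id = build_id("ABC", "trip_1", "winter")  # schedule object with a sub-prefix
```

With two seasonal datasets sharing the prefix `ABC`, the line identifier stays the same in both, while trips get distinct identifiers thanks to the sub-prefix.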
A configuration file `config.json`, as shown below, is provided for each converter. It contains additional information about the data source as well as about the upstream system that generated the data (if available). In particular, it provides the information needed for the required NTFS files `contributors.txt` and `datasets.txt`; some additional metadata can also be inserted in the file `feed_infos.txt`.

    {
      "contributor": {
        "contributor_id": "DefaultContributorId",
        "contributor_name": "DefaultContributorName",
        "contributor_license": "DefaultDatasourceLicense",
        "contributor_website": "http://www.default-datasource-website.com"
      },
      "dataset": {
        "dataset_id": "DefaultDatasetId"
      },
      "feed_infos": {
        "feed_publisher_name": "DefaultContributorName",
        "feed_license": "DefaultDatasourceLicense",
        "feed_license_url": "http://www.default-datasource-website.com"
      }
    }

The `contributor` and `dataset` objects are required and must contain at least the corresponding identifier (and the name for `contributor`); otherwise the conversion stops with an error. The `feed_infos` object is optional.
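
The requirement above can be illustrated with a minimal validation sketch. The function `check_config` and its error messages are hypothetical and are not the converters' actual implementation; only the required keys come from the description above.

```python
import json


def check_config(config: dict) -> None:
    """Raise ValueError when a required object or key is missing (illustrative)."""
    required = {
        "contributor": ["contributor_id", "contributor_name"],
        "dataset": ["dataset_id"],
    }
    for section, keys in required.items():
        obj = config.get(section)
        if not isinstance(obj, dict):
            raise ValueError(f"missing required object '{section}'")
        for key in keys:
            if not obj.get(key):
                raise ValueError(f"missing required key '{section}.{key}'")


config = json.loads("""
{
  "contributor": {
    "contributor_id": "DefaultContributorId",
    "contributor_name": "DefaultContributorName"
  },
  "dataset": {"dataset_id": "DefaultDatasetId"}
}
""")
check_config(config)  # passes: feed_infos is optional
```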
The files contributors.txt and datasets.txt provide additional information about the data source.
| NTFS file | NTFS field | key in config.json | Constraint | Note |
|---|---|---|---|---|
| contributors.txt | contributor_id | contributor_id | Required | This field is prefixed. |
| contributors.txt | contributor_name | contributor_name | Required | |
| contributors.txt | contributor_license | contributor_license | Optional | |
| contributors.txt | contributor_website | contributor_website | Optional | |

| NTFS file | NTFS field | key in config.json | Constraint | Note |
|---|---|---|---|---|
| datasets.txt | dataset_id | dataset_id | Required | This field is prefixed. |
| datasets.txt | contributor_id | contributor_id | Required | This field is prefixed. |
| datasets.txt | dataset_start_date | | | Smallest date of all the trips of the dataset. |
| datasets.txt | dataset_end_date | | | Greatest date of all the trips of the dataset. |
Physical modes may lack a CO2 emission value. When the value is missing, the default values below are used; they are mostly based on figures provided by ADEME.
| Physical Mode | CO2 emission (gCO2-eq/km) |
|---|---|
| Air | 144.6 |
| Boat | NC |
| Bus | 132 |
| BusRapidTransit | 84 |
| Coach | 171 |
| Ferry | 279 |
| Funicular | 3 |
| LocalTrain | 30.7 |
| LongDistanceTrain | 3.4 |
| Metro | 3 |
| RapidTransit | 6.2 |
| RailShuttle | NC |
| Shuttle | NC |
| SuspendedCableCar | NC |
| Taxi | 184 |
| Train | 11.9 |
| Tramway | 4 |
The following fallback modes are also added to the model (they're usually not referenced by any trip).
| Physical Mode | CO2 emission (gCO2-eq/km) |
|---|---|
| Bike | 0 |
| BikeSharingService | 0 |
| Car | 184 |
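
The fallback described above amounts to a lookup that is used only when the input data gives no value. The sketch below copies a subset of the defaults from the tables; the names `DEFAULT_CO2` and `co2_emission` are hypothetical, and returning `None` for modes marked NC is an assumption of this sketch.

```python
from typing import Optional

# Subset of the default values listed above, in gCO2-eq/km.
DEFAULT_CO2 = {
    "Air": 144.6,
    "Bus": 132.0,
    "Coach": 171.0,
    "Metro": 3.0,
    "Tramway": 4.0,
    "Car": 184.0,
}


def co2_emission(mode_id: str, provided: Optional[float]) -> Optional[float]:
    """Keep the provided value; otherwise fall back to the default, if any."""
    if provided is not None:
        return provided
    # Modes without a numeric default (e.g. those marked NC) yield None here.
    return DEFAULT_CO2.get(mode_id)
```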
The following rules apply to every converter, unless otherwise explicitly specified.
- When one or more `stop_points` in the input data are not attached to a `stop_area`, a `stop_area` is automatically created for each one. The name, the coordinates, the visibility, and the timezone of the new `stop_area` are the same as those of the corresponding `stop_point`; its identifier is the `stop_point`'s identifier prefixed with `Navitia:`.
- If a `stop_area` doesn't have coordinates, the barycenter of the contained `stop_points` is used.
- Unless otherwise specified, dates of service are transformed into a list of active dates, as if using a single NTFS file `calendar_dates.txt`. These lists of dates are then automatically transformed into `calendar` and `calendar_dates`.
- Any `/` character in an object identifier is removed.
- If a trip doesn't have a `trip_headsign`, one is automatically generated from the name of the last stop point of the trip.
- If a route doesn't have a `direction_type` (or it is empty), the `direction_type` "forward" is assigned by default.
- If a route doesn't have a name (or it is empty), `name` and `destination_id` are automatically generated:
  - the `route.name` is generated with the following rules:
    - select the most frequent `stop_area` origin and the most frequent `stop_area` destination of all the associated trips
    - in case of equal frequencies, the biggest `stop_area`s (those with the most `stop_points`) are chosen
    - in case of `stop_area`s of equal size, the `stop_area` names are sorted alphabetically and the first ones are taken
    - finally, the `route.name` is generated with: `[name of origin's stop area] - [name of destination's stop area]`
  - the `route.destination_id` is set (overridden if needed) with the destination's stop area selected with the above rule
- If a line has an empty opening or closing time, then both are generated:
  - the `line.opening_time` is generated with the smallest departure time (at the first stop) of all journeys on the line.
  - the `line.closing_time` is generated with the biggest arrival time (at the last stop) of all journeys on the line (+ 24h if the end is earlier than the start time).
  - if a line has several periods without circulation in the day, only the main one (larger and earlier) is used to define the opening and closing times.
  - lines with continuous circulation are indicated by default with an opening at 00:00 and a closing at 23:59.
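
The route-naming rules listed above can be sketched as follows. This is a minimal illustration under stated assumptions, not the converters' actual code: the `StopArea` class, its `stop_point_count` field, and the functions `pick_stop_area` and `generate_route_name` are hypothetical, and each trip is reduced to its (origin, destination) pair of stop areas.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class StopArea:
    id: str
    name: str
    stop_point_count: int  # number of stop_points attached to the stop_area


def pick_stop_area(candidates: List[StopArea]) -> StopArea:
    """Most frequent stop area; ties broken by size, then by name."""
    counts = Counter(sa.id for sa in candidates)
    by_id = {sa.id: sa for sa in candidates}
    best_id = min(
        by_id,
        key=lambda i: (-counts[i], -by_id[i].stop_point_count, by_id[i].name),
    )
    return by_id[best_id]


def generate_route_name(trips: List[Tuple[StopArea, StopArea]]) -> Tuple[str, str]:
    """Return (route name, destination_id) from the trips' origin/destination pairs."""
    origin = pick_stop_area([o for o, _ in trips])
    destination = pick_stop_area([d for _, d in trips])
    return f"{origin.name} - {destination.name}", destination.id


alpha = StopArea("sa:alpha", "Alpha", 3)
delta = StopArea("sa:delta", "Delta", 1)
beta = StopArea("sa:beta", "Beta", 2)
gamma = StopArea("sa:gamma", "Gamma", 2)
trips = [(alpha, beta), (alpha, gamma), (delta, gamma), (delta, beta)]
route_name, destination_id = generate_route_name(trips)
```

In this example both origins are equally frequent, so the larger one (Alpha) is chosen; both destinations are equally frequent and equally sized, so the alphabetical order decides (Beta).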
The model will raise a critical error if identifiers of 2 objects of the same type are identical. For example:
- if 2 datasets have the same identifier
- if 2 lines have the same identifier
- if 2 stop_points have the same identifier
- if 2 stop_areas have the same identifier
- if 2 routes have the same identifier
- if 2 trips have the same identifier
Please note that a stop_area and a stop_point can have the same identifier because they are considered as different types of objects.
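
The uniqueness check above can be pictured as keying every object by its type and identifier. The sketch below is an assumption about how such a check might look; `DuplicateIdentifierError` and `check_unique_ids` are hypothetical names.

```python
from typing import Iterable, Set, Tuple


class DuplicateIdentifierError(Exception):
    """Raised when two objects of the same type share an identifier."""


def check_unique_ids(objects: Iterable[Tuple[str, str]]) -> None:
    """`objects` yields (object_type, identifier) pairs."""
    seen: Set[Tuple[str, str]] = set()
    for object_type, identifier in objects:
        key = (object_type, identifier)
        if key in seen:
            raise DuplicateIdentifierError(
                f"two {object_type} objects have the identifier '{identifier}'"
            )
        seen.add(key)


# A stop_area and a stop_point may share an identifier: different types.
check_unique_ids([("stop_area", "X"), ("stop_point", "X")])
```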
Dangling references are cleaned up:
- if a transfer refers to a stop which doesn't exist (`from_stop_id` and `to_stop_id`)
- if a trip refers to a route which doesn't exist
- if a trip refers to a commercial mode which doesn't exist
- if a trip refers to a dataset which doesn't exist
- if a trip refers to a company which doesn't exist
- if a trip refers to a calendar which doesn't exist
- if a line refers to a network which doesn't exist
- if a line refers to a commercial mode which doesn't exist
- if a route refers to a line which doesn't exist
- if a stop_point refers to a stop_area which doesn't exist
- if a dataset refers to a contributor which doesn't exist
Objects that are not relevant are cleaned up:
- `datasets` which are not referenced by `trips`
- `contributors` which are not referenced by `datasets`
- `companies` which are not referenced by `trips`
- `networks` containing no `line`
- `lines` containing no `route`
- `routes` containing no `trip`
- `trips` containing no `stop_time` or with empty `calendars`
- `stop_points` which are not referenced by `stop_times`
- `stop_areas` which are not referenced by `stop_points` or `routes`
- `calendars` which don't contain any active date
- `geometries` which are not referenced
- `equipments` which are not referenced by `stop_points`
- `frequencies` which are not referenced by `trips`
- `physical_modes` which are not referenced by `trips`
- `commercial_modes` which are not referenced by `lines`
- `trip_properties` which are not referenced by `trips`
- `comments` which are not referenced
- `grid_calendar` which refers to a `line` which does not exist (through the relation in the file `grid_rel_calendar_line.txt`); exception: when the `line_external_code` is used and the `line_id` is empty, the `grid_calendar` is kept
- `grid_exception` date which refers to a `grid_calendar` which does not exist
- `grid_period` which refers to a `grid_calendar` which does not exist
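
Part of the cleanup above (routes, lines and networks) can be sketched as a bottom-up pruning pass. This is only an illustration: the function `prune_unreferenced`, the dictionary-based objects, and the cascading order (routes first, then lines, then networks) are assumptions of this sketch.

```python
from typing import Dict, List, Tuple

Obj = Dict[str, str]


def prune_unreferenced(
    networks: List[Obj], lines: List[Obj], routes: List[Obj], trips: List[Obj]
) -> Tuple[List[Obj], List[Obj], List[Obj]]:
    """Drop routes without trips, then lines without routes, then networks without lines."""
    used_route_ids = {trip["route_id"] for trip in trips}
    routes = [r for r in routes if r["route_id"] in used_route_ids]
    used_line_ids = {r["line_id"] for r in routes}
    lines = [line for line in lines if line["line_id"] in used_line_ids]
    used_network_ids = {line["network_id"] for line in lines}
    networks = [n for n in networks if n["network_id"] in used_network_ids]
    return networks, lines, routes


networks = [{"network_id": "N1"}, {"network_id": "N2"}]
lines = [
    {"line_id": "L1", "network_id": "N1"},
    {"line_id": "L2", "network_id": "N2"},
]
routes = [
    {"route_id": "R1", "line_id": "L1"},
    {"route_id": "R2", "line_id": "L2"},
]
trips = [{"trip_id": "T1", "route_id": "R1"}]
kept_networks, kept_lines, kept_routes = prune_unreferenced(networks, lines, routes, trips)
```

Here route `R2` has no trip, so it is removed, which in turn leaves line `L2` and network `N2` empty and removes them as well.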