
Commit ea981cb

update readme to reflect DMFR v0.5.1 and add more context (#13)
1 parent d5741b0 commit ea981cb

1 file changed

+171
-69
lines changed

README.md

Lines changed: 171 additions & 69 deletions
@@ -8,26 +8,29 @@
88
- [Basic Examples](#basic-examples)
99
- [Fields](#fields)
1010
- [IDs](#ids)
11+
- [Feed specs and URLs](#feed-specs-and-urls)
1112
- [Extended URLs](#extended-urls)
12-
- [Optional Stanzas](#optional-stanzas)
13+
- [Optional Stanzas and Tags](#optional-stanzas-and-tags)
1314
- [License](#license)
1415
- [Authentication](#authentication)
1516
- [Tags](#tags)
17+
- [Languages](#languages)
18+
- [Superceding lineage](#superceding-lineage)
1619

1720
## Introduction
1821

1922
This is a set of guidelines for data publishers providing machine readable lists of their feeds _and_ for data aggregation platforms providing machine readable lists of their feed contents to each other. This project is rooted in publishing and sharing lists of GTFS feeds for fixed-route public-transit networks. It's also applicable to real-time transit, bike-share, e-scooter, and other mobility datasets that take the form of "feeds" published at stable URLs:
2023

2124
- [GTFS](https://gtfs.org/reference/static/)
22-
- [GTFS-Realtime](https://gtfs.org/reference/realtime/v2/)
23-
- [GBFS](https://github.com/NABSA/gbfs/#readme)
25+
- [GTFS Realtime](https://gtfs.org/documentation/realtime/reference/)
26+
- [GBFS](https://github.com/MobilityData/gbfs#readme)
2427
- [MDS](https://github.com/CityOfLosAngeles/mobility-data-specification/#readme)
2528

2629
## Goals
2730

2831
1. **Publishers provide their own small registries** To provide data creators (e.g., transit agencies and data vendors) a means of posting a list of their public feeds online. The format should be light-weight (no server required to power an API). The registry should also be machine readable, making it simple for data aggregation platforms to automatically recognize and consume newly added feeds.
2932
2. **Aggregator platforms share their registries** To provide data aggregation platforms (e.g., Transitland, OpenMobilityData, Navitia) a means of sharing their feed registries with each other. Each platform may have a particular focus in terms of functionality provided on top of their feed registries. By distributing feed lists among any and all platforms, open data is shared more widely and the burden of data curation is (hopefully) reduced for each platform.
30-
3. **Related feeds are linked** Different feed types reference each other (e.g., GTFS-realtime references a static GTFS feed, an MDS e-scooter feed references a GBFS bike-share feed). This registry format will provide a light-weight means for data publishers and aggregator platforms to identify these linkages.
33+
3. **Related feeds are linked** Different feed types reference each other (e.g., GTFS Realtime references a static GTFS feed, an MDS e-scooter feed references a GBFS bike-share feed). This registry format will provide a light-weight means for data publishers and aggregator platforms to identify these linkages.
3134
4. **Put it into practice and experiment**
3235
- ~~The more contributors to these guidelines, the better! Let's consider many options and discuss the pros/cons of each of the registry specifics. Let's also be pragmatic. Our goal at Transitland will be to implement this registry format for both incoming feed submissions (to complement the existing Transitland Feed Registry [add a feed](https://transit.land/documentation/feed-registry/add-a-feed.html) process) and outputting lists of known feeds (the Datastore API [feeds endpoint](https://transit.land/documentation/datastore/feeds.html)).~~
3336
- DMFR now powers the new [Transitland Atlas](https://github.com/transitland/transitland-atlas), which is the source of truth for both Transitland v1 and [Transitland v2](https://transit.land/news/2019/10/17/tlv2.html)'s Feed Registry.
@@ -41,91 +44,59 @@ The stands on the shoulders of:
4144
- Transitland Feed Registry v1 https://github.com/transitland/transitland-feed-registry
4245
- Transitland Feed Registry v2 http://transit.land/feed-registry/
4346
- MDS `providers.csv` https://github.com/CityOfLosAngeles/mobility-data-specification/blob/dev/providers.csv
44-
- GBFS `systems.csv` https://github.com/NABSA/gbfs/blob/master/systems.csv
47+
- GBFS `systems.csv` https://github.com/mobilitydata/gbfs/blob/master/systems.csv
4548

4649
## Basic Examples
4750

4851
Single static GTFS feed:
4952

5053
```jsonc
5154
{
55+
"$schema": "https://dmfr.transit.land/json-schema/dmfr.schema-v0.5.1.json",
5256
"feeds": [
5357
{
5458
"spec": "gtfs", // enum: ["gtfs", "gtfs-rt", "gbfs", "mds"]
55-
"id": "XXXX", // IDs are internally unique, but not necessarily globally unique
56-
"urls": { // "Transitland style URL" to support nested zip archives
57-
"static_current": "",
58-
"static_historic": [""],
59-
"static_planned": [""]
60-
},
61-
"languages": ["en-US"], // IETF language tags, see https://tools.ietf.org/html/bcp47
62-
"license": { // license covering the contents of the feed
63-
"spdx_identifier": "", // see https://spdx.org/licenses/
64-
"url": "",
65-
"use_without_attribution": "yes", // enum: ["yes", "no", "unknown"]
66-
"create_derived_product": "yes", // enum: ["yes", "no", "unknown"]
67-
"redistribute": "yes", // enum: ["yes", "no", "unknown"]
68-
"attribution_text": "",
59+
"id": "example", // ID must be unique across all of your DMFR files; in the Transitland Atlas repo, this is a feed Onestop ID
60+
"urls": {
61+
"static_current": "http://example.com/gtfs.zip" // URL for the current version of the feed
6962
}
7063
}
7164
],
72-
"license_spdx_identifier": "CC0-1.0" // license covering the DMFR file itself; see https://spdx.org/licenses/
65+
"license_spdx_identifier": "CDLA-Permissive-1.0" // license covering the DMFR file itself, not the feed contents
7366
}
7467
```
7568

76-
Single GTFS-realtime feed:
69+
Single GTFS Realtime feed:
7770

7871
```jsonc
7972
{
73+
"$schema": "https://dmfr.transit.land/json-schema/dmfr.schema-v0.5.1.json",
8074
"feeds": [
8175
{
82-
"type": "gtfs-rt", // enum: ["gtfs", "gtfs-rt", "gbfs", "mds"]
83-
"id": "XXXX", // unique ID for this feed record; may be a Onestop ID or your own ID scheme
76+
"spec": "gtfs-rt", // enum: ["gtfs", "gtfs-rt", "gbfs", "mds"]
77+
"id": "XXXX", // unique ID for this feed record; in the Transitland Atlas repo, this is a feed Onestop ID often ending in ~rt
8478
"urls": {
8579
"realtime_vehicle_positions": "",
8680
"realtime_trip_updates": "",
8781
"realtime_alerts": ""
88-
},
89-
"languages": ["en-US"], // IETF language tags, see https://tools.ietf.org/html/bcp47
90-
"license": {
91-
"spdx_identifier": "", // see https://spdx.org/licenses/
92-
"url": "",
93-
"use_without_attribution": "yes", // enum: ["yes", "no", "unknown"]
94-
"create_derived_product": "yes", // enum: ["yes", "no", "unknown"]
95-
"redistribute": "yes", // enum: ["yes", "no", "unknown"]
96-
"attribution_text": "",
9782
}
98-
},
99-
{
100-
"type": "gtfs", // enum: ["gtfs", "gtfs-rt", "gbfs", "mds"],
101-
"id": "XXXX", // unique ID for this feed record; may be a Onestop ID or your own ID scheme
102-
// ...
10383
}
10484
],
105-
"license_spdx_identifier": "CC0-1.0" // required to meet this spec
85+
"license_spdx_identifier": "CDLA-Permissive-1.0" // required to meet this spec
10686
}
10787
```
10888

10989
Group together multiple feeds using an operator:
11090

11191
```jsonc
11292
{
113-
"$schema": "https://dmfr.transit.land/json-schema/dmfr.schema-v0.3.0.json",
93+
"$schema": "https://dmfr.transit.land/json-schema/dmfr.schema-v0.5.1.json",
11494
"feeds": [
11595
{
11696
"spec": "gtfs",
11797
"id": "f-9q9-bart",
11898
"urls": {
11999
"static_current": "http://www.bart.gov/dev/schedules/google_transit.zip"
120-
},
121-
"license": {
122-
"url": "http://www.bart.gov/schedules/developers/developer-license-agreement",
123-
"use_without_attribution": "yes",
124-
"create_derived_product": "unknown",
125-
"redistribute": "yes"
126-
},
127-
"tags": {
128-
"gtfs_data_exchange": "airbart"
129100
}
130101
},
131102
{
@@ -137,91 +108,222 @@ Group together multiple feeds using an operator:
137108
}
138109
}
139110
],
140-
"license_spdx_identifier": "CDLA-Permissive-1.0",
141111
"operators": [
142112
{
143113
"onestop_id": "o-9q9-bart",
114+
"supersedes_ids": ["o-9q9-bart~old"],
115+
"name": "Bay Area Rapid Transit",
116+
"short_name": "BART",
117+
"website": "https://www.bart.gov",
144118
"tags": {
145119
"us_ntd_id": "90003",
146120
"omd_provider_id": "bart",
147121
"wikidata_id": "Q610120",
148122
"twitter_general": "sfbart",
149123
"twitter_service_alerts": "SFBARTalert"
150124
},
151-
"name": "Bay Area Rapid Transit",
152-
"short_name": "BART",
153125
"associated_feeds": [
154126
{
155127
"feed_onestop_id": "f-bart~rt"
156128
},
157129
{
158-
"feed_onestop_id": "f-9q9-bart"
130+
"feed_onestop_id": "f-9q9-bart",
131+
"gtfs_agency_id": "BART"
159132
}
160133
]
161134
}
162-
]
135+
],
136+
"license_spdx_identifier": "CDLA-Permissive-1.0"
163137
}
164138
```
139+
140+
Alternatively, operators can be nested within feeds when there is a one-to-one relationship between a feed and an operator:
141+
142+
```jsonc
143+
{
144+
"$schema": "https://dmfr.transit.land/json-schema/dmfr.schema-v0.5.1.json",
145+
"feeds": [
146+
{
147+
"id": "f-west~virginia~university",
148+
"spec": "gtfs",
149+
"urls": {
150+
"static_current": "https://prt.wvu.edu/files/d/fcc4abeb-9e23-477b-b648-30623cb8ad80/gtfs-3.zip"
151+
152+
},
153+
"operators": [
154+
{
155+
"onestop_id": "o-dpp1s-wvuprt",
156+
"name": "West Virginia University Morgantown Personal Rapid Transit",
157+
"short_name": "WVU PRT",
158+
"website": "http://transportation.wvu.edu/"
159+
}
160+
]
161+
}
162+
],
163+
"license_spdx_identifier": "CDLA-Permissive-1.0"
164+
}
165+
```
166+
167+
The root-level `operators` array is typically used when:
168+
- An operator has three or more feeds
169+
- You want to organize feeds across multiple files
170+
171+
The nested `operators` array within a feed is typically used when:
172+
- There is a one-to-one relationship between a feed and an operator
173+
- You want to keep the feed and operator information together for simplicity
174+
175+
Note: The `name` field is required for operators, while `short_name` and `website` are optional.
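
Because operators can live either at the root of a DMFR file or nested inside a feed, a reader of the format has to look in both places. The following is a minimal Python sketch of that traversal, not part of the spec or of any Transitland tool; the function names `iter_operators` and `operators_missing_name` are hypothetical.

```python
def iter_operators(dmfr):
    """Yield (operator, feed_id) pairs from a parsed DMFR document.

    feed_id is None for root-level operators, which point at their feeds
    through `associated_feeds` instead of by nesting.
    """
    for operator in dmfr.get("operators", []):
        yield operator, None
    for feed in dmfr.get("feeds", []):
        for operator in feed.get("operators", []):
            yield operator, feed.get("id")


def operators_missing_name(dmfr):
    """Return the onestop_id of every operator that lacks the required `name`."""
    return [
        operator.get("onestop_id")
        for operator, _ in iter_operators(dmfr)
        if not operator.get("name")
    ]
```

Called on a document parsed with `json.load`, `operators_missing_name` returns an empty list when every operator carries a `name`.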
176+
165177
## Fields
166178

167179
### IDs
168180

169-
Feed IDs can be any strings that are unique with a given DMFR file. These feed IDs can be [Onestop IDs](https://transit.land/documentation/onestop-id-scheme/), although that is not required by the DMFR spec. In the [Transitland Atlas](https://github.com/transitland/transitland-atlas) repository, DMFR files are required to use Onestop IDs.
181+
Feed IDs can be any strings that are unique within a given DMFR file. These feed IDs can be [Onestop IDs](https://transit.land/documentation/onestop-id-scheme/), although that is not required by the DMFR spec.
182+
183+
In the [Transitland Atlas](https://github.com/transitland/transitland-atlas) repository, DMFR files are required to use Onestop IDs.
184+
185+
### Feed specs and URLs
186+
187+
Each feed must specify a `spec` field that indicates the type of data contained in the feed. The supported specs, and the URL keys that apply to each, are:
188+
189+
- `gtfs`: Static GTFS feed containing schedule data
190+
- `static_current`: URL for the current version of the feed
191+
- `static_historic`: Array of URLs for previous versions of the feed
192+
- `static_planned`: Array of URLs for future service changes
193+
- `static_hypothetical`: Array of URLs for potential future scenarios
194+
195+
- `gtfs-rt`: GTFS Realtime feed containing real-time updates
196+
- `realtime_vehicle_positions`: URL for real-time vehicle position updates
197+
- `realtime_trip_updates`: URL for real-time trip updates
198+
- `realtime_alerts`: URL for real-time service alerts
199+
200+
- `gbfs`: GBFS (General Bikeshare Feed Specification) feed
201+
- `gbfs_auto_discovery`: URL for GBFS auto-discovery file that links to all other GBFS files
202+
203+
- `mds`: MDS (Mobility Data Specification) feed
204+
- `mds_provider`: URL for MDS provider API endpoints
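
The pairing of specs and URL keys above can be checked mechanically. This is a rough Python sketch that treats the list above as exhaustive, which is an assumption; the JSON schema remains the authority on what is actually allowed. `ALLOWED_URL_KEYS` and `unexpected_url_keys` are hypothetical names.

```python
# URL keys listed for each spec in this section.
ALLOWED_URL_KEYS = {
    "gtfs": {"static_current", "static_historic", "static_planned", "static_hypothetical"},
    "gtfs-rt": {"realtime_vehicle_positions", "realtime_trip_updates", "realtime_alerts"},
    "gbfs": {"gbfs_auto_discovery"},
    "mds": {"mds_provider"},
}


def unexpected_url_keys(feed):
    """Return the sorted URL keys in `feed` that are not listed for its spec."""
    spec = feed.get("spec")
    if spec not in ALLOWED_URL_KEYS:
        raise ValueError(f"unknown spec: {spec!r}")
    return sorted(set(feed.get("urls", {})) - ALLOWED_URL_KEYS[spec])
```

A feed whose keys all match its spec yields an empty list; a `gtfs-rt` feed that also carries `static_current` would have that key reported.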
170205

171206
### Extended URLs
172207

173208
For static feeds contained in a zip archive, ideally the feed files are all in the root directory of the archive. However, this is not always the case.
174209

175-
Transitland Feed Registry supports an extended URL format that can reference files nested within a subdirectory. The extended URL format can also reference a zip file nested within another zip file.
210+
[transitland-lib](https://github.com/interline-io/transitland-lib) supports an extended URL format that can reference files nested within a subdirectory. The extended URL format can also reference a zip file nested within another zip file.
176211

212+
Example of nested zip file reference:
177213
```
178214
https://github.com/septadev/GTFS/releases/download/v201810010/gtfs_public.zip#google_bus.zip
179215
```
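
To illustrate how a consumer might read such a URL, here is a small Python sketch that splits it at the `#`. It assumes, based only on the example above, that everything after `#` is the path inside the downloaded archive; it is not how transitland-lib itself is implemented, and `split_extended_url` is a hypothetical name.

```python
from urllib.parse import urldefrag


def split_extended_url(extended_url):
    """Split an extended URL into (download_url, path_inside_archive)."""
    download_url, inner_path = urldefrag(extended_url)
    # An empty fragment means the feed files sit at the archive root.
    return download_url, inner_path or None
```

The first element is what gets fetched over HTTP; the second, when present, names the nested zip file or subdirectory to open after the download.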
180216

181-
## Optional Stanzas
217+
## Optional Stanzas and Tags
182218

183219
### License
184220

185-
Based on [Transitland's approach to handling open data licenses](https://transit.land/an-open-project/) in all their variety.
221+
Based on [Transitland's approach to handling open data licenses](https://transit.land/an-open-project/) in all their variety.
186222

187223
```jsonc
188224
"license": {
189225
"spdx_identifier": "", // see https://spdx.org/licenses/
190226
"url": "",
191227
"use_without_attribution": "yes", // enum: ["yes", "no", "unknown"]
192228
"create_derived_product": "yes", // enum: ["yes", "no", "unknown"]
193-
"redistribute": "yes", // enum: ["yes", "no", "unknown"]
194-
"attribution_text": "",
229+
"redistribution_allowed": "yes", // enum: ["yes", "no", "unknown"]
230+
"commercial_use_allowed": "yes", // enum: ["yes", "no", "unknown"]
231+
"share_alike_optional": "yes", // enum: ["yes", "no", "unknown"]
232+
"attribution_text": "", // if license requires that data consumers display specific text
233+
"attribution_instructions": "" // if license provides specific guidance to how data consumers should provide attribution
195234
}
196235
```
197236

198237
### Authentication
199238

200-
Requiring authentication for public data feeds is typically not a good idea. However, it's reasonable to require an API key for a GTFS-realtime endpoints and other feeds that involve active queries.
239+
Requiring authentication for public data feeds is typically not a good idea. However, it's reasonable to require an API key for GTFS Realtime endpoints and other feeds that involve active queries.
201240

202241
```jsonc
203242
"authorization": {
204-
"type": "", // enum: ["header", "basic_auth", "query_param"]
205-
"param_name": "",
206-
"info_url": ""
243+
"type": "", // enum: ["header", "basic_auth", "query_param", "path_segment", "replace_url"]
244+
"param_name": "", // When type=query_param, this specifies the name of the query parameter
245+
"info_url": "" // Website to visit to sign up for an account
207246
}
208247
```
209248

249+
The following authentication types are supported:
250+
251+
- `header`: API key or token is sent in an HTTP header
252+
- `basic_auth`: Username and password are sent using HTTP Basic Authentication
253+
- `query_param`: API key or token is sent as a URL query parameter
254+
- `path_segment`: API key or token is included as a segment in the URL path. Indicate where the key/token should be injected using `{}`
255+
- `replace_url`: The entire URL should be replaced with a different URL that includes authentication
256+
257+
Auth credentials are not stored in a DMFR file. It's up to each software package that reads the DMFR format to implement its own way of reading auth credentials from a separate file or from environment variables and using them when fetching each feed.
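
As one possible way for a client to act on the `authorization` stanza, the Python sketch below turns a feed URL plus a caller-supplied secret into the URL and headers to request. Several details are assumptions rather than part of the spec: that `param_name` also names the header when `type` is `header`, that a `basic_auth` secret is passed as `username:password`, and that a `replace_url` secret is the full replacement URL. `apply_authorization` is a hypothetical name.

```python
import base64
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def apply_authorization(url, authorization, secret):
    """Return (url, headers) for fetching a feed that needs authorization.

    `secret` comes from the caller (for example an environment variable),
    because DMFR files never store credentials.
    """
    auth_type = authorization.get("type")
    param_name = authorization.get("param_name", "")
    headers = {}
    if auth_type == "header":
        # Assumption: param_name doubles as the header name for this type.
        headers[param_name] = secret
    elif auth_type == "basic_auth":
        # Assumption: secret is "username:password".
        token = base64.b64encode(secret.encode("utf-8")).decode("ascii")
        headers["Authorization"] = "Basic " + token
    elif auth_type == "query_param":
        parts = urlsplit(url)
        query = parse_qsl(parts.query, keep_blank_values=True)
        query.append((param_name, secret))
        url = urlunsplit(parts._replace(query=urlencode(query)))
    elif auth_type == "path_segment":
        # The DMFR URL marks the injection point with {}.
        url = url.replace("{}", secret)
    elif auth_type == "replace_url":
        # Assumption: secret is the complete authenticated URL.
        url = secret
    else:
        raise ValueError(f"unsupported authorization type: {auth_type!r}")
    return url, headers
```

Keeping this as a pure function, separate from the HTTP client and from wherever secrets are stored, makes each authorization type easy to test without network access.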
258+
210259
### Tags
211260

212261
Tags allow extra information to be added to feeds and operators. Keys and values must both be strings.
213262

214263
```jsonc
264+
"feeds": [
265+
{
266+
"spec": "gtfs",
267+
"id": "f-example~feed",
268+
"urls": {
269+
"static_current": "http://example.com/gtfs_2025_02_01.zip"
270+
},
271+
"tags": {
272+
"unstable_url": "true" // note the quotes around true to specify a string value
273+
}
274+
}
275+
],
215276
"operators": [
216277
{
217278
"onestop_id": "o-9q9-bart",
218279
"tags": {
219-
"us_ntd_id": "90003",
220-
"omd_provider_id": "bart",
221-
"wikidata_id": "Q610120",
222-
"twitter_general": "sfbart",
223-
"twitter_service_alerts": "SFBARTalert"
280+
"us_ntd_id": "90003", // an identifier from the US National Transit Database
281+
"omd_provider_id": "bart", // an identifier from OpenMobilityData.org
282+
"wikidata_id": "Q610120", // an identifier from Wikidata
283+
"twitter_general": "sfbart", // a Twitter handle
284+
"twitter_service_alerts": "SFBARTalert" // a Twitter handle
224285
}
225286
}
226287
]
227-
```
288+
```
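
Since tag keys and values must both be strings, a bare `true` or a number in a tag is an easy mistake to make. A minimal Python sketch of a check for this, with `non_string_tags` as a hypothetical helper name:

```python
def non_string_tags(record):
    """Return the sorted tag keys whose key or value is not a string."""
    tags = record.get("tags", {})
    return sorted(
        str(key)
        for key, value in tags.items()
        if not isinstance(key, str) or not isinstance(value, str)
    )
```

The same function works for feed records and operator records, because both carry `tags` in the same shape.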
289+
290+
### Languages
291+
292+
The `languages` field is an optional array of IETF language tags that specify the languages used in the feed. This is particularly useful for feeds that contain multilingual content.
293+
294+
```jsonc
295+
{
296+
"languages": ["en-US", "es-MX"], // IETF language tags, see https://tools.ietf.org/html/bcp47
297+
// ...
298+
}
299+
```
300+
301+
### Superceding lineage
302+
303+
The `supersedes_ids` field is an optional way to indicate when a feed or operator record replaces previous ones. This is useful when:
304+
- An operator changes their name or organizational structure
305+
- Multiple feeds or operators are merged into a single record
306+
307+
Alternatively, when a static GTFS feed is substantially the same but published at a different URL, its old URL(s) may be retained under `urls.static_historic`.
308+
309+
Example for a feed:
310+
```jsonc
311+
{
312+
"id": "f-example~feed",
313+
"supersedes_ids": ["f-example~feed~old"],
314+
"spec": "gtfs",
315+
// ...
316+
}
317+
```
318+
319+
Example for an operator:
320+
```jsonc
321+
{
322+
"onestop_id": "o-9q9-bart",
323+
"supersedes_ids": ["o-9q9-bart~old"],
324+
"name": "Bay Area Rapid Transit",
325+
// ...
326+
}
327+
```
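
One use of `supersedes_ids` is to translate an old ID that a consumer has stored into the ID of the record that replaced it. The sketch below shows one way to do that in Python; the function names are hypothetical, and following chains of replacements is a design choice of this example, not something the spec prescribes.

```python
def build_supersedes_index(records, id_key="id"):
    """Map each superseded ID to the ID of the record that replaces it.

    Use id_key="onestop_id" for operator records.
    """
    index = {}
    for record in records:
        for old_id in record.get("supersedes_ids", []):
            index[old_id] = record[id_key]
    return index


def resolve_current_id(record_id, index):
    """Follow supersedes links until reaching an ID that has not been replaced."""
    seen = set()
    while record_id in index and record_id not in seen:
        seen.add(record_id)  # guards against accidental cycles
        record_id = index[record_id]
    return record_id
```

IDs that were never superseded are returned unchanged, so the lookup is safe to apply to every stored ID.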
328+
329+
