discuss & define data format #1
My 2 cents about the format that we specify an API endpoint with: As the parameters of different backends (e.g. OpenTripPlanner, Navitia, HAFAS, EFA) are quite specific, let's keep the specified JSON fields specific to the backend type (i.e. different fields for OTP than for HAFAS) and define them only roughly. I'm of course fine with general properties of the API, such as a description of the data contained or the provider, being specified in a consistent way. Personally, I like the format used by
In general, I'd like us to rapidly iterate on this format. If something doesn't fit, let's open a PR to change it and make a new version! |
Currently, at least in the German & European area, we have several open source projects that already specify API endpoints in a somewhat generalised way:
Similar resources:
(edited to include the projects mentioned in #1 (comment)) |
Regarding the KPublicTransport format:
|
For geocoordinates, I propose:
|
Additional projects with similar setups would be: |
Your definition of "usable" is what I'd consider to be "reliable". 😬 The phrasing aside, I'd say there are several nuances/levels of data coverage:
Of course, we could make this distinction arbitrarily precise, which wouldn't help all of these projects. |
Attribution information would probably also be a good idea for proper Open Data backends, even if those are still rare. Example: https://invent.kde.org/libraries/kpublictransport/-/blob/master/src/lib/networks/no_entur.json#L38
There are backends with more than one endpoint. For instance, most XML EFA backends provide both As different endpoints have different requirements and configuration variables, we shouldn't just have one JSON file per endpoint, but also one type definition. E.g.
Do you think it makes sense to use the |
I'm not sure if such an "enum of types of APIs" will scale well. As an example, if you consider HAFAS endpoints, there are those with "crypto", without "crypto", |
For Hafas those are indeed the two types we have implemented: mgate.exe and the (old?) query.exe/ajax-getstop.exe/stboard.exe variant. They are modeled as different types because they need different requests and different result parsing. We currently have only one endpoint for the latter (i.e. query.exe/ajax-getstop.exe/stboard.exe combined, not each of them individually). Example: https://invent.kde.org/libraries/kpublictransport/-/blob/master/src/lib/networks/ch_sbb.json

For EFA we have 1.5 variants: only a single request path, but two separate parsers depending on whether the result is the full XML or the mobile/compact variant. Our current config files model this as one type with different parameters. This is also how we implement the small variations in the request parameters. We could also handle that as different types though; the impact on our implementation would be quite small.
The format is based on the discussion in #1 and subject to further changes.
You're right. In fact, when it comes to the HAFAS query variant, some endpoints are mostly useless when viewed in isolation. For example, traininfo.exe is only usable with the trainLink obtained by using trainsearch.exe, so those should belong to the same JSON file. I think it's time to start tinkering with JSON files (at least for me, having an example endpoint definition in a JSON file works much better than just reading a discussion thread). To this end, I have created two DB HAFAS definitions (one for mgate, one for query) and an EFA (VRR) definition. They're suggestions based on the discussion so far; feel free to edit them as you see fit. For me, the following open questions remain:
|
For KPublicTransport this is connected to KDE's translation infrastructure, so they get translated automatically by just being there. No idea how we best handle that here.
Agreed. As long as there is a way to detect multiple endpoints for the same provider in client code I'd indeed let the client code decide on the priority. For single protocol clients this is simple anyway, multi-protocol clients should get good results by picking the better implemented or more powerful protocol first.
I'd go with "hafas_mgate" here, the "deutschebahn" special case in KPublicTransport is for coach layout support, which is a bit out of scope here I guess. |
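The "let the client decide on the priority" idea above could look roughly like the following Python sketch. It assumes each definition carries some way to identify its provider and protocol; the keys `provider` and `protocol` and the sample entries are hypothetical and not part of any agreed format.

```python
from collections import defaultdict

# Hypothetical endpoint definitions. The "provider" and "protocol" keys are
# illustrative assumptions, not fields agreed on in this discussion.
ENDPOINTS = [
    {"provider": "DB", "protocol": "hafas_query"},
    {"provider": "DB", "protocol": "hafas_mgate"},
    {"provider": "VRR", "protocol": "efa"},
]


def pick_endpoints(endpoints, preference):
    """Pick one endpoint per provider.

    `preference` lists the protocols a client implements, best first.
    Endpoints using a protocol the client does not implement are skipped.
    """
    rank = {protocol: index for index, protocol in enumerate(preference)}
    by_provider = defaultdict(list)
    for endpoint in endpoints:
        if endpoint["protocol"] in rank:
            by_provider[endpoint["provider"]].append(endpoint)
    return {
        provider: min(candidates, key=lambda e: rank[e["protocol"]])
        for provider, candidates in by_provider.items()
    }


# A multi-protocol client that prefers mgate over the older query variant.
chosen = pick_endpoints(ENDPOINTS, ["hafas_mgate", "efa", "hafas_query"])
for provider, endpoint in sorted(chosen.items()):
    print(provider, endpoint["protocol"])
```

A single-protocol client would pass a one-element preference list and simply get back the providers it can talk to.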
data/de/db-hafas-mgate.json
@@ -0,0 +1,59 @@
{
"name": "Deutsche Bahn (DB)",
"type": "hafas_mgate_deutschebahn", |
As I said before, I don't think an "enum of types of APIs" will scale. I'd rather prefer something like "hafasMgate": true, because it can be combined with other flags describing the endpoint.
Ah, I see what you meant there now. That would work for us too. Making type an array could be an alternative?
Fair enough. I prefer boolean flags (e.g. "hafasMgate": true) over a "type": ["hafas_mgate", ...] array, since checking whether a dict key exists is more straightforward than iterating over an array.
I don't care if it's several boolean flags or an array of flags. Both are a lot more future-proof than a single type enum.
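To make the trade-off in this thread concrete, here is a minimal Python sketch of how client code might test for a protocol under each proposal. Only "hafasMgate" and "hafas_mgate" come from the discussion; the helper names and the rest of the sample definitions are assumptions for illustration.

```python
import json

# Two hypothetical definitions of the same endpoint, one per proposal.
# Only "hafasMgate" and "hafas_mgate" are taken from the discussion.
FLAG_STYLE = json.loads('{"name": "Deutsche Bahn (DB)", "hafasMgate": true}')
ARRAY_STYLE = json.loads('{"name": "Deutsche Bahn (DB)", "type": ["hafas_mgate"]}')


def supports_flag(definition: dict, flag: str) -> bool:
    """Boolean-flag proposal: the key must be present and set to true."""
    return definition.get(flag) is True


def supports_type(definition: dict, type_name: str) -> bool:
    """Array proposal: the name must be listed under the "type" key."""
    return type_name in definition.get("type", [])


print(supports_flag(FLAG_STYLE, "hafasMgate"))
print(supports_type(ARRAY_STYLE, "hafas_mgate"))
```

In Python both checks are a single expression, so the choice mostly affects how the files read and how easily further flags can be added, not how hard the lookup is.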
"64": "Ferry", | ||
"8": "Local Train (RE/RB)" | ||
}, | ||
"locationIdentifierType": "db", |
What is the idea behind this? @vkrause That the IDs returned by the endpoint are DB-style IBNRs?
KPublicTransport has multiple id "namespaces" per location. These can be endpoint-specific ones that have no meaning outside (the default), proprietary ones that are shared between two or more endpoints (BVG/VBB are such an example), or standard ones (IBNR, UIC, IFOPT, etc). This is useful for merging data from different sources (different backends, OSM, Wikidata, etc).
To support this we have the following settings:
- locationIdentifierType defines the id namespace. This is optional for proprietary id spaces not used anywhere else.
- For many Hafas-based endpoints there is the problem that they use an IBNR or UIC code for stations having one of those, but a proprietary numeric scheme for everything else. The standardLocationIdentifierType and standardLocationIdentifierCountries options address this; the list of covered UIC country codes is needed to reliably distinguish IBNR/UIC codes from other numeric values.
This is obviously very specific to what KPublicTransport does, not particularly elegant or generic, and for most existing users probably irrelevant. I could imagine something like this to be relevant for your merging work though?
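As a rough illustration of how the three settings could interact, here is a Python sketch. It is not KPublicTransport's implementation. It assumes that a standard IBNR/UIC station code is a 7-digit number whose first two digits are the UIC country code; the function name and the configuration values are hypothetical.

```python
def classify_location_id(raw_id, standard_type, standard_countries, fallback_type=None):
    """Return the id namespace that `raw_id` most likely belongs to.

    Assumption (not confirmed in this thread): a standard station code has
    exactly 7 digits and starts with a two-digit UIC country code.
    """
    looks_standard = (
        raw_id.isascii()
        and raw_id.isdigit()
        and len(raw_id) == 7
        and int(raw_id[:2]) in standard_countries
    )
    return standard_type if looks_standard else fallback_type


# Hypothetical configuration values for a German HAFAS endpoint.
config = {
    "locationIdentifierType": "db",
    "standardLocationIdentifierType": "ibnr",
    "standardLocationIdentifierCountries": [80],
}

for raw_id in ("8011160", "900003201", "7100001"):
    namespace = classify_location_id(
        raw_id,
        config["standardLocationIdentifierType"],
        config["standardLocationIdentifierCountries"],
        fallback_type=config["locationIdentifierType"],
    )
    print(raw_id, namespace)
```

Without the country list, any 7-digit proprietary id would be mistaken for a standard code, which is the ambiguity the comment above describes.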
> This is obviously very specific to what KPublicTransport does, not particularly elegant or generic, and for most existing users probably irrelevant. I could imagine something like this to be relevant for your merging work though?
Yes, highly relevant for my merging work! In fact, there are several projects (in the European community) that try to cross-reference public transport "things" in some way.
This allows for fine-grained endpoint descriptions and should be more flexible than the enum approach.
It's a catch-all for trainsearch.exe, query.exe, traininfo.exe, stboard.exe and more, so the endpoint should contain the base path only.
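A short Python sketch of what "base path only" might mean for client code: the definition stores one base URL, and the client appends the program it needs. The host name and the helper function are hypothetical; only the .exe program names come from the text above.

```python
from urllib.parse import urljoin


def program_url(base_path: str, program: str) -> str:
    """Build the URL of one HAFAS program below the endpoint's base path."""
    # urljoin replaces the last path segment unless the base ends with "/".
    if not base_path.endswith("/"):
        base_path += "/"
    return urljoin(base_path, program)


# Hypothetical base path as it might be stored in an endpoint definition.
BASE_PATH = "https://hafas.example.org/bin"

for program in ("trainsearch.exe", "query.exe", "traininfo.exe", "stboard.exe"):
    print(program_url(BASE_PATH, program))
```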
The |
Having the entries that this spec leaves unspecified in a nested object probably makes maintaining the spec easier, while having them directly at the root level increases usability. I don't really care about this though; I'd rather try out in practice what we have.
LGTM for now! |
+1, let's make this v1 and see how it turns out in practice. I moved the documentation to the main readme file and specified the language codes (I presume we're going to use ISO 639-1), so we should be good to go. |
@vkrause Please merge if you think this looks good. |
Agreed, let's get this in, and continue in smaller/more focused PRs/issues to keep the discussion easier to follow. |