Date: | 2025-03-10 |
---|---|
Authors: | Markus Demleitner
Brian Major |
Lang: | en |
(This might become an IVOA note one day; for now, we keep the content in the README as ReStructuredText. Feel free to change directly or to do a Pull Request; don't forget to add yourself to the authors).
- Users should not see different VOs depending on which client they use
- Services should not be required to implement multiple major versions
- There cannot be flag days when all services must switch from version A to version B
- Clients should not be required to impement multiple major version
- A client written in year Y should not break because of our transition before year Y+N, where N is perhaps 5 or so
These desiderata are clearly conflicting. If we want to strictly meet all of them, we cannot have major version transitions. It is the purpose of this document to figure out where we compromise.
An important corollary to the realisation that we deal with severely conflicting requirements is: major version transitions are always painful. Let's avoid them whenever we possibly can.
DocStd is not very precise on what makes a major version:
The [major version] number increments to 1 for the first public version, and to 2, 3, ..., for subsequent versions that are not backward compatible and/or require substantial revisions to implementations.
As a pragmatic, at least potentially testable definition, let me propose: “A breaking change is one that, at least without careful management, violates any of the desiderata above”.
In practice, there are several ways in which a revision of a standard can impact interoperability (i.e., be “breaking”); in the following examples, the old version is called A, the new version is called B:
- Version B requires a feature in a service that was not required in version A. Clients written against version B that use that feature will fail when querying a version A service. Example: we could have required support for TIME and BAND in SIAP1; legacy services would have ignored the constraints, so the results are mildly wrong. For standards rejecting unknown parameters, the services would be entirely broken.
- Version B stops requiring a feature (and has possibly replaced it by something else). A version A client needing that feature will fail when querying a version B service. Example: if we stopped requiring TAP_SCHEMA in TAP (which, incidentally, MD would very much like to do), clients expecting to get their column metadata from there will assume new services are broken and not let users inspect table metadata.
- Version B has a new standardID. Clients of version A will not discover services for version B any more until an update, and depending on the client (or the version of the client) users will see different sets of services and hence a different VO. Example: SIAP1 clients will not see SIAP2 services; this is one of many problems when one wants to globally search images in today's VO (cf. global data set discovery).
- Version B modifies the behaviour of feature X. Legacy clients break, sometimes predicatably, sometimes (apparently) randomly. Example: Predictable failure on every document using binary serialisation (but: this may have looked random given that many services return TABLEDATA-serialised VOTables) would have happened if had we simply fixed VOTable's BINARY serialisation for sane NULL value handling in VOTable 1.3. To be clear: This is not what happened; in actual history, instead we defined BINARY2 and only “mildly deprecated” BINARY[1].
Sometimes we can introduce breaking changes with impunity because we know they do not actually break anything. In this section, we look at some examples for how we allow standard changes that might be breaking to go forward nevertheless.
A classic example is when SimpleDALRegExt has defined a type ProtoSpectralAccess and we can see in the Registry that nobody uses this type any more. Hence, no publishing registry will become invalid when we pull the type, and a change that could technically be breaking will almost certainly not have bad consequences.
Sometimes we may have “soft breakage”. Consider an image service that would like to offer cutouts. It could, instead of returning FITS files, return datalink documents with a SODA service rather than FITSes. That service will appear completely broken to legacy clients, which probably will show something like “broken image” for all results. That is hard breakage[2].
If, on the other hand, that image service defined an associated datalink
service, declares that using a datalink service block with standardID
ivo://ivoa.net/std/DataLink#links-1.1
and otherwise keeps serving
FITSes, a datalink-enabled client can offer additional retrieval options
that a legacy client cannot, which arguably makes the VO “look
different”. But at least the legacy clients still do what they were
expected to do when they were written (retrieve full images). On the
other hand, because the service has to both hand out FITSes and
Datalinks, one might argue this is already in mild violation of our
“services should not have to implement two versions at the same tiime“
desideratum.
In the early days of the Virtual Observatory, XML namespace URIs were
written with minor versions in them (for instance,
http://www.ivoa.net/xml/VOTable/v1.1
for the original VOTable).
This made even minor version steps break all XML-conformant clients.
This mistake was repaired in an endorsed Note, XML Schema Versioning
Policies. The note in particular explains which kinds of changes are
legal while keeping the namespace URIs constant (i.e., „what is a
non-breaking change?“), which has certain impacts on how clients need to
be written. The most important requirement here is that VO clients
parsing XML must ignore unknown elements to ensure that future
developments can add features.
The XML versioning policies try hard not to break anything itself. In
particular, the document froze the namespace
URIs whereever they were whan it was
adopted. The consequence is that the XML schema version and the version
apparently implied from the namespace URI now disagree. For instance,
http://www.ivoa.net/xml/VOTable/v1.3
is the namespace URI for
VOTable versions 1.3, 1.4, and 1.5 (and all further VOTable 1 versions).
While this keeps confusing implementors, it is at the same time an
example for the sort of pain one has to accept when maintaining
interoperability with systems that were designed in a suboptimal way –
and of how little errors made when authoring standards can explode
into huge problems when evolving technologies.
In the early days of the Virtual Observatory, XML namespace URIs were
written with minor versions in them (for instance,
http://www.ivoa.net/xml/VOTable/v1.1
for the original VOTable).
This made even minor version steps break all XML-conformant clients.
This mistake was repaired in an endorsed Note, XML Schema Versioning
Policies. The note in particular explains which kinds of changes are
legal while keeping the namespace URIs constant (i.e., „what is a
non-breaking change?“), which has certain impacts on how clients need to
be written. The most important requirement here is that VO clients
parsing XML must ignore unknown elements to ensure that future
developments can add features.
The XML versioning policies try hard not to break anything itself. In
particular, it froze the namespace URIs whereever they were whan it was
adopted. The consequence is that the XML schema version and the version
apparently implied from the namespace URI now disagree. For instance,
http://www.ivoa.net/xml/VOTable/v1.3
is the namespace URI for
VOTable versions 1.3, 1.4, and 1.5 (and all further VOTable 1 versions).
While this keeps confusing implementors, this is at the same time an
example for the sort of pain one has to accept when maintaining
interoperability with systems that were designed in a suboptimal way –
and of how little errors made when authoring standards can explode
into huge problems when evolving technologies.
There are several 104 Simple Cone Search (SCS) interfaces in the Virtual Observatory. SCS has been one of the first VO standards, and it has several warts from today's perspective, such as:
- Error messaging is non-standard with respect to DALI
- It uses several ancient and (in the modern VO) invalid UCDs (e.g., MAIN_ID) that are critical for the sensible interpretation of the results
- There is no way to discover additionally supported parameters from the service (in principle, services can attach VOSI capability endpoints to the services and declare extra parameters in interface elements in there, but since the standard does not mention that, few services actually do that, and I don't think any clients make attempts to use anything like it).
Issue (3) could be fixed in a backwards-compatible way by requiring VOSI capabilities in SCS 1.2 (or so); old clients would not know about the extra parameters, new clients could show, for instance, extra query fields. The differences in appearance would probably be acceptable, given that interfaces vary depending on the client anyway.
Issue (1) would only matter in case of failures, where legacy clients would show generic, non-informative failures (“no data found” or perhaps something resembling an HTTP-level error if we also fix the “always return 200” policy of current SCS), and only new clients would display useful error messages generated by the VO interface. I would suggest that might still count as a soft failure. Also (though I'd say that's acceptable), legacy validators would flag new services as non-compliant.
Fixing issue (2) will probably break many clients that will not be able to make sense of the results, i.e., deserialise them to lists of objects with an ids and positions. On the other hand, since we would use current UCDs, many clients would still be able to do the right thing.
If we issue a SCS2, what would happen?
Without management, we will have SCS1 and SCS2 in parallel for an unforseeable future; this is what we have with SIAP at the moment.
The consequence: some data collections will have SCS1 interfaces, others SCS2, presumably many others both. Legacy clients will not see SCS2 (so the VO will look differently for them). Modern clients will probably see and use both. This is not necessarily a large problem as long as services produce just one resource record with one capability each for each standard; it would be up to the client to hide the fact that there are two interfaces on the same ressource. Still, clients still doing searches by service type plus keywords will have to modify their registry interfaces in such a world.
It has been suggested to modify VO protocols so that they return, say, some form of JSON rather than VOTable. There are various ways in which such a change could be effected, which we briefly discuss below.
Each of the following cases assumes the pre-existence of a VO-JSON standard that defines how to uniquely encode the content model of VOTable (FIELDs, PARAMs, GROUPs, INFOs, etc) in JSON. This VO-JSON would then need to get some semi-blessed media type, presumably in DALI.
(a) issue a new minor version adding (or modifying) a DALI RESPONSEFORMAT parameter to the protocol. To keep the interface stable, VOTable output must remain the default, but interested clients could request VO-JSON explicitly after inspecting a service's minor version. If VO-JSON has major advantages, it might eventually crowd out VOTable, and we could perhaps one day switch the default and only have soft breakage. Advantage: Nothing breaks. Disadvantage: it's a pain on both clients and servers, both of which have to support both formats for an indefinite time. Also, the experience with VOTable BINARY2 (which has clear advantages over BINARY) suggests that that wouldn't work.
(b) issue a new major version that switches the default to VO-JSON and require that for a definite period, services implementing the new version also provide an interfaces to the old. Advantage: there is at least a theoretical date at which the old standard vanishes. For legacy clients, nothing breaks until then. New clients would probably still support the old standard to avoid losing services, so the only advantage over scenario (a) is that there is a theoretical date at which clients only supporting the old version will stop working.
(c) issue a new major version and deprecate the old version. If data providers move along, legacy clients will see less and less of the VO and new clients more and more (assuming they would not ignore the deprecation, which the probably will not, at least initially). It is not unlikely that users of legacy clients (and, if still around, their authors) would feel an increasing pressure to upgrade. Advantage: It's simple for (courageous, or those with very attractive data) service operators. Disadvantage: we will have a split VO at least until the last legacy clients are phased out. Also, the SIAP experience would suggest that it will simply not work. As of 2025, are still new SIAP1 services coming online, and SIAP2-only clients will miss out out quite a few datasets out there.
We have got some things wrong in the past that make transitions harder now. This section collects some of them.
In the registry, it was originally envisioned that standards would be identified through the same string regardless of the version, and different versions would be handled on the level of interfaces. Based on how actual clients were doing their discovery, it was later decided that that was not a good idea and different major versions should also have different standard ids.
However, by that time the standard ids of most of the protocols we are using today were already defined. And we told clients (for other reasons) to do prefix matches on standard ids. For instance, to look for TAP services, you would use a constraint like:
WHERE standard_id LIKE 'ivo://ivoa.net/std/tap%'
This is bad, because later on, when there are new major versions of TAP,
that will also match tap2
, tap3
and so on, and hence legacy
clients will discover services they cannot talk to.
In the future, we should version-tag the identifiers from the start. That is not immediately perfect, either, because the native pattern above would then be:
WHERE standard_id LIKE 'ivo://ivoa.net/std/tap1%'
and hence include tap10
, tap11
, etc, too. Future standards,
sowever, will define features, and once they do that, the discovery
pattern will be a version-safe:
WHERE standard_id LIKE 'ivo://ivoa.net/std/tap1#query-1.%'
or similar. How we keep prefix-matching legacy clients from discoving newer services without making their standard ids ugly, however, is still unclear.
It is certainly a nasty problem. We will need to compromise, and I suspect it will take an attempt or two to figure out which compromises are acceptable.
MD, after pondering a bit, thinks that we ought to use SCS as a test case; while that's not exactly what the P3T has focused on, it seems to be a manageable problem with relatively clear, easily explained bounds (which, for instance, MD couldn't do for outlawing urlencoded parameters).
The main compromise MD would foresee would probably be on the side of requiring service operators to maintain two major versions for a while (under the premise that clients are harder than services). Among the things MD would like to fix is that SCS services would normally receive a table name, so VizieR could run just a single SCS2 service; and there's quite a bit more, like expressing additional constraints, but that is DAL turf, not ours.
Once there is the new major version, it would be us who have figure out:
- how new and old services can coexist
- whether to require SCS1 on new SCS services for the transitionary period
- how we can keep registry queries for ”give me all cone searches, preferring SCS2 if it's there” halfway comprensible
- how to monitor the transition, when to start petitioning services that they really need to migrate now
- when and how to decide to drop SCS1 (fixed term? based on a quorum?)
- and probably much more.
Let's get to work.
[1] | By the way, that hasn't worked too well either. The golden rule of interoperability (“be strict in what you produce, be lenient in what you accept”) in that situation would suggest that as a server, you still return BINARY, which is what, for instance, DaCHS does to this day, 12 years after the publication of VOTable 1.3. |
[2] | MD would like to would argue, though, that even that scenario can be turned into a “softer” breakage by at least making the datalink document usable in a browser, perhaps using xslt (cf. https://github.com/msdemlei/datalink-xslt); if the client displays the URI of the failing image, it is not unlikely that users would try their browser on it and then be at least able to manually retrieve the data set. |