Description
This is an FYI to anyone who sees it that an upcoming release (possibly v0.35) will likely contain breaking changes to the retrieval API.
`referencing` is of course still pre-1.0 and not yet stable (for slightly more detail, see #38). I still don't wish to cause unnecessary churn for people who've begun to adopt it, but it's clear there's at least one thing, possibly two, in need of fixing.
Specifically, though I knew it would happen at some point, `referencing` needs to use a URL implementation in order to properly perform URL normalization. As a simple example, that means knowing that `HTTP://example.com` is the same as `http://example.com`. Of course the full gamut is more complicated.
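To make the simple case concrete, here is a minimal sketch using only the standard library's `urllib.parse`. It is not what `referencing` does or will do; the `normalize_case` helper is a hypothetical name, and it only handles the case-insensitive parts (scheme and host), leaving the rest of normalization untouched.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_case(url: str) -> str:
    """Lowercase the scheme and host, which are case-insensitive.

    Hypothetical helper for illustration only: real normalization also
    covers percent-encoding, dot segments, default ports, and more.
    """
    parts = urlsplit(url)
    # Any userinfo ("user:pass@") is case-sensitive, so leave it alone.
    userinfo, at, hostport = parts.netloc.rpartition("@")
    netloc = userinfo + at + hostport.lower()
    return urlunsplit(
        (parts.scheme.lower(), netloc, parts.path, parts.query, parts.fragment)
    )


same = normalize_case("HTTP://example.com") == normalize_case("http://example.com")
```

The path is deliberately left as-is, since paths are case-sensitive in general.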
This means it needs to internally use a library (which will be url.py) which properly implements RFC 3986 (though technically that one implements WHATWG URLs). Some functions will then need to take `URL`s rather than `str`s. Yes, it is possible to sprinkle `isinstance` checks everywhere and always convert between `str` and `URL`, but this would leave a small performance penalty everywhere, and it also just generally seems unwise to leave in place before `referencing` continues to mature.
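The trade-off between the two approaches can be sketched roughly as follows. Everything here is an assumption for illustration: the `URL` class is a hypothetical stand-in built on `urllib.parse` (it is not url.py's API), and `lookup_lenient` and `lookup_strict` are made-up names, not functions in `referencing`.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class URL:
    """Hypothetical parsed-URL type; NOT the url.py API."""

    scheme: str
    netloc: str
    path: str

    @classmethod
    def parse(cls, text: str) -> "URL":
        # Query and fragment are ignored to keep the stand-in small.
        parts = urlsplit(text)
        return cls(parts.scheme.lower(), parts.netloc.lower(), parts.path)


def lookup_lenient(registry, url):
    # Accepts str or URL: every call pays for the isinstance check,
    # and str callers pay for a fresh parse each time.
    if isinstance(url, str):
        url = URL.parse(url)
    return registry[url]


def lookup_strict(registry, url: URL):
    # Accepts only URL: the caller parses once, at the boundary.
    return registry[url]


registry = {URL.parse("http://example.com/schema"): "contents"}
```

With the strict form, the conversion happens once where a string enters the system, rather than inside every function that touches a URL.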
In particular:
Expect `retrieve` callbacks to likely receive `URL` objects rather than `str`s in an upcoming release.
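If you maintain a retrieval callback and want it to survive the transition, one possible shape is sketched below. It rests on an assumption that has not been confirmed here: that whatever URL object gets passed renders as the URL text via `str()`. The `FutureURL` class and the `SCHEMAS` mapping are hypothetical, and a real callback would return a `Resource` rather than a bare dict.

```python
# Hypothetical in-memory store standing in for wherever schemas live.
SCHEMAS = {"http://example.com/schema": {"type": "object"}}


class FutureURL:
    """Hypothetical stand-in for the URL object a later release may pass."""

    def __init__(self, text: str) -> None:
        self._text = text

    def __str__(self) -> str:
        return self._text


def retrieve(uri):
    # Today `uri` is a str. ASSUMPTION: a future URL object converts
    # back to its textual form with str(), so this handles both.
    return SCHEMAS[str(uri)]
```

Calling `str()` on a value that is already a `str` is harmless, so the callback needs no `isinstance` branch of its own.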
I still expect this to affect only a small number of users ultimately, because the relevant APIs do not relate to `Resource` + `Registry` assembly; they relate to "internal" (but public) APIs like `Resolver.lookup` and `Registry` retrieval. I'm putting this here anyway.
Further details to come shortly; I'm still working on a set of changes to support this. If you wish to follow along, #74 contains the tests I have added (upstream to the test suite) which "tease out" the underlying issue referenced above.
(As a second example, which I am again unsure how easily I can fix but which I suspect should happen before #38, see #65. Ideally I'll find a way to do both of these changes at the same time.)
As a third example (and a slight reminder to myself), we also may not have precisely the right behavior vis-à-vis re-setting the base URL after retrieval, which should likely take precedence over the initial base URL for a `Retriever` (i.e. the object you get back should have a new base URI using the retrieval URL).
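To illustrate why the choice of base matters, here is a small sketch using the standard library's `urllib.parse.urljoin`. The URLs are made up, and this only shows the resolution arithmetic, not how `referencing` tracks base URIs internally.

```python
from urllib.parse import urljoin

# Hypothetical URLs for illustration.
initial_base = "http://example.com/schemas/root.json"

# A reference in the root document leads to a retrieval elsewhere.
retrieval_url = urljoin(initial_base, "../other/child.json")

# A relative reference found inside the retrieved document:
relative_ref = "sibling.json"

# Resolved against the retrieval URL (the behavior argued for above).
against_retrieval = urljoin(retrieval_url, relative_ref)

# Resolved against the initial base (the behavior to avoid).
against_initial = urljoin(initial_base, relative_ref)
```

Here `against_retrieval` points into `/other/` while `against_initial` points into `/schemas/`, so the two choices locate different documents.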