-
Notifications
You must be signed in to change notification settings - Fork 373
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[MRG] Add CKAN content provider #1336
Conversation
Thanks for submitting your first pull request! You are awesome! 🤗 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apologies for the delay in reviewing! Thank you for the PR :) I've left a couple of style comments that should be fairly easy fixes.
The one question for me is wether the unit should be 'dataset' or 'dataset at a particular version'. This is illustrated by the current unit test failing because the version of the package has changed. The equivalent of sorts in git land is 'git repo url + hash', where the hash is counted as the 'ref'. If no 'ref' is provided, we default to using the latest (or whatever HEAD
) points to. Does CKAN need something similar?
Thank you @yuvipanda for your review! As a newcomer to Project Jupyter, it really encouraged me. Regard to the question referring to the unit, there are indeed "activities" that work like the versions for datasets. For example: https://demo.ckan.org/dataset/activity/sample-dataset-1 However, activities are often unavailable as they can be hidden by admins. In fact, the above demo site of CKAN also restricts the access to activities. Another example is the data.gov. Therefore, from my point of view, it's much safer to take the 'dataset' rather than the 'dataset at a particular version' as the unit for the CKAN provider. |
Does this just disable listing previous versions, or does it completely prevent access to previous versions? Do you know why admins chose to prevent this, since they're reducing the utility of their CKAN repository as a reproducible data source? |
When the
AFAIK, most of sites were upgraded from the old versions of CKAN. Or I think we could still support the versions (urls with activity_id), e.g.:
And go to the latest one when the version is missing, e.g.:
|
That sounds reasonable to me- support a link to a fixed version where possible, fallback to the latest if not. Are there any disadvantages to doing this? |
I don't see any disadvantages in this moment. I will try to handle this in the following days. |
The content provider has been updated to support the versions ("activities" in CKAN) and fallback to the latest. There are two types of URL patterns to the activities:
Also note that the |
Hi @yuvipanda @manics, if there is any need for further improvement, please let me know. Thanks! |
@u10313335 apologies for the delayed response. I think this is good to go, but I wanted to cleanup some of the URL handling a little bit. I can't seem to push to your branch, nor make a PR to your branch for some reason, so I've pushed my changes to https://github.com/yuvipanda/repo2docker/tree/ckan. Also here's a patch file, which can be applied via
If these changes look good to you, apply those and I'll merge it? If not you can also select the 'allow edits from maintainers' button and I'll push these in too. |
Thanks @yuvipanda again for the reviews and suggestions! I have applied your commits. |
I've somehow missed your comment - sorry about that, @u10313335. Tests are now unfortunately failing due to #1354. I'm running all these tests locally - if they pass, I'll merge this PR (as it's already been waiting so long!) before figuring out how to fix that. Thank you for your patience. |
I was able to successfully run these tests locally, so I'm going to merge this one. I'll try to come back to the associated binderhub PR soon as well. Thank you so much for your patience as we worked through this, @u10313335! Hope to continue to see PRs from you, and I think our reviewing speeds should be better now. |
This PR add support for CKAN datasets as content provider for repo2docker, e.g.:
It has been tested on some CKAN repositories such as:
We also adopt this PR in our Binder service (as the
CKAN dataset
repository): https://binder.depositar.io/, e.g.: