Question
How can I turn an existing OSF repository into a datalad dataset and mirror it to GitHub?
Previous solution
I have an OSF repository with data: https://osf.io/cj2dr/
Several months after uploading this dataset to OSF, I wanted to easily download single files from it, and DataLad seemed like a good tool for this.
So I did the following (it felt very hacky at the time, and it still does):
- Locally created a new datalad dataset: `datalad create eeg_matchingpennies`
- Using a patched version of templateflow/datalad-osf, I recursively put every file of my OSF repository into a CSV with the filenames and their download URLs
- Then, using `datalad addurls`, I updated my new datalad dataset with the existing OSF files
- Finally, I pushed this dataset to GitHub using these steps:
  - from my new datalad dataset:
    ```shell
    # clone the local dataset (the clone's origin is the local original)
    datalad install -s eeg_matchingpennies clone
    cd clone
    # mark the local original as dead so git-annex stops listing it as a content location
    git annex dead origin
    # point origin at the GitHub repo instead and publish there
    git remote rm origin
    git remote add origin <put URL of new, empty GitHub repo here>
    datalad publish --to origin
    ```
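The CSV step above (every OSF file plus its download URL) can be sketched in plain Python. This is a minimal sketch, not how templateflow/datalad-osf works: it assumes the OSF API v2 layout as we understand it (file items with `attributes.kind`, `attributes.materialized_path` and `links.download`; folders linking to their own listing under `relationships.files.links.related.href`; pagination via `links.next`). Those field paths, the `OSF_ROOT` URL pattern and the helper names are assumptions to check against the OSF API docs.

```python
import csv
import io
import json
import urllib.request
from typing import Callable, Iterable, Iterator, Tuple

# Assumed OSF API v2 listing URL for a project's OSF Storage root; for the
# project above this would be OSF_ROOT.format(node="cj2dr").
OSF_ROOT = "https://api.osf.io/v2/nodes/{node}/files/osfstorage/"


def fetch_json(url: str) -> dict:
    """Download and decode one page of an OSF API v2 listing."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def iter_osf_files(
    url: str, fetch: Callable[[str], dict] = fetch_json
) -> Iterator[Tuple[str, str]]:
    """Yield (relative path, download URL) for every file below `url`."""
    while url:
        page = fetch(url)
        for item in page["data"]:
            attrs = item["attributes"]
            if attrs["kind"] == "folder":
                # Each folder links to its own (paginated) file listing.
                folder_url = item["relationships"]["files"]["links"]["related"]["href"]
                yield from iter_osf_files(folder_url, fetch)
            else:
                # materialized_path is absolute ("/sub/x.tsv"); drop the leading slash.
                yield attrs["materialized_path"].lstrip("/"), item["links"]["download"]
        # Follow pagination; the last page has no "next" link.
        url = page["links"].get("next")


def addurls_table(rows: Iterable[Tuple[str, str]]) -> str:
    """Render (filename, url) rows as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(["filename", "url"])
    writer.writerows(rows)
    return buffer.getvalue()
```

Saving `addurls_table(iter_osf_files(OSF_ROOT.format(node="cj2dr")))` to a file such as `osf_files.csv` should then allow something like `datalad addurls osf_files.csv '{url}' '{filename}'` inside the dataset, where the placeholders name the CSV columns. Passing `fetch` as a parameter keeps the traversal testable without network access.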
This worked nicely, and the result is here: https://github.com/sappelhoff/eeg_matchingpennies
Problems
- Recently I had to update the source dataset, so I edited the files on OSF ... which then of course screwed up my datalad version (see my post on NeuroStars)
- I don't know how I could have done this in a "proper" DataLad-style way, because when I get my OSF data via datalad (which goes the GitHub route), change files locally, and run `datalad save` and `datalad publish`, there is no connection to the actual source of the data on OSF ...
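The first problem (OSF and the datalad dataset silently diverging) could at least be detected automatically. A minimal sketch, assuming the files were downloaded into the dataset (i.e. `datalad addurls` without `--fast`) with git-annex's default MD5E backend, so each annexed file is a symlink whose key embeds the file's MD5, and assuming the OSF API reports an MD5 per file (we believe OSF Storage exposes it under `attributes.extra.hashes.md5`, but treat that field path as an assumption). Unlocked files (for example on an adjusted branch) are not symlinks and are skipped here. All function names are hypothetical.

```python
import os
import re
from pathlib import Path
from typing import Dict, List, Optional

# Keys of the MD5E backend look like "MD5E-s<size>--<md5><ext>";
# the plain MD5 backend omits the extension.
MD5_KEY = re.compile(r"MD5E?-s\d+--([0-9a-f]{32})")


def key_md5(link_target: str) -> Optional[str]:
    """Extract the MD5 from the annex key at the end of a symlink target."""
    match = MD5_KEY.match(os.path.basename(link_target))
    return match.group(1) if match else None


def local_md5s(dataset: Path) -> Dict[str, str]:
    """Map dataset-relative paths of annexed (symlinked) files to their MD5."""
    result = {}
    for path in dataset.rglob("*"):
        rel = path.relative_to(dataset)
        # Skip git internals and anything that is not an annex symlink.
        if ".git" in rel.parts or not path.is_symlink():
            continue
        md5 = key_md5(os.readlink(path))
        if md5 is not None:
            result[rel.as_posix()] = md5
    return result


def changed_on_osf(osf_md5s: Dict[str, str], local: Dict[str, str]) -> List[str]:
    """Paths whose OSF checksum is new or differs from the dataset's key."""
    return sorted(path for path, md5 in osf_md5s.items() if local.get(path) != md5)
```

Here `osf_md5s` would be a `{path: md5}` mapping built from the OSF API file listing; any path that `changed_on_osf` returns would need to be re-added to the dataset.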
I imagine that this problem can now be solved using this new datalad extension. Is that correct? If so, perhaps we can turn this into a use case for the documentation.
I imagine there are several people with existing OSF datasets who would want to datalad-ify them.
What I imagine, in a few steps:
- I have an OSF data repo
- Do some steps to turn it into a datalad dataset
- Mirror this on some GitHub-like site
- When I need to change something, use `datalad install` from the GitHub-like site
- Edit locally
- Then publish, which should automatically update (i) the GitHub-like site and (ii) the OSF source data
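The final "publish to both" step could be approximated by pushing to two siblings in a fixed order. This is a minimal sketch, assuming the dataset already has an OSF sibling (presumably one set up with the datalad-osf extension) and a GitHub sibling, named `osf` and `github` here purely as hypothetical placeholders. It uses `datalad push`, which supersedes `datalad publish` in newer DataLad releases. `publish_everywhere` and `run_checked` are illustrative helpers, not part of DataLad.

```python
import subprocess
from typing import Callable, List


def run_checked(cmd: List[str]) -> None:
    """Run a command and raise CalledProcessError if it fails."""
    subprocess.run(cmd, check=True)


def publish_everywhere(
    dataset: str = ".",
    storage_sibling: str = "osf",
    code_sibling: str = "github",
    run: Callable[[List[str]], None] = run_checked,
) -> List[List[str]]:
    """Push to the OSF sibling first, then to the GitHub sibling.

    Sibling names are hypothetical defaults; check `datalad siblings`.
    """
    commands = [
        ["datalad", "push", "-d", dataset, "--to", storage_sibling],
        ["datalad", "push", "-d", dataset, "--to", code_sibling],
    ]
    for cmd in commands:
        # run_checked raises on failure, so GitHub is never updated ahead of OSF.
        run(cmd)
    return commands
```

Pushing to OSF first means the GitHub mirror never advertises file content whose only copy has not reached OSF yet. DataLad's sibling-level `publish-depends` setting could probably express the same ordering declaratively, so that a single push to GitHub triggers the OSF push first.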