Use datalad-osf for already existing OSF repositories #97

@sappelhoff

Question

How can I turn an existing OSF repository into a datalad dataset and mirror it to GitHub?

Previous solution

I have an OSF repository with data: https://osf.io/cj2dr/

Several months after uploading this dataset to OSF, I wanted to easily download single files from it, and DataLad seemed like a good way to do this.

So I did the following (felt very hacky, and still seems hacky to me):

  1. Locally created a new datalad dataset: datalad create eeg_matchingpennies
  2. Using a patched version of templateflow/datalad-osf I recursively put every file of my OSF repository into a CSV with the filenames and their download URLs
  3. Then using datalad addurls I updated my new datalad dataset with the existing OSF files
  4. Finally, I pushed this dataset to GitHub using these steps:
    1. from my new datalad dataset: datalad install -s eeg_matchingpennies clone
    2. cd clone
    3. git annex dead origin
    4. git remote rm origin
    5. git remote add origin <put URL of new, empty GitHub repo here>
    6. datalad publish --to origin
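
Steps 2 and 3 above can be sketched roughly as follows. This is not the patched templateflow/datalad-osf code, only a minimal illustration of the idea: flatten a nested file listing into a CSV with one row per file. The listing structure (a dict where nested dicts are folders and strings are download URLs), the function names, and the example.org URLs are all assumptions made for this example.

```python
import csv
import io


def flatten_listing(node, prefix=""):
    """Yield (relative_path, download_url) pairs from a nested listing.

    Assumed (hypothetical) structure: a dict maps names to either
    another dict (a folder) or a string (the file's download URL).
    """
    for name, value in sorted(node.items()):
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # A dict stands for a folder: recurse, extending the path prefix.
            yield from flatten_listing(value, prefix=f"{path}/")
        else:
            # Anything else is taken to be the file's download URL.
            yield path, value


def write_addurls_csv(listing, fileobj):
    """Write a CSV with 'filename' and 'url' columns, one row per file."""
    writer = csv.writer(fileobj, lineterminator="\n")
    writer.writerow(["filename", "url"])
    writer.writerows(flatten_listing(listing))


# Hypothetical listing; real URLs would come from the OSF project.
example_listing = {
    "README": "https://example.org/download/readme",
    "sub-01": {
        "eeg": {"sub-01_eeg.vhdr": "https://example.org/download/vhdr"},
    },
}

buffer = io.StringIO()
write_addurls_csv(example_listing, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Written to a file such as files.csv (name chosen here for illustration), the result could then be handed to step 3 with something like `datalad addurls files.csv '{url}' '{filename}'`, where the two format strings refer to the CSV column names.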

This worked nicely, and the result is here: https://github.com/sappelhoff/eeg_matchingpennies

Problems

  • Recently I had to update the source dataset, so I edited the files on OSF ... which then of course screwed up my datalad version (see my post on NeuroStars)
  • I don't know how I could have done this in a "proper" datalad-style way ... because when I get my OSF data via datalad (which goes the GitHub route), then change files locally and run datalad save and datalad publish, there is no connection to the actual source of the data on OSF ...

I imagine that this problem can now be solved using this new datalad extension. Is that correct? If yes, perhaps we can make a use case out of this for the documentation.

I could imagine that there are several people with already existing OSF datasets that would want to datalad-ify them.

In a few steps, what I imagine:

  1. I have an OSF data repo
  2. Do some steps to turn it into a datalad dataset
  3. Mirror this on some GitHub-like site
  4. When I need to change something, use datalad install from the GitHub-like site
  5. Edit locally
  6. Then publish, which should automatically update (i) the GitHub-like site, and (ii) the OSF source data
