Feature idea: datalad get from other local source without copying files again #7674

@lnnrtwttkhn

Description

Hi everyone, when following the YODA principles, I regularly run into the following "issue": to keep datasets modular, I usually add them as subdatasets in an inputs directory, while they also exist at the project-directory level. Here is an example: I have a BIDS DataLad dataset (bids) that I add as a subdataset to my fmriprep DataLad dataset:

myproject
.
├── fmriprep
│   ├── code
│   └── inputs
│       └── bids (5992f12) # same DataLad dataset as below
└── bids (5992f12)

Now when I run fMRIPrep, I give it ./fmriprep/inputs/bids as the input path. This requires running datalad get first, so that the files of the BIDS dataset are actually present at that location. To speed this up, I usually configure a local DataLad sibling for ./fmriprep/inputs/bids like this: datalad siblings add -s local --url ../../../bids. Then datalad get can retrieve the data from local. However, the full BIDS dataset then exists in two locations, which takes up additional disk space. Of course, I could datalad drop the files again afterwards. But, and here comes the idea, maybe there is a way to adjust the path so that the data does not have to be retrieved and copied again, while still staying in line with the YODA principles.
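For reference, the current workflow can be sketched as two small shell functions. This is only a sketch of the steps described above, not a proposed solution: the function names get_from_local and drop_local_copy are hypothetical, the paths follow the example tree, and it assumes the functions are called from the myproject directory.

```shell
# Hypothetical helper names; paths follow the example tree above.

# Fetch file content for the nested subdataset from the top-level copy.
get_from_local() {
    subds=$1   # e.g. fmriprep/inputs/bids
    (
        cd "$subds" || exit 1
        # Register the top-level dataset as a sibling named "local".
        datalad siblings add -s local --url ../../../bids || exit 1
        # Retrieve file content from that sibling; this creates the second copy.
        datalad get -s local .
    )
}

# Free the duplicated file content again once processing is done.
drop_local_copy() {
    subds=$1
    ( cd "$subds" && datalad drop . )
}
```

Calling get_from_local fmriprep/inputs/bids before the fMRIPrep run and drop_local_copy fmriprep/inputs/bids afterwards reproduces the workflow, including the temporary duplication of the data that this issue is about.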

I am not even sure whether this is something that can or should be handled on the DataLad side, but maybe you know of other nice workarounds for this? Thanks!
