
@kabilar
Member

@kabilar kabilar commented Aug 7, 2025

As a follow-up to bids-standard/bids-specification#1741, it looks like rawdata/ at the top level is no longer valid.

[BIDS.NOT_INCLUDED] ~/linc/000005/rawdata — Files with such naming scheme are not part of BIDS specification. This error is most commonly caused by typos in file names that make them not BIDS compatible. Please consult the specification and make sure your files are named correctly. If this is not a file naming issue (for example when including files not yet covered by the BIDS specification) you should include a ".bidsignore" file in your dataset (see https://github.com/bids-standard/bids-validator#bidsignore for details). Please note that derived (processed) data should be placed in /derivatives folder and source data (such as DICOMS or behavioural logs in proprietary formats) should be placed in the /sourcedata folder.

cc @dstansby @satra @balbasty @ayendiki @yarikoptic

@effigies
Contributor

effigies commented Aug 7, 2025

Just a note that it was never valid inside a BIDS dataset. The example shows how to put a BIDS dataset next to derivatives instead of nesting one inside the other.

@balbasty

balbasty commented Aug 7, 2025

I think it changes drastically how we use dandi, then.

Dandi requires/assumes (?) that each dandiset is a valid bids dataset, and we used to nest derivatives/ and sourcedata/ inside this "super dataset", with the implicit understanding that what's under these two directories does not have to be valid BIDS. Can we now see a dandiset as a "collections of bids or non-bids datasets", and keep derivatives/, sourcedata/ and rawdata/ at the top level, with only rawdata/ being a valid BIDS dataset?

I.e., go from

```
{dandiset_id}/
|- dataset_description.json
|- sourcedata/
|- rawdata/
|- derivatives/
```

to

```
{dandiset_id}/
|- sourcedata/
|- rawdata/
   |- dataset_description.json
|- derivatives/
```

@satra

satra commented Aug 7, 2025

Pinging @yarikoptic, as I think there is a related proposal somewhere that is a slight variant of what @balbasty showed above. This came up at the NWB Dev hackathon for multi-subject (dyadic, n-adic) experiments.

yarikoptic added a commit to yarikoptic/BIDS that referenced this pull request Aug 7, 2025
- Add note about BIDS Raw datasets being distributable without derivatives
- Include dataset_description.json in directory structure examples
  to emphasize where we observe legit BIDS datasets
- Explain disadvantages of nested dataset organization for distribution
- Clarify that sourcedata can contain Raw, non-BIDS, or derivative datasets
- Add requirement for BIDSVersion key to identify BIDS datasets in subdirectories
- Re-Include example of non-nested dataset organization in my_study folder
  (I based this change on top of the removal proposal in
  bids-standard#687)
@yarikoptic
Contributor

yarikoptic commented Aug 7, 2025

> I think it changes drastically how we use dandi, then.

Let's hope it is not that drastic a change ;-)

> Dandi requires/assumes (?) that each dandiset is a valid bids dataset

Yes, BIDS is one of the allowed "dataset layout" "standards".

> , and we used to nest derivatives/ and sourcedata/ inside this "super dataset", with the implicit understanding that what's under these two directories does not have to be valid BIDS.

Your understanding is correct overall; see https://bids-specification.readthedocs.io/en/stable/common-principles.html#other-top-level-directories for more information. In short, subfolders there do not have to be BIDS datasets. But if you declare any of them to be a BIDS dataset (by placing a dataset_description.json in that folder), it must be valid.
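For illustration, a minimal Python sketch of that rule: it walks a dandiset-style tree and lists the folders that declare themselves BIDS datasets, i.e. the only ones that would then have to validate. Treating a `dataset_description.json` with a `BIDSVersion` key as the declaration follows the commit message above (which adds that requirement); the helper name `find_declared_bids_datasets` is hypothetical and is not part of dandi-cli or bids-validator.

```python
import json
from pathlib import Path


def find_declared_bids_datasets(root):
    """Return folders under `root` (POSIX paths relative to it) that declare
    themselves BIDS datasets.

    Assumption for this sketch: a folder "declares" itself a BIDS dataset when it
    holds a dataset_description.json whose JSON object has a BIDSVersion key.
    Folders without such a file (e.g. arbitrary content in sourcedata/) are ignored.
    """
    root = Path(root)
    declared = []
    for desc in root.rglob("dataset_description.json"):
        try:
            meta = json.loads(desc.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            continue  # unreadable, non-UTF-8 or malformed JSON: skipped here
        if isinstance(meta, dict) and "BIDSVersion" in meta:
            declared.append(desc.parent.relative_to(root).as_posix())
    return sorted(declared)


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        top = Path(tmp)
        raw = top / "sourcedata" / "raw"
        raw.mkdir(parents=True)
        (raw / "dataset_description.json").write_text(
            json.dumps({"Name": "raw", "BIDSVersion": "1.10.0"})
        )
        (top / "sourcedata" / "notes").mkdir()  # non-BIDS content is allowed here
        print(find_declared_bids_datasets(top))  # ['sourcedata/raw']
```

A folder such as `sourcedata/notes/` without a description file is simply not reported, which mirrors the point that sourcedata/ may hold non-BIDS content.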

> Can we now see a dandiset as a "collections of bids or non-bids datasets", and keep derivatives/, sourcedata/ and rawdata/ at the top level, with only rawdata/ being a valid BIDS dataset?

rawdata/ is the only culprit here. As @effigies mentioned, rawdata/ was never declared part of BIDS; it was only proposed as a way to organize data outside of a BIDS dataset. See, e.g., BIDS 1.9.0, which still mentioned rawdata/. In BIDS 1.10.0 I/we moved it under sourcedata/raw, which made it "kosher" because (per above) sourcedata/ does NOT have to contain only BIDS datasets.

And what @satra has mentioned is the

where I propose a DatasetType = "study" to be included in the "super dataset", which by itself would not even contain any sub- folders and would instead compose all of sourcedata/ and derivatives/. If you think it is a good idea, chime in on that PR. If you do not think so... well, chime in too ;-)

With all the above in mind, I would recommend moving rawdata/ under sourcedata/raw. If you can't, you could potentially add rawdata to a .bidsignore file, but at the moment we would not "attend to it": only the stock bids-validator, which we use at the dandi-cli level but not yet in the DANDI Archive backend, would honor it.
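The recommended move, plus the `.bidsignore` fallback, could look roughly like the following Python sketch. Both helpers (`move_rawdata_under_sourcedata`, `add_to_bidsignore`) are hypothetical and not dandi-cli functionality; the sketch assumes the dandiset is a plain local directory (a git/DataLad-managed copy would more likely use `git mv`) and that `.bidsignore` takes `.gitignore`-style patterns, as the bids-validator documentation describes.

```python
import shutil
from pathlib import Path


def move_rawdata_under_sourcedata(dandiset_root):
    """Move a top-level rawdata/ folder to sourcedata/raw/ (hypothetical helper).

    Returns the new location. Refuses to overwrite an existing sourcedata/raw/.
    """
    root = Path(dandiset_root)
    src = root / "rawdata"
    dst = root / "sourcedata" / "raw"
    if not src.is_dir():
        raise FileNotFoundError(f"no rawdata/ folder in {root}")
    if dst.exists():
        raise FileExistsError(f"{dst} already exists; not overwriting")
    dst.parent.mkdir(parents=True, exist_ok=True)  # create sourcedata/ if missing
    shutil.move(str(src), str(dst))  # dst does not exist, so this is a rename
    return dst


def add_to_bidsignore(dandiset_root, pattern="rawdata/"):
    """Fallback: append `pattern` to .bidsignore unless it is already listed.

    Assumes .gitignore-style patterns, one per line.
    """
    path = Path(dandiset_root) / ".bidsignore"
    lines = path.read_text().splitlines() if path.exists() else []
    if pattern not in lines:
        lines.append(pattern)
        path.write_text("\n".join(lines) + "\n")
```

Moving is the preferable option, since, as noted above, the DANDI Archive backend would not currently take `.bidsignore` into account.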

Going back to this PR: that example was not well formulated to begin with. Is that directory (my_study/) a BIDS dataset or not? If it is, we had better adhere to the convention used in bids-specification itself and add a dataset_description.json at the root of that folder to signal it. Besides, the example spelled it raw_data, not rawdata.

To better the situation

@balbasty

balbasty commented Aug 7, 2025

> > I think it changes drastically how we use dandi, then.

> Let's hope it is not that drastic a change ;-)

Haha, I was being a bit dramatic! I read the issue too quickly and thought it was on a dandi/linc repo; only now do I see we're on the BIDS repo. Sorry for polluting the issue :)

@kabilar
Member Author

kabilar commented Aug 7, 2025

Thanks all for the discussion and clarifications. This makes sense, and we will move the raw data under sourcedata/raw for the LINC project.

Superseded by #688.

@kabilar kabilar closed this Aug 7, 2025
effigies pushed a commit to yarikoptic/BIDS that referenced this pull request Oct 2, 2025