Description
Background
The Brain Imaging Data Structure (BIDS) specification currently distinguishes between data types (represented as subdirectories under each subject) and modalities (represented as file suffixes). However, there appears to be inconsistency in how these distinctions are made across different kinds of data.
Some examples of the current state:
- `anat` is a data type for MRI anatomical recordings, with `T1w`, `T2w`, etc., as modalities
- `eeg`, `ieeg`, and `meg` are separate data types and modalities for different neural recording methods
- `motion` exists as its own data type and modality
- `physio` exists as a modality but not yet as a data type (being addressed in BEP045: "BEP about non-neuronal physiological (cardiac, respiratory, skin conductance, gastro, ...) data and physiological data derivatives", #1675)
- Eye tracking is being added as a recording type under the `physio` modality (BEP020: "[ENH] BEP 020 Eye Tracking", #1128), with standalone eye tracking placed under the `beh` data type
- An `emg` data type and modality is being proposed (BEP042: "[ENH] extension for electromyography (EMG) - BEP042", #1998)
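To make the data type vs. modality distinction concrete, here is a minimal Python sketch that splits a BIDS-style relative path into its data type (the containing directory) and its suffix (the last underscore-separated token before the extension). The helper name `split_bids_path` and the example file names are illustrative assumptions, not part of any BIDS tool.

```python
from pathlib import PurePosixPath


def split_bids_path(path):
    """Return (datatype, suffix) for a BIDS-style relative file path."""
    p = PurePosixPath(path)
    datatype = p.parent.name           # directory that directly contains the file
    stem = p.name.split(".", 1)[0]     # drop the extension(s), e.g. ".nii.gz"
    suffix = stem.rsplit("_", 1)[-1]   # last underscore-separated token
    return datatype, suffix


# Illustrative file names only.
examples = [
    "sub-01/anat/sub-01_T1w.nii.gz",
    "sub-01/eeg/sub-01_task-rest_eeg.edf",
    "sub-01/motion/sub-01_task-walk_tracksys-imu_motion.tsv",
    "sub-01/func/sub-01_task-rest_physio.tsv.gz",
]

for path in examples:
    print(path, "->", split_bids_path(path))
```

For `eeg` and `motion` the two values coincide, for `anat` one data type hosts several suffixes, and for `physio` the suffix appears under a data type with a different name.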
Current Discussion
There is an ongoing discussion under #1998 about whether certain data should:
- Have their own dedicated data type and modality
- Be incorporated under an existing umbrella data type
- Be embedded within other modalities when appropriate
The discussion began with the question of whether EMG should:
- Have its own data type and modality (similar to EEG/MEG/iEEG)
- Be incorporated under the `physio` modality (similar to eye tracking in BEP020)
However, I believe the scope of the issue is larger than EMG, and I would appreciate input from the community.
Key Considerations
When determining whether a data category deserves its own data type/modality or should be incorporated under an existing umbrella, the following factors have been raised:
- Signal source and nature: Is the signal brain-derived vs. peripheral? Neural vs. non-neural?
- Data dimensionality and complexity: Does the data have unique requirements in terms of channel count, sampling rate, or format that make existing structures insufficient?
- Research usage patterns: Is the data commonly used as a standalone dataset, or primarily as an auxiliary measurement to other data types?
- Technical requirements: Does the data require specific metadata fields, coordinate systems, or other specifications that don't align with existing structures?
- Community needs: Is there sufficient research activity and community interest to warrant a dedicated structure?
- Fragmentation concerns: Does creating a new data type/modality risk fragmenting the BIDS ecosystem unnecessarily?
- Consistency with existing structures: How would the decision align with precedents set by other data types?
The case of EMG-BIDS
For EMG data (BEP042), arguments for a dedicated data type include:
- EMG data is often high-dimensional (>200 channels with 2+ kHz sampling)
- EMG can target multiple muscles and requires specific placement information
- Neural discharges can be derived or estimated directly from EMG
- There is significant standalone EMG research
- Multiple EMG devices can record simultaneously
- EMG closely follows other electrophysiology data types (EEG/iEEG/MEG) and current research closely relates the signals to neural activity.
- Motion-BIDS is a standalone data type and modality, which suggests that not all BIDS data types and modalities need to be "brain-related."
Arguments for incorporating EMG under `physio`:
- Creating new modality suffixes can fragment BIDS
- Similar physiological signals (e.g., eye tracking) are managed under `physio`
- The BEP020 approach with the `PhysioType` field could accommodate EMG-specific metadata
- Consistency with how other non-brain physiological recordings are handled (EKG, eye tracking under `physio`)
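As a rough illustration of what the two options mean for tooling, the sketch below contrasts a dedicated `emg` data type with EMG stored under `physio`. Everything here is an assumption for discussion: the file names, the `recording-emg` label, and the `PhysioType` value `"emg"` are hypothetical and not taken from BEP020 or BEP042.

```python
# Option A: dedicated data type and suffix (hypothetical file name).
dedicated = {
    "path": "sub-01/emg/sub-01_task-grip_emg.edf",
    "sidecar": {"SamplingFrequency": 2048},
}

# Option B: stored under the physio suffix, identified by a sidecar field.
# The PhysioType value "emg" is an assumption made for illustration.
under_physio = {
    "path": "sub-01/beh/sub-01_task-grip_recording-emg_physio.tsv.gz",
    "sidecar": {"SamplingFrequency": 2048, "PhysioType": "emg"},
}


def is_emg(layout):
    """Decide whether a file holds EMG data under either hypothetical layout."""
    filename = layout["path"].rsplit("/", 1)[-1]
    stem = filename.split(".", 1)[0]
    suffix = stem.rsplit("_", 1)[-1]
    if suffix == "emg":
        # Option A: the file name alone is enough.
        return True
    # Option B: the sidecar metadata has to be read as well.
    return suffix == "physio" and layout["sidecar"].get("PhysioType") == "emg"


print(is_emg(dedicated), is_emg(under_physio))
```

Under option A a tool can select EMG files from names alone. Under option B no new suffix is needed, but the tool has to read the sidecar to tell EMG apart from other `physio` recordings.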
Questions for the Community
- What should be the threshold criteria for creating a new data type vs. using an existing one?
- Should brain-derived signals be treated differently from other physiological signals? If so, how would this differentiation apply to the current specification, including Motion-BIDS and ongoing PRs?
- How should we balance the need for specificity against the risk of fragmentation?
- Should we establish a formal policy for what constitutes grounds for a new data type/modality?
- How can we ensure that similar types of data (e.g., various physiological recordings) are treated consistently across the specification?
- What is the threshold or recommendation for using a data-specific modality/recording versus embedding the data under another modality? For example, eye tracking and EMG can be embedded under EEG as channels.
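For reference on the last question, embedding under EEG amounts to listing the extra signals as rows in the recording's `channels.tsv` with an appropriate channel `type`. The sketch below uses made-up channel names and a made-up table; only the `name`/`type`/`units` columns and the `EEG`, `EMG`, `EYEGAZE`, and `PUPIL` type values are taken from the EEG part of the specification as I understand it.

```python
import csv
import io

# Made-up channels.tsv content for an EEG recording with embedded signals.
channels_tsv = (
    "name\ttype\tunits\n"
    "Cz\tEEG\tuV\n"
    "Pz\tEEG\tuV\n"
    "EMG1\tEMG\tuV\n"
    "gaze_x\tEYEGAZE\tn/a\n"
    "pupil\tPUPIL\tn/a\n"
)

rows = list(csv.DictReader(io.StringIO(channels_tsv), delimiter="\t"))

# Group channel names by their declared type.
by_type = {}
for row in rows:
    by_type.setdefault(row["type"], []).append(row["name"])

print(by_type)
```

In this form the EMG and eye-tracking signals share the EEG file's sampling rate and metadata structure, which is part of what the threshold question is about.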
Next Steps
This discussion has implications beyond just EMG data and could affect how future data types are incorporated into BIDS. We could also consider whether:
- A formal policy document should be developed
- Existing data types should be reviewed for consistency
- A dedicated BEP should address this foundational question
Community input from researchers working with diverse data types and from stakeholders (@bids-standard/steering, @bids-standard/maintainers, @bids-standard/raw-eyetracking, @bids-standard/bep042, @smoia, @m-miedema, @arnodelorme) is essential to ensure BIDS remains both comprehensive and coherent.