[ENH] extension for electromyography (EMG) - BEP042 #1998

Draft: wants to merge 32 commits into base: master

@drammock drammock commented Dec 6, 2024

This is a very early WIP implementation to add EMG support. CIs are not expected to pass yet.

cc @neuromechanist @jwelzel @larsoner @arnodelorme @robertoostenveld feel free to push directly to this branch, I'll add you as repo collaborators on my fork

Note

We meet regularly to discuss this BEP

Next meeting: 18 Dec 2024 on https://ucsd.zoom.us/j/96433382377

Communication channel on github repo / matrix / slack / discord : #1371

@drammock
Author

cc @agramfort

@Remi-Gau Remi-Gau added the BEP label Dec 19, 2024
@yarikoptic yarikoptic changed the title from "[ENH] extension for electromyography (EMG) - BEP42" to "[ENH] extension for electromyography (EMG) - BEP042" on Jan 16, 2025
@sjeung
Collaborator

sjeung commented Feb 26, 2025

Hi, @neuromechanist pointed me to this PR and I would like to share some thoughts. This seems to be pretty advanced in terms of sensor placement description which was not very well defined in the motion BEP :)

  • .json EMGPlacementScheme field : could be more restrictive with keywords?
    For instance in case of absence of a common process one MUST write "channel-specific".
    Keywords "visual reference", "palpation", "functional localization" ... can be explicitly recommended rather than having people use different keywords for describing the same thing (e.g., "visual inspection", "pressing on the skin"... ). They may even use multiple of those methods at the same time and in that case they can separate them with some designated delimiter (that can be prescribed too) for easy parsing. This depends of course on how well-categorized these processes are but since you are allowing unprescribed keywords for names of external schemes anyway (like SENIAM) it would be okay to not be comprehensive.

  • In the example on the website draft I read "EMGPlacementScheme": "midpoint between cubital fossa and radial styloid process", : this seems to contradict the description that says NOT to give the target muscle description

  • .json EMGReference : similarly to EMGPlacementScheme field, you may simply have them choose between 1) a specific name, 2) keyword "channel-specific", or 3) "bipolar". Mix of bipolar and other references would then be a case of "channel-specific".

  • .json SkinPreparation : might this be channel-specific as well? For instance in EEG we would use the abrasive gel only for EOG and not for other electrodes. Then having this as a column in channels.tsv with description of keywords in channels.json can be helpful
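
The keyword-plus-delimiter idea above could be sketched roughly as follows. This is only an illustration: the `;` delimiter, the function name `parse_placement_scheme`, and the exact keyword set are assumptions made for the example, not something the BEP prescribes.

```python
# Hypothetical vocabulary and delimiter, for illustration only.
RECOMMENDED_KEYWORDS = {
    "channel-specific",
    "visual reference",
    "palpation",
    "functional localization",
}
DELIMITER = ";"  # assumed; the comment above only says "some designated delimiter"


def parse_placement_scheme(value):
    """Split a placement-scheme string into recommended keywords and other names.

    Returns a (known, other) tuple of lists. Names outside the recommended
    vocabulary (for example an external scheme such as SENIAM) are kept
    rather than rejected.
    """
    parts = [p.strip() for p in value.split(DELIMITER) if p.strip()]
    if "channel-specific" in parts and len(parts) > 1:
        raise ValueError('"channel-specific" cannot be combined with other values')
    known = [p for p in parts if p in RECOMMENDED_KEYWORDS]
    other = [p for p in parts if p not in RECOMMENDED_KEYWORDS]
    return known, other
```

Keeping unrecognised names in a separate list, instead of raising, mirrors the point that external scheme names would remain unprescribed.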

@drammock
Author

Hi @sjeung, thanks for the feedback / ideas.

  • .json EMGPlacementScheme field : could be more restrictive with keywords?

done in e84cadc

In the example on the website draft I read "EMGPlacementScheme": "midpoint between cubital fossa and radial styloid process", : this seems to contradict the description that says NOT to give the target muscle description

Those are skeletal landmarks, not muscles. But we've reworked EMGPlacementScheme to be an enum now, so that example will need to change anyway.

.json EMGReference : similarly to EMGPlacementScheme field, you may simply have them choose between 1) a specific name, 2) keyword "channel-specific", or 3) "bipolar". Mix of bipolar and other references would then be a case of "channel-specific".

This was the intent, perhaps it's just not worded clearly enough? Suggestions for clarification are welcome.

.json SkinPreparation : might this be channel-specific as well?

For EEG, I think abrasive gel isn't used because of possible damage to hair. According to @neuromechanist it would be odd to use a different skin prep for different EMG sites in the same session, so we'll probably leave this as-is.

@JuliusWelzel
Collaborator

it is explained at the end of the initial "EMG Data" section (just before the "Terminology: electrodes vs channels" subsection). It comes up again when discussing coordsystems, and then again when discussing photos. I couldn't see a good way to avoid talking about it in multiple places. In light of that, do you still think it needs to move / change?

Good point, I think it can stay as it is. Maybe it is worth adding a detailed explanation for the reasoning in the paper.

Are you specifically asking to add the "sub-millisecond precision" bit? (if so, no objection). If not, can you clarify what you think is lacking here?

Yes, sub-millisecond precision is imo worth mentioning as EMG usually has a high sampling rate. This time resolution is important for good synchronization with other modalities.

what is MUST, SHOULD, or MAY is open to discussion. There are also likely some more rules to be added, e.g., to make some optional fields required depending on the values in other fields. Regarding specifically the Polhemus case, I would agree that digitized locations MUST include x,y,z based on my experience using Polhemus for digitizing EEG electrode locations. Is there a case where one would use a Polhemus (or similar spatial digitizer) and not provide coordinates in 3D?

I am not aware of any case where it is not provided in x,y,z.

EDF/BDF necessarily have channel names in the file (which I think is what you mean by "headers" right?). There are also guidelines on what the format of such channel names should look like (modality-space-identifier, i.e., EEG Cz or MEG 1441 or EMG 002). I suppose it would be conceivable to have an EDF/BDF file where the channel names were non-unique (which IMO would be a degenerate case), but I don't think they can be missing.

True, sorry. But maybe it can be pointed out that the names in the 'channels.tsv' MUST match the names in the BDF/EDF file?
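
A check along these lines can be sketched with the standard library alone, since EDF and BDF share the same fixed-width header layout (256 bytes, with the signal count in bytes 252 to 255, followed by one 16-byte label per signal). The function names are made up for this example, and skipping the EDF+/BDF+ annotation signals is an assumption about how such a rule would be applied, not something the draft states.

```python
import csv
import io

# Annotation signals defined by EDF+/BDF+; assumed to be exempt from the check.
ANNOTATION_LABELS = {"EDF Annotations", "BDF Annotations"}


def read_edf_labels(fileobj):
    """Return the signal labels stored in an EDF/BDF header (binary file object)."""
    header = fileobj.read(256)
    if len(header) < 256:
        raise ValueError("file too short to contain an EDF/BDF header")
    n_signals = int(header[252:256].decode("ascii").strip())
    raw = fileobj.read(16 * n_signals)
    return [
        raw[i * 16:(i + 1) * 16].decode("ascii", errors="replace").strip()
        for i in range(n_signals)
    ]


def read_channels_tsv_names(fileobj):
    """Return the `name` column of a channels.tsv (text file object)."""
    return [row["name"] for row in csv.DictReader(fileobj, delimiter="\t")]


def compare_names(edf_labels, tsv_names):
    """Return (missing_from_tsv, missing_from_file) as sorted lists."""
    in_file = set(edf_labels) - ANNOTATION_LABELS
    in_tsv = set(tsv_names)
    return sorted(in_file - in_tsv), sorted(in_tsv - in_file)


# Tiny in-memory demo: a synthetic two-signal header and a matching channels.tsv.
demo_header = b" " * 252 + b"2   " + b"EMG 001".ljust(16) + b"EMG 002".ljust(16)
demo_tsv = "name\ttype\tunits\nEMG 001\tEMG\tuV\nEMG 002\tEMG\tuV\n"
labels = read_edf_labels(io.BytesIO(demo_header))
names = read_channels_tsv_names(io.StringIO(demo_tsv))
print(compare_names(labels, names))
```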

@JuliusWelzel
Collaborator

IMHO, acq-<label> is more meaningful as it would indicate separate acquisitions.

Agreed, maybe a PR can be opened to extend the definition for the acq label as @drammock suggested?

  1. acq_time in the scans_tsv is quite clear IMO. We briefly discussed accommodating a LATENCY channel, if the data has multiple recordings. Probably, we should add it to the list of reserved channel types?

Good idea, I would be in favor of adding LATENCY to the channel types. The scans.tsv file will also be replaced with a recordings.tsv file in BIDS 2.0.

Member

@neuromechanist neuromechanist left a comment

I made it through once and made suggestions. The overall specifications look great. I look forward to the community comments.

@drammock
Author

ping @robertoostenveld and @tjeerdboonstra. I think we're about ready to open this up to public comment; do you want a chance to go through it again first?

@drammock
Author

maybe a PR can be opened to extend the definition for the acq label as @drammock suggested?

done in #2090

@yarikoptic
Collaborator

sys entity feels analogous (so can replace or be replaced with) to

idea. So if to parallel exactly, should get systems.{json,tsv}? But then I would prefer devices.{json,tsv} as better descriptive since systems could be abstract ("coordinate system" etc).

@JuliusWelzel
Collaborator

sys entity feels analogous (so can replace or be replaced with) to

idea. So if to parallel exactly, should get systems.{json,tsv}?

Yes! As far as I understand how devices should be used, this is what we wanted to achieve with the tracksys entity for MOTION-BIDS. In the paper we define a "tracking system" as:

We define a tracking system as a group of channels that synchronously sample motion data from one or multiple tracked points. To be grouped as a single tracking system, channels MUST share the core parameters of sampling (namely the sampling rate and the duration) as well as hardware and software properties, resulting in the same number of samples and, if available, a single latency channel associated with the rest of the channels.

This resulted in a REQUIRED tracksys-<label> per motion.tsv file.
I think it is important to specify if users MUST define the sys/acq/dev label or if this is optional. We made it required, even though the majority of motion datasets record data using only a single device. Should BIDS 2.0 remove the tracksys label for the motion data and streamline with whatever is decided in this and similar BEPs?

But then I would prefer devices.{json,tsv} as better descriptive since systems could be abstract ("coordinate system" etc).

As for the terminology, adopting devices.{json,tsv} is preferable over systems.{json,tsv} to avoid confusion with other abstract concepts like coordinate systems. The term "devices" more accurately reflects the physical equipment used in data acquisition, leading to clearer documentation and understanding.

@drammock
Author

The term "devices" more accurately reflects the physical equipment used in data acquisition, leading to clearer documentation and understanding.

agreed, dev / device is semantically a better entity name than acq (or sys or recording) for what we're grappling with in EMG.

Comment on lines +28 to +29
| [Biosemi data format](https://www.biosemi.com/faq/file_format.htm) | `.bdf` | Each recording consists of a single `.bdf` file. [`bdf+`](https://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html) files are permitted. The capital `.BDF` extension MUST NOT be used. |
| [European data format](https://www.edfplus.info/) | `.edf` | Each recording consists of a single `.edf` file. [`edf+`](https://www.edfplus.info/specs/edfplus.html) files are permitted. The capital `.EDF` extension MUST NOT be used. |

Hi, I am the software lead for Artinis Medical Systems (which has taken over TMSi). We are working on HD-EMG devices and would like to ask why you did not add the brainvision dataformat here. Apart from symmetry with the EEG standard, there would also be synergy in terms of open software toolboxes that already support BVCDF. Also, BVCDF supports 32-bit data, so you would make the standard just a tiny bit more future-proof (yes, bdf+ supports 24 bit, but why take away the extra precision if there is an already standardized dataformat that does support it?). I would appreciate if you can consider this request, and am open for discussion.

Member

@neuromechanist neuromechanist Apr 2, 2025

TL;DR: IMO any open-source format with 1) a clear advantage over the current formats and/or 2) widespread support and use in the community (here EMG) should be considered. All the formats included in EEG-BIDS do satisfy at least one of these two arguments, but for EMG, I am not sure that BVCDF meets either. More below:


Hi, thanks very much for reaching out. We greatly appreciate your taking the time. We discussed this topic quite a bit, and I'll try to summarize and respond to the good points that you made.

Surveying the major EMG research instruments (EMG: Delsys, Noraxon, Cometa, hdEMG: OTB used in 70% of published research in 2022, see #1371 for more details), we found that almost all have their own and often proprietary data formats. We concluded that there isn't a clear dominating and open-source format for EMG data.

Most, if not all known EMG data examples we had access to were <16-bit format, except when an EEG instrument was used for EMG recording. Again pointers to all documents and discussions are available in the issue. Having access to 32-bit format is nice, but using that with the hardware that has 16-bit resolution only gives more room for noise, errors, and unnecessary computational overhead. I can imagine that once 32-bit electrophys recording comes into use, an extension to EDF for 32-bit recording would be very easy to make.

Symmetry with EEG format is a quite good argument. We tried our best to keep the symmetry as much as possible, pruning unnecessary parts, and adding necessary metadata. Specifically, for my day job at EEGLAB, we have received feedback that the proliferation of files in a BIDS directory would make their (visual) inspection cumbersome, especially once there are multiple tasks and runs per session and subject. We took a measure to remove .fdt files and now embed data in the .set files despite making the analysis pipelines a little more involved. EDF+ has rich metadata and event headers that can be used and may alleviate the need for the additional BVCDF files.

To my knowledge, open-source software that supports BVCDF also supports EDF, if not better. EDF read and write functions are native in MATLAB, there are at least a couple of actively-maintained Python packages as well as wrappers for other platforms. Also, from experience, EDF/BDF read, write, and storage are quite efficient using these toolboxes.

One pain point could be converting the EMG data files to EDF before or during the BIDS conversion process while preserving the metadata. I made EMGIO to help export the EMG data from different formats to EDF and preserve the metadata. The package is mostly a wrapper around PyEDFlib, with some bells and whistles like automatic determination of resolution, plotting, and trying to transfer relevant metadata to EDF. The package currently supports Delsys, OTB, SET and EDF import and EDF export. I'd be happy to work with you and others to support importing data from more instruments and even exporting to BVCDF if there is a demand for it.

I reference this conversation under #1371 to make sure it will stay for future reference once this PR is merged. Please feel free to continue the discussion here or there.

Thanks for taking the time to respond!

I understand your points from a current perspective. However, we are a company that wants to move the field forward, which means we want to improve on what is currently there. Potentially with 32-bit support, researchers gain several advantages. For example 32-bit measurements preserve motor unit recruitment hierarchies during dynamic tasks. It also reduces quantization-induced spatial aliasing in electrode arrays. Lastly it would provide future-proof data storage for advanced analysis techniques. We consider the last point especially crucial. While this extension is still in the drafting phase, I would therefore like to ask you to consider the request to add BVCDF once more, rather than just changing this in future once the draft has been finalized.

Note also that I am not affiliated with BrainProducts in any way. Instead I work for a competitor, and we still argue that a 32-bit dataformat would be beneficial for the (HD-)EMG community, and therefore advocate supporting the BV dataformat. I hope you see that this request is grounded in the spirit of scientific advancement and not of commercial interest.

Collaborator

I agree with Jorn that BVCDF would be a valuable addition to the list of data formats for EMG.

  1. EDF is not a serious consideration for the data from many systems, as it is limited to 16-bit integers and most ADCs nowadays have more bits.
  2. BVCDF allows 32 bit integer and 32 bit float, thereby also extending beyond what BDF supports (24 bit integers).
  3. Writing BVCDF data is simple, whereas code for writing BDF is not available in MNE-Python or FieldTrip (but apparently is in EEGLAB, although I could not find the actual code that writes the 24-bit format).
  4. In earlier BIDS-EMG discussions we already identified that many research labs use EEG equipment and software to do EMG recordings rather than EMG-specific systems. I recall that we were able to find more publicly shared EMG datasets in BrainVision format than in any other format.

Collaborator

furthermore, BVCDF is a formal open standard with well-defined governance, whereas the format status of BDF is not so clear (as it is not published).

Member

In short, unnecessary computational overhead, lack of actual use and relevance in EMG, unnecessary phantom data files, and unfair manufacturer gain are the main arguments.

We discussed this point when another manufacturer lightly proposed adopting their format, and a senior researcher specifically persuaded the group to avoid this precedent. Adding BVCDF would let one manufacturer that is irrelevant to EMG use EMG-BIDS for their press release, while EMG-specific instruments are rejected. IMHO, this is not fair to the community or research.

Author

I'm summarizing here a conversation that @neuromechanist and I had offline.

  1. It's not totally clear what the file type landscape is for existing public EMG datasets; @robertoostenveld indicated that there were more public EMG datasets in BVCDF than in other formats, but @neuromechanist is doubtful of this (majority are .csv or .hdf5 is his guess). We will compile a list of what we can find over the coming week, and see what formats are out there "in the wild".
  2. @Horschig has indicated an intention by Artinis/TMSi to bring 32-bit EMG hardware to market, and implies that BVCDF is an export format they are at least considering to offer in their software. Jörn, can you give any more concrete info? From the TMSi website, it looks like SAGA records 24-bit data but I couldn't find the bit depth info for SPIRE.
  3. Since BVCDF is controlled by one company (not an industry consortium or researcher-led spec), if we allow it we ought to also allow other company-controlled file formats as long as the format specification meets some criteria (at a minimum, open and versioned). My understanding is that there was some resistance to company-controlled file formats in early meetings about this BEP (which I wasn't present for, so I can't speak to). Is that resistance still present?
  4. A pragmatic reason to allow BVCDF is that it's already supported by the major tools in Python and MATLAB, so adding it is pretty low-cost for the tool maintainers.
  5. Adding read/write support for new file formats is a non-trivial amount of work for maintainers.
  6. Points 3 and 5 combined mean that allowing BVCDF opens the door to potentially a lot of extra work for the software tool maintainers.

My personal feeling is that the 80/20 rule would push toward not allowing BVCDF but that we should probably do so anyway because (a) it will handle existing datasets recorded on EEG hardware, (b) it's easy to do now, and (c) chances are we'll need a 32-bit capable format eventually. I also think to be fair we should reach out to other manufacturers to discuss adding their formats / encourage them to provide export for at least one of the formats we're allowing.

Just clarifying, the upcoming Spire will not have 32-bit EMG data channels. However, I see the potential coming for this, and be it mostly that hardware development has matured so much in these fields that an advancement in fidelity is the next logical thing to do.

Member

@neuromechanist neuromechanist Apr 8, 2025

  1. This Gist is a survey of >75 EMG-centric studies, primarily compiled by a dedicated EMG researcher, Rami Khushaba, see the original post on LinkedIn. I have verified the file formats for all of these studies, and the overwhelming majority are recorded in MAT and CSV/TXT formats. Notably, only one dataset is available in BVCDF format. I did not include the datasets I have personally worked on (approximately 10), which are also available in CSV and MAT formats.
  2. It is likely that there are additional EMG datasets in BVCDF format, such as this one, but these datasets appear to be exceptional cases (<5% of the cases) and do not need a 32-bit file format. Similarly, there are exceptional cases of recording EEG data using EDF, like these EEG-EMG datasets available in OpenNeuro: ds004840 and ds005873.
  3. Based on the survey findings, it appears that EMG researchers actively use EMG equipment in their research endeavors, not EEG equipment.
  4. @Horschig, thank you for your enthusiasm in capturing muscular interactions with as much detail as possible. In my EMG file converter experiments, I found that most EMG data had a dynamic range under 70 dB. EDF offers about 90 dB, while BDF offers about 144 dB. So, (putting my hardware dev hat on) the current file format (and even ADCs) does not appear to hinder the capturing of EMG signals. There might be more sensitive electrodes and amplifiers on the horizon, but we are not there yet.
    Historically, transitioning to BDF for EEG was a logical development, as some experiments suggested EEG resolution may need around 20 bits (120 dB), and there were equipment and demand for it. If EMG benefits from higher resolution and dynamic range, a new format will likely emerge naturally. Extending EDF to higher resolutions involves changing a single parameter (Byte count) in specifications and im/exporters, and could be a good candidate for this evolution (in its due time).
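
The bit-depth figures in this comment can be sanity-checked with the textbook ideal-quantizer relation, 20·log10(2^bits). The sketch below assumes an ideal converter; the range actually achieved depends on the amplifier and ADC, and the function name is made up for the example. Under that idealisation 16 bits gives roughly 96 dB (a little above the "about 90 dB" quoted for EDF), 20 bits roughly 120 dB, and 24 bits roughly 144 dB.

```python
import math


def ideal_dynamic_range_db(bits):
    """Ideal dynamic range, in dB, of an integer sample format with `bits` bits."""
    return 20 * math.log10(2 ** bits)


# Compare the sample widths discussed in this thread.
for bits in (16, 20, 24, 32):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")
```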

The current EMG landscape suggests that EDF/BDF formats are more than adequate for describing EMG research now and in the near future. While adding support for BVCDF has minimal overhead, it could lead to a surge of company-sponsored file formats from specialized EMG equipment manufacturers. I don’t see the benefit of adding six or seven file formats, as it would increase maintenance and accessibility costs. Additionally, endorsing formats that imply vendor endorsement should be done cautiously and only as a last resort.

Member

@neuromechanist neuromechanist Apr 30, 2025

FWIW, the team behind PhysioNet introduced the WaveForm DataBase (WFDB, https://wfdb.io) format in early 2000s (paper) that several of the shared EMG datasets already use (as they are hosted on PhysioNet), such as the Hyser, or multiday gesture datasets.

WFDB consists of a header .hea and data/signal .dat file, with clear specifications, converters, help, etc. (although it seems to lack governance and contributing guides, see the WFDB org on GitHub). The data/signal file also supports up to 32-bit resolution with several different configurations (see the spec).

    space: optional

# MEG has an additional entity available
electrodes__meg:
-  $ref: rules.files.raw.channels.electrodes
+  $ref: rules.files.raw.channels.electrodes__eeg

I am not too familiar with the BIDS standard, but does this here break backwards compatibility?

Collaborator

yes, the introduction of BIDS-EMG should not have consequences for BIDS-MEG

Author

If I did things correctly, this will not change anything about BIDS-EEG or BIDS-MEG.

Previously there were rules for electrodes and electrodes__meg (which inherited from electrodes). Effectively I renamed electrodes to electrodes__eeg and inserted a new rule called electrodes that lacks the optional space entity (which isn't needed for EMG). So the new rule inheritance is:

electrodes   ->   electrodes__eeg   ->   electrodes__meg
(emg)             (eeg, ieeg)            (meg)
                  adds `space` entity    adds `processing` entity
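
The inheritance chain above can be illustrated with a toy resolver. This is not the BIDS schema tooling: the `RULES` dictionary, the shortened `$ref` names, and the placeholder `subject`/`session` entities are all assumptions made for the sketch. Only the `space` and `processing` additions come from the description above.

```python
# Toy stand-in for the schema rules; names and base entities are hypothetical.
RULES = {
    "electrodes": {"entities": {"subject": "required", "session": "optional"}},
    "electrodes__eeg": {"$ref": "electrodes", "entities": {"space": "optional"}},
    "electrodes__meg": {"$ref": "electrodes__eeg", "entities": {"processing": "optional"}},
}


def resolve_entities(name, rules=RULES):
    """Merge a rule's entities with those inherited through its `$ref` chain."""
    rule = rules[name]
    parent = rule.get("$ref")
    entities = dict(resolve_entities(parent, rules)) if parent else {}
    entities.update(rule.get("entities", {}))  # child entries extend the parent's
    return entities


for rule_name in RULES:
    print(rule_name, sorted(resolve_entities(rule_name)))
```

Under these assumptions the EMG rule resolves without `space`, while the EEG and MEG rules keep the entities they had before the rename.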

Co-authored-by: Jörn M. Horschig <[email protected]>
Collaborator

@oesteban oesteban left a comment

Responding to the call for community feedback, I would like to express three main comments on this BEP that I believe require careful consideration:

  1. New Modality (_emg suffix): I question the necessity and practical implications of introducing a dedicated _emg modality, particularly given that similar physiological signals (e.g., eye-tracking) are effectively managed under _physio in ongoing BEP020 efforts. Introducing new modality suffixes can fragment BIDS, and a clear justification for why _physio is insufficient is necessary.

  2. Multiple Formats & Format Policy: The BEP proposes several new formats (EDF, BDF, etc.) without adequately justifying their necessity or clearly discussing why currently-supported (tsv) or proposed alternatives (Parquet) cannot be applied to EMG data. Drawing from past experiences (e.g., DICOM vs. NIfTI), I suggest moving toward a unified, vendor-neutral format that supports agile analytics and has an entirely open software stack. Additionally, general policy recommendations on adding new formats should be discussed separately and not within modality-specific BEPs.

  3. Electrode Placement Pictures (_photo.jpg): While I fully support adding experimental setting photographs, this should perhaps be addressed more broadly rather than within modality-specific BEPs.

Given the significant overlap with ongoing discussions and proposals in BEP020 (#1128), I strongly recommend aligning and coordinating with that proposal to avoid redundant effort and ensure consistency across BIDS extensions. (cc @CPernet, @effigies, @Remi-Gau)

Should I submit a PR with proposed changes (similar to BEP038)—which may not happen very soon as I'm buried under BEP020 and BEP038, or would the authors and BIDS maintainers rather discuss this feedback first?

Comment on lines +31 to +37
EDF, EDF+, BDF, and BDF+ are all open data formats with broad support in various programming languages for reading and writing the files. BDF and BDF+ formats store data samples using 3 bytes instead of 2 bytes as in EDF and EDF+ formats, allowing for greater resolution. EDF+/BDF+ accommodate more header metadata than EDF/BDF, and support storing event or annotation information in the file. Thus it is RECOMMENDED to use the BDF+ data format.
Future versions of BIDS may extend this list of supported file formats.
File formats for future consideration MUST have open access documentation, MUST have
open source implementation for both reading and writing in at least two programming
languages and SHOULD be widely supported in multiple software packages.
Other formats that may be considered in the future should have a clear added advantage
over the existing formats and should have wide adoption in the BIDS community.
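
To make the 2-byte versus 3-byte difference concrete, here is a minimal sketch of decoding raw sample bytes. EDF and BDF both store samples as little-endian two's-complement integers; the function name is hypothetical, and real files also need the header's physical/digital min and max to scale the integers to physical units, which this sketch leaves out.

```python
def decode_samples(raw, bytes_per_sample):
    """Decode little-endian signed integer samples (2 bytes for EDF, 3 for BDF)."""
    if len(raw) % bytes_per_sample:
        raise ValueError("byte count is not a multiple of the sample width")
    return [
        int.from_bytes(raw[i:i + bytes_per_sample], "little", signed=True)
        for i in range(0, len(raw), bytes_per_sample)
    ]


# Largest and smallest representable values for each sample width.
print(decode_samples(b"\xff\x7f\x00\x80", 2))
print(decode_samples(b"\xff\xff\x7f\x00\x00\x80", 3))
```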
Collaborator

These lines raise two important concerns that should be reconsidered:

1. Addition of Multiple Data Formats:

Proposing multiple new data formats (EDF, EDF+, BDF, BDF+) significantly increases complexity and entry barriers, particularly for users less familiar with these device-oriented formats. Each format introduced to BIDS must have an explicit and clear rationale explaining its unique benefits, along with a thorough justification for why existing BIDS-supported formats (such as TSV or compressed TSV) are inadequate. Evidence linked by @neuromechanist above seems to indicate that compressed TSV could be a pretty acceptable alternative. Ideally, consensus should be sought around adopting a single format rather than introducing multiple parallel alternatives, moreover in this case where there are incentives for vendors in entering format wars.

The neuroimaging community has previously encountered substantial issues due to similar fragmentation, with the DICOM format serving as a key example. DICOM is open, vendor-supported, and widely produced by imaging devices, yet it suffers from fragmentation caused by vendor-specific tags, alternative vendor-proprietary formats (e.g., Philips PAR/REC for MRI), and mixed-purpose content. This fragmentation undermines DICOM’s effectiveness as a universal standard and complicates downstream processing workflows. Additionally, DICOM is notoriously inconvenient for agile data analytics, lacking straightforward memory-mapped access, practical data compression options, and efficient parallelization.

In contrast, adopting NIfTI—a single, vendor-neutral, modality-agnostic format—allowed the neuroimaging community to streamline analyses, facilitate easier data sharing, ensure long-term stability, and promote community-driven improvements. NIfTI was not the only alternative to DICOM available at BIDS' onset, but offered a good balance between limitations and simplicity, openness, support, and adoption. On the side of "measurement" series (employing @bendichter's nomenclature), @effigies proposed to adopt a single vendor-independent, data-science-friendly format—Parquet (bids-standard/bids-specification#1792). Unified, open-source, and analytics-friendly formats significantly enhance BIDS' usability, reduce complexity, and enable robust community-driven maintenance.

Additionally, while some community members have pointed out the existence of open software libraries to read these modality-specific formats, there is an important caveat: can the community guarantee that the entire software stack required to read these formats remains fully open and transparent? For instance, in the eye-tracking domain, the widely used EyeLink's EDF format requires proprietary vendor-supplied libraries installed separately—even when accessed through seemingly open interfaces such as pyedfread. Such hidden dependencies substantially increase entry barriers, compromise transparency, and undermine the openness and sustainability goals central to the BIDS standard.

Given this historical lesson and current experiences from analogous scenarios, it would be prudent to avoid replicating fragmentation with device-oriented formats. Instead, placing format conversion explicitly within dedicated converter software and advocating for a single, open, and analytics-friendly modality-agnostic format would best serve the community.

2. Future Format Inclusion Policy:

Although the authors have carefully attempted to frame their recommendations on future file format inclusions within the context of this specific BEP, these policy-oriented statements implicitly touch upon broader BIDS-wide considerations. Guidelines on how BIDS evaluates and incorporates new formats fall within the purview of the wider BIDS community rather than any individual BEP. To maintain clear boundaries, I recommend that the authors refocus strictly on the modality-specific technical issues here, leaving general format inclusion policies to dedicated, community-wide discussions elsewhere.

This doesn't mean that BEPs cannot change BIDS-wide elements; if that's necessary, then it should be possible. However, a BEP affecting BIDS across the board is likely to meet difficult issues along the way that make consensus harder. Should this BEP change or refine policy about the addition of future formats, that should be done somewhere else and without scoping it within the BEP at hand.

Member

@neuromechanist neuromechanist Apr 28, 2025

EDF/BDF are not new formats; they have been introduced in EEG-BIDS (EDF/EDF+/BDF/BDF+) and iEEG-BIDS (EDF).

On several occasions and in several conversations, including a couple with @yarikoptic, we discussed how headerless compressed TSV (TSV.GZ) poses several problems, both technical and for FAIR use of data. I think, however, this can be discussed in another issue with a wider-reaching community.
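For readers less familiar with the layout under discussion, here is a minimal sketch of how a headerless `_physio.tsv.gz` file has to be read: the column names live in the JSON sidecar's `Columns` field, not in the data file. The file names, column names, and the `read_headless_tsv_gz` helper are made up for illustration; only `SamplingFrequency`, `StartTime`, and `Columns` are existing `_physio` sidecar fields.

```python
import gzip
import json
import tempfile
from pathlib import Path


def read_headless_tsv_gz(data_path, sidecar_path):
    """Pair each row of a headerless .tsv.gz with the names in the sidecar's "Columns"."""
    columns = json.loads(Path(sidecar_path).read_text())["Columns"]
    rows = []
    with gzip.open(data_path, "rt", encoding="utf-8") as handle:
        for line in handle:
            values = line.rstrip("\n").split("\t")
            # The data file cannot confirm the column *order*; the only check
            # available from the file itself is the column *count*.
            if len(values) != len(columns):
                raise ValueError("sidecar and data disagree on column count")
            rows.append(dict(zip(columns, map(float, values))))
    return rows


# Toy recording: hypothetical file names and column names.
with tempfile.TemporaryDirectory() as tmp:
    data_path = Path(tmp) / "sub-01_task-rest_physio.tsv.gz"
    sidecar_path = Path(tmp) / "sub-01_task-rest_physio.json"
    with gzip.open(data_path, "wt", encoding="utf-8") as handle:
        handle.write("0.1\t0.2\n0.3\t0.4\n")
    sidecar_path.write_text(json.dumps(
        {"SamplingFrequency": 2000, "StartTime": 0, "Columns": ["emg_left", "emg_right"]}
    ))
    rows = read_headless_tsv_gz(data_path, sidecar_path)
```

If the two names in `Columns` were swapped in the sidecar, the reader would still succeed and silently mislabel the channels, which is one concrete form the technical concern can take.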

IMO, specifications should provide an extensibility pathway. This extensibility should certainly stay within the boundaries that the specification defines. As you mentioned, this statement is under EMG-BIDS; it is not meant to suggest any BIDS-wide elements and does not encourage anything outside the BIDS community.
Perhaps the write-up should clarify that this statement relates only to potential future format extensions for EMG-BIDS.

Collaborator

EDF/EDF+ are supported in iEEG and EEG, and BDF/BDF+ are supported in EEG. These are thus not new formats being introduced to BIDS. I don't know if this affects your argument.

As to using parquet, I believe it is well-suited to data where every column is named, each column has a data type, and values may be missing. For a 2D array of floats, its features may not be compelling over simpler formats. (I don't know if EDF/BDF are simpler formats.)

Collaborator

I don't know if this affects your argument.

I guess it does partly. One aspect that should be considered is whether the software stacks of EDF/BDF/+ are fully open. Eye-trackers' EDF format (which is a different thing, I believe) requires a private library to access the data.

I think it also brings the broader discussion regarding BIDS' standpoint regarding formats. IMHO BIDS should be downstream-looking (e.g., parquet, NIfTI---meaning, formats that are designed to support processing and data science) as opposed to upstream-looking (e.g., DICOM---formats specifically designed to support devices' outputs that may not be so amenable to downstream processing). That said, I agree that would be mostly beyond the scope of this BEP (though it would require evaluation from the community before this BEP would move forward). Even if those formats were introduced previously, BIDS should try to get ahead of matters like this and make clear guidelines for BEP proponents (a bit like for the _photo files).

It does not affect the fact that these data must be represented with _physio today, and the argument that BEPs like this will void _physio of interest leaving eye-tracking as an island in the spec.

As to using parquet, I believe it is well-suited to data where every column is named, each column has a data type, and values may be missing. For a 2D array of floats, its features may not be compelling over simpler formats. (I don't know if EDF/BDF are simpler formats.)

This comes back to the argument of downstream-looking vs. upstream-looking. EDF/BDF seem to clearly fall in the upstream-looking definition while parquet would fall in downstream-looking.

Collaborator

IMO, specifications should provide an extensibility pathway.

Please see my argument above about DICOM. I've seen very few standards more extensible than DICOM, and that extensibility is what has constrained the range of uses it has today and precluded its wider application downstream.

Parquet (and tsv) are formats that a master's student with some knowledge of data science may know (or quickly learn, given their applicability across data science beyond neuroimaging).

I think, however, this can be discussed in another issue with a wider-reaching community.

Exactly. My argument is that BIDS is currently giving poor support to BEP initiatives by avoiding this discussion. If we wait for this BEP (and other similar initiatives) to be accepted to then have the conversation, the discussion is going to be constrained within the range of options backward compatibility will allow. Instead, this should be first clarified within BIDS (not necessarily within the spec, perhaps this is more of a governance/guidelines issue).

Member

I noticed #2055 discusses conditions/requirements to include formats. The recommendation we have in EMG-BIDS is almost the verbatim text from iEEG-BIDS spec, and hopefully would not be needed once #2055 is closed.

Collaborator

I noticed #2055 discusses conditions/requirements to include formats. The recommendation we have in EMG-BIDS is almost the verbatim text from iEEG-BIDS spec, and hopefully would not be needed once #2055 is closed.

@neuromechanist I'm not arguing that EDF/BDF/+ do not meet the criteria stated in #2055 (depending on your interpretation of "widely adopted" and how you measure the requirement of future support and documentation). My argument is that by sticking to those formats, the BIDS community will miss an important opportunity to adopt a format that actually can hit a compute node in a cluster or an instance in the cloud. Parquet is supported by Apache, and its user base, active development, tooling, documentation, and likelihood of long-term maintenance cannot be compared to those of any of the purpose-specific formats we have in BIDS.

While I understand the argument that "EEG and MEG did this", I don't think we should decide based just on that. Those extensions were added in other circumstances and the landscape was different, so it would not be surprising that, if Parquet (or a comparable solution) was included in BIDS, then many modalities across the board would adopt that (the same way that I think if NIfTI could be replaced by an HDF5- or zarr-based format, the number of BIDS datasets using NIfTI would fast decline over time).

Adopting a format that is not standard on the compute side will force analysis pipelines to start with a conversion into an "internal format" (or a standard format). When the community moves on to defining BIDS EMG Derivatives, this problem will be hit head-on, and it's likely that BIDS Derivatives will favor the general-purpose data science format over the application-specific format.

For me, the format is not the most critical point of this BEP, although I see this discussion as a missed opportunity to push into BIDS something downstream-looking that eases adoption by data scientists and consumption by analysis tooling. The proposal of EDF/BDF/+ (or any similar format for the matter at hand) is made from a perspective of protecting the original data (which I understand) and from the perspective of pipeline writers who don't want to adopt new formats. While I understand that logic, I think BIDS converters are critical to maximize the usability of data, which are written by devices in formats created with a different mindset and vision from those of BIDS.

Comment on lines +16 to +24
{{ MACROS___make_filename_template(
"raw",
datatypes=["emg"],
suffixes=["emg", "events"])
}}

EMG device manufacturers use a variety of formats for storing raw data, and there is
no single standard that all researchers agree on. For BIDS, EMG data MUST be
stored in one of the following formats:
Collaborator

These lines propose introducing a new dedicated EMG modality (_emg suffix) without clearly justifying the necessity of such an addition. Currently, EMG data can be adequately represented using the existing _physio suffix, and the proposal does not (i) explicitly elaborate on why the existing _physio approach is insufficient or limiting, or (ii) explicitly state that EMG data shall not be stored within _physio files. The latter is particularly relevant because the current reading creates the problem that two researchers may encode the same data in two very different ways. Unfortunately, implementing the validation for such a constraint (that is, EMG data encoded as _physio raises a validation error) is really hard (if not impossible), which, for me, is a good reason to try to stick with _physio.

Introducing new modality suffixes, particularly ones not specifically representing brain recordings, sets a precedent that complicates the BIDS ecosystem and risks fragmenting the standard. A similar challenge was previously faced by BEP020 (eye tracking, #1128), where initially a separate modality-specific suffix was proposed but later abandoned in favor of extending the existing _physio suffix. Specifically, BEP020's current proposal uses the _physio suffix combined with setting the metadata field "PhysioType": "eyetrack", unlocking structured sets of mandatory, recommended, and optional metadata fields, along with clearly defined data columns tailored explicitly to eye tracking.

An additional relevant feature proposed by BEP020 is the complementary _physioevents file. This new file type describes asynchronous, device-specific events that do not fit well into standard BIDS _events or _stim files. For example, eye trackers typically store messages about calibration procedures, status indicators, and device-specific annotations essential for correct interpretation. These _physioevents files are intentionally designed generically to support similar asynchronous events across different physiological modalities beyond eye tracking. This BEP works around this issue by just choosing some formats that have been deemed interesting after some (profound and comprehensive) discussion, which nonetheless did not mention the limitations of the current BIDS infrastructure. That way, instead of lifting general limitations of BIDS, the use of idiosyncratic formats resolves problems just for EMG, by creating a separate realm within the specification. For BIDS, this development model is unsustainable and IMHO should be avoided.

Given the extensive discussion and careful consideration behind BEP020—particularly the decision to separate device-specific complexity into dedicated metadata and events files while maintaining a single modality-agnostic data format—I strongly recommend aligning this EMG proposal similarly. It would be prudent to extend _physio by defining "PhysioType": "emg", along with EMG-specific metadata fields and data columns. Furthermore, prioritizing the completion and community acceptance of BEP020 first would provide clearer guidance, avoid redundancy, and ensure consistent handling across similar physiological modalities.
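To make the suggestion concrete, here is a rough sketch of what such a sidecar might look like. Everything EMG-specific in it is an assumption for illustration: `"PhysioType": "emg"` is only the value proposed in this comment (neither BIDS nor BEP020 defines it), and the column names and the `missing_fields` helper are hypothetical. `SamplingFrequency`, `StartTime`, and `Columns` are the existing `_physio` sidecar fields.

```python
import json

# Hypothetical sidecar for a file such as sub-01_task-grip_physio.tsv.gz.
emg_physio_sidecar = {
    "PhysioType": "emg",        # proposed here, not defined in BIDS or BEP020
    "SamplingFrequency": 2000,  # Hz
    "StartTime": 0,             # seconds relative to the task onset
    "Columns": ["biceps_brachii", "triceps_brachii"],  # made-up column names
}

# Fields that a validator could require for every _physio sidecar.
REQUIRED_FIELDS = ("SamplingFrequency", "StartTime", "Columns")


def missing_fields(sidecar):
    """Return the required _physio fields that are absent from the sidecar."""
    return [field for field in REQUIRED_FIELDS if field not in sidecar]


sidecar_text = json.dumps(emg_physio_sidecar, indent=2)
```

A validator could then switch on `PhysioType` to decide which EMG-specific fields and columns to require, in the same way the comment describes for `"eyetrack"`.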

}
```

## Photos of the electrode positions (`*_photo.<extension>`)
Collaborator

I feel BIDS is particularly lacking in this type of information. I'd see equally relevant to show pictures of the whole experimental setting, so maybe this should be addressed in a different BEP that enables these pictures regardless of the modality? cc/ @effigies @Remi-Gau

Collaborator

_photo.* is permitted in MEG, EEG, iEEG, or microscopy. I think it would be reasonable to make a proposal to generally support this in all datatypes, but I don't think EMG explicitly adding support here makes that job any easier or harder.

Collaborator

Exactly, I think here the BEP authors are resolving a more general problem of BIDS, and they are following the precedent (MEG, EEG, iEEG). It would be more effective to address this generally than within each specific effort.

In other words, I'm not criticizing this proposal; I'm pointing out something we (all) should anticipate more broadly before other BEPs have to make explicit mention of experimental settings' pictures.

@neuromechanist
Member

neuromechanist commented Apr 28, 2025

Thanks very much @oesteban for your thorough review, I greatly appreciate it.

  1. New Modality: .... new modality suffixes can fragment BIDS, and a clear justification for why _physio is insufficient is necessary.

Our proposal is not just for a new modality, but for a new data type as well. We followed the current specification format adopted by EEG-, iEEG-, and MEG-BIDS. EEG/iEEG/MEG do not provide any justification for why they should be their own modalities/datatypes, nor, AFAIK, does Physio provide any justification or clarification of when to use the _physio modality rather than embedding the physiological data in other modalities (like EEG), as those modalities already accommodate physiological data as separate channels within their data files. To be clear, I am not arguing against _physio.

This can be a broader conversation as to what are the thresholds of having a specific datatype and/or modality rather than an umbrella, which could end up in a new BEP.

As to why EMG should have its own modality and data type, and not fall under Physio, there have been discussions at #1371, as well as in-person meetings. Some that I remember off the top of my head are:

  1. Physio does not currently have a data type (being addressed in BEP045, BEP about non-neuronal physiological (cardiac, respiratory, skin conductance, gastro, ...) data and physiological data derivatives #1675). Searching the currently shared data shows an abundance of EMG data shared as standalone datasets. There is also considerable interest in analyzing EMG as standalone data, suggesting that EMG can benefit from its own data type and modality.
  2. EMG data is often very high-dimensional, easily going >200 channels with 2 kHz+ sampling frequency. Using spreadsheet-style formats like TSV and compressed TSV for both long and wide data could be inefficient and (for the compressed format) not transparent. These constraints are less likely for other physiological data.
  3. EMG can be collected from any site, and can target one or multiple muscles. The current Physio spec as well as the proposed changes in BEP020 do not address electrode/sensor placement or the mapping of signals to different muscles. These two features are mostly unique to EMG, and other physiological recordings may not benefit from the specific terms and standards used for EMG sensor placement and mapping.
  4. Current and emerging EMG research directly derives/estimates neural discharges from EMG signals. This might be a distinguishing factor compared to BEP045, which aims to address "non-neuronal physiological" data.
  1. Multiple Formats & Format Policy ...
    Additionally, general policy recommendations on adding new formats should be discussed separately and not within modality-specific BEPs.

EDF/+ is a widely used and adopted data standard for physiological recordings. It also includes some necessary metadata such as channel names, sampling frequency, signal range, recording date, etc. The specifications as well as converters are open (see the discussion above for more details). BDF/+ is a simple extension of EDF in which the only change is that the data are recorded in 24-bit resolution, rather than EDF's 16-bit resolution.
We are not providing any general policy recommendations. It is all within EMG-BIDS. Probably the language should be more specific.
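As a small illustration of the 16-bit versus 24-bit difference, the sketch below decodes raw sample bytes using only the standard library. It assumes the samples are little-endian two's-complement integers, as in the EDF and BDF data records; `decode_samples` is a hypothetical helper, not part of any existing reader, and it ignores the file header and the scaling to physical units.

```python
def decode_samples(raw, bytes_per_sample):
    """Decode little-endian two's-complement samples (2 bytes for EDF, 3 for BDF)."""
    if len(raw) % bytes_per_sample:
        raise ValueError("buffer length is not a multiple of the sample width")
    return [
        int.from_bytes(raw[i:i + bytes_per_sample], byteorder="little", signed=True)
        for i in range(0, len(raw), bytes_per_sample)
    ]


# Largest and smallest representable values at each resolution.
edf_values = decode_samples(bytes([0xFF, 0x7F, 0x00, 0x80]), 2)
bdf_values = decode_samples(bytes([0xFF, 0xFF, 0x7F, 0x00, 0x00, 0x80]), 3)
```

Here `edf_values` spans -32768 to 32767 and `bdf_values` spans -8388608 to 8388607, which is the only difference in the sample encoding.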

  1. Electrode Placement Pictures (_photo.jpg):

Agreed. Photos are an efficient way to convey to a human reader how the system is set up and placed. However, it poses potential ethical risks and may not be as precise, accurate, and machine readable as sensor placement description in electrodes.tsv.

@oesteban
Collaborator

EEG/iEEG/MEG provide any justification why they should be their own modalities/datatypes, neither, AFAIK, Physio provides any justification or clarification when to use _physio modality, rather than embedding the physiological data in other modalities (like EEG), as those modalities already accommodate including physiological data as separate channels within their data files. To be clear, I am not arguing against _physio.

As I mentioned above, I don't advocate for having explicit explanations within the specs. However, the policy about what warrants a new datatype should be agreed upon before BEPs start sprawling at the datatype level. While I certainly do not disqualify EMG as a neural signal, I think (i)EEG and MEG are brain signals, while EMG is generally not. To me, it makes sense that those brain measurements have their own datatype directories and all other neuroscience-relevant data go within those directories (or beh/). Will elaborate more on this later.

This can be a broader conversation as to what are the thresholds of having a specific datatype and/or modality rather than an umbrella, which could end up in a new BEP

Exactly. I'm just arguing that we can't advance on this BEP (and any other BEP proposing new datatype folders) until we have had this conversation. Conversely, our approach in BEP020 does not require this conversation because it works on the foundation of _physio, which is already in the spec.

As to why EMG should have its own modality and data type, and not fall under Physio, there have been discussions at #1371, as well as in-person meetings.

Likewise, under the umbrella of BEP020, we had the very same conversation, but the outcome was different because the people involved in the conversation were different. Since the same conversation is being had in different contexts in parallel, this signifies a point where BIDS requires a general policy to be defined so that BEPs do not diverge and are consistent.

  1. Physio does not currently have a data type

Agreed. Two comments on this:

  • If the problem is that "Physio does not currently have a data type" this BEP does not resolve the problem, at most it only solves it for EMG.
  • In the context of BEP020, we agreed that eye-tracking only datasets (which do exist) would write the eye tracking recordings under the beh/ (behavioral) data type. I am not aware whether that would suffice for EMG and other physiological signals, but if not, the argument goes to the previous point.

Resolving the problem specifically for EMG (or for eye tracking, or for other non-brain recordings) perpetuates the issue as you first stated it and increases the fragmentation of the general spec.

  2. EMG data is often very high-dimensional, easily going >200 channels with 2 kHz+ sampling frequency. Using spreadsheet-style formats like TSV and compressed TSV for both long and wide data could be inefficient and (for the compressed format) not transparent. These constraints are less likely for other physiological data.

Eye-tracking is also high-dimensional and dense. TSV is definitely not the solution (current specs disallow it for _physio, btw), but TSVGZ does fit the bill. The argument that compression is not transparent, when contrasted with binary formats such as EDF/BDF does not hold for me. If we are going to use a binary format, then I'd advocate for something like Parquet (don't know much about it, but totally trust @effigies that it is a really good option).
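For a sense of scale, the back-of-the-envelope sketch below uses the figures quoted above (200 channels at 2 kHz) to compute the raw binary data rate at 16-bit and 24-bit resolution, and then measures plain and gzip-compressed TSV sizes for one second of synthetic data. The synthetic signal is uniform random noise, which is an assumption chosen for reproducibility and is close to a worst case for compression; real recordings will behave differently, so the measured sizes are illustrative only.

```python
import gzip
import random

N_CHANNELS = 200       # ">200 channels" in the quoted comment
SAMPLING_RATE = 2000   # "2 kHz+" in the quoted comment


def binary_bytes_per_second(bytes_per_sample):
    """Raw data rate for fixed-width binary samples, ignoring headers."""
    return N_CHANNELS * SAMPLING_RATE * bytes_per_sample


edf_rate = binary_bytes_per_second(2)  # 16-bit samples
bdf_rate = binary_bytes_per_second(3)  # 24-bit samples

# One second of synthetic 16-bit noise: one row per time point, one column per channel.
rng = random.Random(0)
rows = (
    "\t".join(str(rng.randint(-32768, 32767)) for _ in range(N_CHANNELS))
    for _ in range(SAMPLING_RATE)
)
tsv_bytes = "\n".join(rows).encode("ascii")
tsv_gz_bytes = gzip.compress(tsv_bytes)

print(f"16-bit binary: {edf_rate} B/s, 24-bit binary: {bdf_rate} B/s")
print(f"TSV: {len(tsv_bytes)} B, TSV.GZ: {len(tsv_gz_bytes)} B for one second")
```

The binary rates follow directly from the arithmetic (800,000 and 1,200,000 bytes per second); the text sizes depend entirely on the data, so they should be re-measured on real recordings before drawing conclusions either way.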

This does not mention something that BEP020 does solve - when devices generate more than just data recordings (e.g., when they generate signals and status messages, etc.). EDF and BDF address this with the + version, which mixes up data and metadata together (something BIDS definitely would like to avoid). Instead, BEP020's _physioevents.[tsv|tsv.gz] files resolve this problem (for all physio recordings).

3. EMG can be collected from any site, and can target one or multiple muscles.

I did not criticize this part of the proposal and I think it is extremely valuable. My point is that all these specific metadata can be encoded nicely (and implemented in the BIDS Validator) following the approach of BEP020 and without discontinuing _physio.

4. Current and emerging EMG research directly derives/estimates neural discharges from EMG signals. This might be a distinguishing factor compared to BEP045, which aims to address "non-neuronal physiological" data.

While these signals are neural, it doesn't seem to me EMG records brain signals. This is why I see it best suited within _physio. Please note, this thinking SHOULD NOT undermine the representation of EMG data. I am positive that following BEP020's approach, all that this BEP proposal prescribes can be equally achieved.

We are not providing any general policy recommendations.

Yes, it is scoped within EMG, but there is language stating what formats could be added and what are the requirements. IMHO that language does not fit this (nor any other) BEP (with the exception of a specific BEP to establish these policies across the spec).

EDF/+ is a widely used and adopted data standard for physiological recordings.

Sure, I'm not attacking the format---if you all experts decided in favor of them after such a comprehensive conversation as the one above, I'm absolutely convinced that the four EDF/BDF/+ are excellent formats. Please refer to my point on upstream-looking vs. downstream-looking formats above (#1998 (comment)). Please also note the above comment regarding metadata (intertwined within a single file in the case of the "plus" versions of EDF and BDF).

However, it poses potential ethical risks and may not be as precise, accurate, and machine readable as sensor placement description in electrodes.tsv.

Like above---from my ignorance, the proposal of electrodes.tsv seems necessary and critical for EMG data representation so I really trust BIDS is better with such a concept for EMG. That said, I still think those metadata would be more consistently implemented with the approach of BEP020.

@Horschig

Hi all, contributing my thoughts to the file format discussion for BIDS-EMG.

While my primary research hasn't been solely focused on EMG, I bring experience working with data analysis across several related modalities (EEG, ECG, EOG, MEG, fMRI, and currently working in an fNIRS/ExG company). This gives me a fairly broad view of common practices and data handling needs in these domains, also from the perspective of other users, including those who are not tech-savvy as most of us are.

A core principle of BIDS is enhancing data sharing for the purpose of reuse and analysis. Therefore, the practical usability of the chosen format within the target community seems crucial. How easily can researchers integrate BIDS-compliant EMG data into their existing analysis workflows?

This brings me to the suggestion of compressed .tsv. From my perspective, this format doesn't seem to have established traction within the EMG research community or widespread support in commonly used analysis tools. In previous BIDS extensions (like BIDS-EEG), the selected formats (e.g., EDF, BrainVision) were largely chosen based on existing community adoption, open specifications, and tool support – prioritizing practicality.

Introducing a less common format like compressed .tsv would necessitate an extra data conversion step for many users. This requires developing and maintaining specific conversion tools, which can be a barrier, especially for researchers who aren't primarily software developers and rely on established toolboxes.

Conversely, focusing on file formats already prevalent in the EMG community, particularly those that are open and supported by major software packages (like FieldTrip, MNE, etc.), appears more aligned with BIDS' goal of reducing friction in data sharing and analysis.

While I appreciate the need to consider future-proof formats (and I have argued for adding the BV format elsewhere), the primary standard should arguably reflect what the community currently uses effectively.

Therefore, I think we should prioritize the currently proposed data formats with demonstrable, widespread use and robust tool support within the EMG field to maximize the immediate utility and adoption of BIDS-EMG.

@neuromechanist
Member

@oesteban, I moved the conversation regarding data type and modality to #2108, with a summary of what was discussed here. Please consider expanding the discussion there. I believe that this discussion is very important (and overdue), and deserves independent attention. I hope that the discussion results in clear guidelines and policy that help us toward transparent and unambiguous data sharing 😊.

@oesteban
Collaborator

I moved the conversation regarding data type and modality to #2108,

Thanks! I'll make sure to bring this thread to the upcoming BIDS maintainers meeting in the context of BEP020 and this one :)

Let's continue that conversation there.

@drammock
Author

drammock commented May 1, 2025

Thanks all for the lively discussion. I'm going to try to summarize what I see as the points of contention, in hopes that it will move the discussion forward.

One point regarding photos seems to amount to "what you're doing here is fine, but we should have a broader discussion about photos too", so I won't comment further here. The other two points of contention:

  1. Should EMG be a separate datatype? or an emg modality under another datatype? or should it fall under physio modality?

    • current state of this PR proposes emg modality within emg datatype
    • if it's emg modality, presumably it goes under beh datatype (or physio, pending BEP045 landing first)
    • Discussion of general criteria for adding new datatypes/modalities has moved to Defining Criteria for Data Types and Modalities in BIDS #2108; was supposed to be discussed at 01 May maintainer meeting, but the notes suggest that didn't happen

    My opinion: we propose a new modality and datatype primarily because of the many parallels between EEG and EMG. The main differences are (1) is the electrode on the scalp or somewhere else on the body, (2) how do you describe sensor placement information, and (3) how to handle the electrode/channel distinction for "integrated bipolar" EMG devices. To my mind, point (1) is immaterial; I don't see the value in restricting "first-class citizens" (AKA datatypes) to only those measurements that target the brain. Points (2) and (3) also don't strike me as disqualifying for making EMG its own datatype; rather, they are nuances that need to be pondered and addressed, but are ultimately addressable within existing data structures (coordsystem.json, channels.tsv, etc).

  2. What filetype(s) are appropriate for EMG?

    • current state of this PR proposes EDF(+) and BDF(+)
    • if EMG data goes under physio modality (see above), then the choice is already made (tsv.gz)
    • Parquet has also been suggested as a possibility
    • There are questions about "extensibility" and whether the BEP should state criteria for future addition of other file formats

    My opinion: I am in favor of prioritizing the data formats that are already in use by the EMG community, and/or ones that are easy for them to adopt. Secondarily, I have a bias towards file formats that are already well-supported by the existing software tools. I also admit I have a bias against tsv.gz because it feels far too error-prone to store column names and data values in separate files.

    To me, these considerations point to EDF/BDF(+) (because according to @neuromechanist, many EMG device manufacturers already support EDF export), and also to the BrainVision data format (because according to @robertoostenveld there are existing EMG datasets in that format that were recorded on EEG equipment, and according to @Horschig some manufacturers soon will start using it for new data).

    I agree that Parquet is a "nice" data format, well-suited to dataframe-like structures, and easy for downstream data-science-type consumers to ingest. But Parquet is not currently supported in MNE-Python / MNE-BIDS, and the main Python tool for interacting with Parquet files carries a dependency on Pandas, which in MNE-Python we've been trying hard to avoid adding as a dependency for quite some time. This makes the path forward with Parquet a bit unclear, at least for the MNE maintainers. In contrast, we already support EDF and BrainVision formats, and BDF support is not very hard to add and will certainly be added if this BEP ends up including BDF as an allowed format.

    Regarding "extensibility", I think it's being used in two distinct senses in this discussion, that should be separated. One is in the discussion of DICOM, which IIUC allows extra arbitrary metadata fields in its header. EDF+/BDF+ are not like that. The other sense of "extensibility" refers to the BIDS standard itself, and I am quite content to remove the text regarding possible future supported file types and criteria for adopting them.

    One final point about file types regards "mix[ing] up data and metadata together (something BIDS definitely would like to avoid)." While it's true that EDF/BDF+ formats can contain metadata, the same is true of BrainVision (for EEG) and FIF (for MEG). But this doesn't prevent their separation! MNE-BIDS, for example, when writing out BIDS-compliant datasets, will record which channels are marked as "bad" in channels.tsv rather than marking those channels as bad in the FIF data structure itself. Similarly, on reading a BIDS-formatted dataset, it will use channels.tsv (not the FIF metadata) to determine which channels are "bad". A similar point holds for "annotations" in the file metadata and events.tsv. So in sum, I don't think the file format's ability to hold metadata should count as a strike against it; as long as the tools are doing their job when creating the BIDS dataset, the desired data/metadata separation can be achieved.
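As a rough illustration of that separation (a sketch only, not MNE-BIDS's actual implementation), the snippet below determines the bad channels purely from the `status` column of a `channels.tsv` file, without consulting anything stored in the data file. The channel names are made up, and `bad_channels` is a hypothetical helper.

```python
import csv
import io


def bad_channels(channels_tsv_text):
    """Return the names of channels whose "status" column reads "bad"."""
    reader = csv.DictReader(io.StringIO(channels_tsv_text), delimiter="\t")
    # "status" is an optional column, so rows without it are treated as not bad.
    return [row["name"] for row in reader if row.get("status") == "bad"]


# Toy channels.tsv content with made-up channel names.
example_channels_tsv = (
    "name\ttype\tunits\tstatus\n"
    "EMG1\tEMG\tuV\tgood\n"
    "EMG2\tEMG\tuV\tbad\n"
    "EMG3\tEMG\tuV\tgood\n"
)

bads = bad_channels(example_channels_tsv)
```

With this input, `bads` contains only `EMG2`, regardless of what any header in the accompanying EDF, BDF, BrainVision, or FIF file says about that channel.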
