[ENH] extension for electromyography (EMG) - BEP042 #1998
cc @agramfort
Hi, @neuromechanist pointed me to this PR and I would like to share some thoughts. This seems to be pretty advanced in terms of sensor placement description which was not very well defined in the motion BEP :)
Hi @sjeung, thanks for the feedback / ideas.
done in e84cadc
Those are skeletal landmarks, not muscles. But we've reworked EMGPlacementScheme to be an enum now, so that example will need to change anyway.
This was the intent, perhaps it's just not worded clearly enough? Suggestions for clarification are welcome.
For EEG, I think abrasive gel isn't used because of possible damage to hair. According to @neuromechanist it would be odd to use a different skin prep for different EMG sites in the same session, so we'll probably leave this as-is.
Good point, I think it can stay as it is. Maybe it is worth adding a detailed explanation for the reasoning in the paper.
Yes, sub-millisecond precision is imo worth mentioning, as EMG usually has a high sampling rate. This time resolution is important for good synchronization with other modalities.
I am not aware of any case where it is not provided in x,y,z.
True, sorry. But maybe it can be pointed out that the names in the `channels.tsv` MUST match the names in the BDF/EDF file?
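A minimal sketch of how such a name check could look, assuming the standard EDF/BDF header layout (a fixed 256-byte block whose last 4 bytes give the number of signals, followed by one 16-byte space-padded label per signal) and a `name` column in `channels.tsv`. The function names are hypothetical and this is not part of any BIDS validator.

```python
import csv
import io


def read_edf_labels(raw_header: bytes) -> list[str]:
    """Return the signal labels stored in an EDF/BDF header."""
    # Bytes 252-255 of the fixed 256-byte header hold the number of signals.
    n_signals = int(raw_header[252:256].decode("ascii").strip())
    # The labels follow immediately: one 16-byte, space-padded field each.
    labels = raw_header[256:256 + 16 * n_signals].decode("ascii")
    return [labels[i * 16:(i + 1) * 16].strip() for i in range(n_signals)]


def read_tsv_names(tsv_text: str) -> list[str]:
    """Return the `name` column of a channels.tsv file given as text."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row["name"] for row in reader]


def mismatched_channels(raw_header: bytes, tsv_text: str) -> set[str]:
    """Names that appear in only one of the two sources."""
    return set(read_edf_labels(raw_header)) ^ set(read_tsv_names(tsv_text))
```

An empty set from `mismatched_channels` would mean the two files agree on channel names; anything else lists the names to reconcile.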
Agreed, maybe a PR can be opened to extend the definition for the
Good idea, I would be in favor of adding LATENCY to the channel types. The scans.tsv file will also be replaced with a recordings.tsv file in BIDS 2.0.
I made it through once and made suggestions. The overall specifications look great. I look forward to the community comments.
Co-authored-by: Seyed (Yahya) Shirazi <[email protected]>
ping @robertoostenveld and @tjeerdboonstra. I think we're about ready to open this up to public comment; do you want a chance to go through it again first?
idea. So if to parallel exactly, should get
Yes! As far as I understand how
This resulted in a REQUIRED
As for the terminology, adopting
agreed,
| [Biosemi data format](https://www.biosemi.com/faq/file_format.htm) | `.bdf` | Each recording consists of a single `.bdf` file. [`bdf+`](https://www.teuniz.net/edfbrowser/bdfplus%20format%20description.html) files are permitted. The capital `.BDF` extension MUST NOT be used. |
| [European data format](https://www.edfplus.info/) | `.edf` | Each recording consists of a single `.edf` file. [`edf+`](https://www.edfplus.info/specs/edfplus.html) files are permitted. The capital `.EDF` extension MUST NOT be used. |
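The lowercase-extension rule in the two rows above could be checked along these lines. This is only a sketch: the helper name and the example filenames are hypothetical, and the allowed set reflects just the two formats listed in this draft.

```python
from pathlib import Path

# Extensions permitted by the two table rows above; comparison is case-sensitive.
ALLOWED_EXTENSIONS = {".edf", ".bdf"}


def has_valid_emg_extension(filename: str) -> bool:
    """True only if the file ends in exactly `.edf` or `.bdf` (lowercase)."""
    return Path(filename).suffix in ALLOWED_EXTENSIONS
```

Because the comparison is case-sensitive, a name such as `sub-01_task-rest_emg.EDF` is rejected while `sub-01_task-rest_emg.edf` passes.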
Hi, I am the software lead for Artinis Medical Systems (which has taken over TMSi). We are working on HD-EMG devices and would like to ask why you did not add the BrainVision data format here. Apart from symmetry with the EEG standard, there would also be synergy in terms of open software toolboxes that already support BVCDF. Also, BVCDF supports 32-bit data, so you would make the standard just a tiny bit more future-proof (yes, bdf+ supports 24 bit, but why take away the extra precision if there is an already standardized data format that does support it?). I would appreciate it if you could consider this request, and I am open to discussion.
TL;DR: IMO any open-source format with 1) a clear advantage over the current formats and/or 2) widespread support and use in the community (here EMG) should be considered. All the formats included in EEG-BIDS do satisfy at least one of these two arguments, but for EMG, I am not sure that BVCDF meets either. More below:
Hi, thanks very much for reaching out. We greatly appreciate your taking the time. We discussed this topic quite a bit, and I'll try to summarize and respond to the good points that you made.
Surveying the major EMG research instruments (EMG: Delsys, Noraxon, Cometa, hdEMG: OTB used in 70% of published research in 2022, see #1371 for more details), we found that almost all have their own and often proprietary data formats. We concluded that there isn't a clear dominating and open-source format for EMG data.
Most, if not all, known EMG data examples we had access to were in <16-bit format, except when an EEG instrument was used for EMG recording. Again, pointers to all documents and discussions are available in the issue. Having access to a 32-bit format is nice, but using it with hardware that has 16-bit resolution only gives more room for noise, errors, and unnecessary computational overhead. I can imagine that once 32-bit electrophysiological recording comes into use, an extension to EDF for 32-bit recording would be very easy to make.
Symmetry with EEG format is a quite good argument. We tried our best to keep the symmetry as much as possible, pruning unnecessary parts, and adding necessary metadata. Specifically, for my day job at EEGLAB, we have received feedback that the proliferation of files in a BIDS directory would make their (visual) inspection cumbersome, especially once there are multiple tasks and runs per session and subject. We took a measure to remove `.fdt` files and now embed data in the `.set` files despite making the analysis pipelines a little more involved. EDF+ has rich metadata and event headers that can be used and may alleviate the need for the additional BVCDF files.
To my knowledge, open-source software that supports BVCDF also supports EDF, if not better. EDF read and write functions are native in MATLAB, there are at least a couple of actively-maintained Python packages as well as wrappers for other platforms. Also, from experience, EDF/BDF read, write, and storage are quite efficient using these toolboxes.
One pain point could be converting the EMG data files to EDF before or during the BIDS conversion process while preserving the metadata. I made EMGIO to help export the EMG data from different formats to EDF and preserve the metadata. The package is mostly a wrapper around PyEDFlib, with some bells and whistles like automatic determination of resolution, plotting, and trying to transfer relevant metadata to EDF. The package currently supports Delsys, OTB, SET and EDF import and EDF export. I'd be happy to work with you and others to support importing data from more instruments and even exporting to BVCDF if there is a demand for it.
I reference this conversation under #1371 to make sure it will stay for future reference once this PR is merged. Please feel free to continue the discussion here or there.
Thanks for taking the time to respond!
I understand your points from a current perspective. However, we are a company that wants to move the field forward, which means we want to improve on what is currently there. Potentially with 32-bit support, researchers gain several advantages. For example 32-bit measurements preserve motor unit recruitment hierarchies during dynamic tasks. It also reduces quantization-induced spatial aliasing in electrode arrays. Lastly it would provide future-proof data storage for advanced analysis techniques. We consider the last point especially crucial. While this extension is still in the drafting phase, I would therefore like to ask you to consider the request to add BVCDF once more, rather than just changing this in future once the draft has been finalized.
Note also that I am not affiliated with BrainProducts in any way. Instead I work for a competitor, and we still argue that a 32-bit data format would be beneficial for the (HD-)EMG community, and therefore advocate supporting the BV data format. I hope you see that this request is grounded in the spirit of scientific advancement and not of commercial interest.
I agree with Jorn that BVCDF would be a valuable addition to the list of data formats for EMG.
- EDF is not a serious consideration for the data from many systems, as it is limited to 16 bit integers and most ADCs nowadays have more bits.
- BVCDF allows 32 bit integer and 32 bit float, thereby also extending beyond what BDF supports (24 bit integers).
- Writing BVCDF data is simple, whereas code for writing BDF is not available in MNE-Python or FieldTrip (but apparently is in EEGLAB, although I could not find the actual code that writes the 24-bit format).
- In earlier BIDS-EMG discussions we already identified that many research labs use EEG equipment and software to do EMG recordings rather than EMG-specific systems. I recall that we were able to find more publicly shared EMG datasets in BrainVision format than in any other format.
Furthermore, BVCDF is a formal open standard with well-defined governance, whereas the format status of BDF is not so clear (as it is not published).
In short, unnecessary computational overhead, lack of actual use and relevance in EMG, unnecessary phantom data files, and unfair manufacturer gain are the main arguments.
We discussed this point when another manufacturer lightly proposed adopting their format, and a senior researcher specifically persuaded the group to avoid this precedent. Adding BVCDF would let one manufacturer that is not relevant to EMG use EMG-BIDS for their press release, while formats from EMG-specific instruments are rejected. IMHO, this is not fair to the community or research.
I'm summarizing here a conversation that @neuromechanist and I had offline.
- It's not totally clear what the file type landscape is for existing public EMG datasets; @robertoostenveld indicated that there were more public EMG datasets in BVCDF than in other formats, but @neuromechanist is doubtful of this (majority are `.csv` or `.hdf5` is his guess). We will compile a list of what we can find over the coming week, and see what formats are out there "in the wild".
- @Horschig has indicated an intention by Artinis/TMSi to bring 32-bit EMG hardware to market, and implies that BVCDF is an export format they are at least considering to offer in their software. Jörn, can you give any more concrete info? From the TMSi website, it looks like SAGA records 24-bit data but I couldn't find the bit depth info for SPIRE.
- Since BVCDF is controlled by one company (not an industry consortium or researcher-led spec), if we allow it we ought to also allow other company-controlled file formats as long as the format specification meets some criteria (at a minimum, open and versioned). My understanding is that there was some resistance to company-controlled file formats in early meetings about this BEP (which I wasn't present for, so I can't speak to). Is that resistance still present?
- A pragmatic reason to allow BVCDF is that it's already supported by the major tools in Python and MATLAB, so adding it is pretty low-cost for the tool maintainers.
- Adding read/write support for new file formats is a non-trivial amount of work for maintainers.
- Points 3 and 5 combined mean that allowing BVCDF opens the door to potentially a lot of extra work for the software tool maintainers.
My personal feeling is that the 80/20 rule would push toward not allowing BVCDF but that we should probably do so anyway because (a) it will handle existing datasets recorded on EEG hardware, (b) it's easy to do now, and (c) chances are we'll need a 32-bit capable format eventually. I also think to be fair we should reach out to other manufacturers to discuss adding their formats / encourage them to provide export for at least one of the formats we're allowing.
Just clarifying, the upcoming Spire will not have 32-bit EMG data channels. However, I see the potential for this coming, if only because hardware development has matured so much in these fields that an advancement in fidelity is the next logical thing to do.
- This Gist is a survey of >75 EMG-centric studies, primarily compiled by a dedicated EMG researcher, Rami Khushaba (see the original post on LinkedIn). I have verified the file formats for all of these studies, and the overwhelming majority are recorded in MAT and CSV/TXT formats. Notably, only one dataset is available in BVCDF format. I did not include the datasets I have personally worked on (approximately 10), which are also available in CSV and MAT formats.
- It is likely that there are additional EMG datasets in BVCDF format, such as this one, but these datasets appear to be exceptional cases (<5% of the cases) and do not need a 32-bit file format. Similarly, there are exceptional cases of EEG data recorded using EDF, like these EEG-EMG datasets available in OpenNeuro: ds004840 and ds005873.
- Based on the survey findings, it appears that EMG researchers actively use EMG equipment in their research endeavors, not EEG equipment.
- @Horschig, thank you for your enthusiasm in capturing muscular interactions with as much detail as possible. In my EMG file converter experiments, I found that most EMG data had a dynamic range under 70 dB. EDF offers about 90 dB, while BDF offers about 144 dB. So, (putting my hardware dev hat on) the current file format (and even ADCs) does not appear to hinder the capturing of EMG signals. There might be more sensitive electrodes and amplifiers on the horizon, but we are not there yet.
Historically, transitioning to BDF for EEG was a logical development, as some experiments suggested EEG resolution may need around 20 bits (120 dB), and there were equipment and demand for it. If EMG benefits from higher resolution and dynamic range, a new format will likely emerge naturally. Extending EDF to higher resolutions involves changing a single parameter (Byte count) in specifications and im/exporters, and could be a good candidate for this evolution (in its due time).
The current EMG landscape suggests that EDF/BDF formats are more than adequate for describing EMG research now and in the near future. While adding support for BVCDF has minimal overhead, it could lead to a surge of company-sponsored file formats from specialized EMG equipment manufacturers. I don’t see the benefit of adding six or seven file formats, as it would increase maintenance and accessibility costs. Additionally, endorsing formats that imply vendor endorsement should be done cautiously and only as a last resort.
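The dB figures quoted in this comment can be related to bit depth with the usual ideal-quantizer relation, dynamic range = 20·log10(2^N), roughly 6.02 dB per bit. The snippet below is only a sketch of that arithmetic (the function name is hypothetical). Under this idealized formula, 16 bits comes out near 96 dB, a little above the "about 90 dB" quoted for EDF; the other figures (about 120 dB for 20 bits, about 144 dB for 24 bits) match.

```python
import math


def ideal_dynamic_range_db(bits: int) -> float:
    """Ideal dynamic range of an N-bit integer sample, in decibels."""
    return 20 * math.log10(2 ** bits)


for bits in (16, 20, 24, 32):
    print(f"{bits}-bit: {ideal_dynamic_range_db(bits):.1f} dB")
# 16-bit: 96.3 dB
# 20-bit: 120.4 dB
# 24-bit: 144.5 dB
# 32-bit: 192.7 dB
```

These are upper bounds set by the sample width alone; the usable range of a real recording chain also depends on amplifier noise and ADC performance, which this sketch does not model.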
FWIW, the team behind PhysioNet introduced the WaveForm DataBase (WFDB, https://wfdb.io) format in the early 2000s (paper) that several of the shared EMG datasets already use (as they are hosted on PhysioNet), such as the Hyser or multiday gesture datasets.
WFDB consists of a header `.hea` and data/signal `.dat` file, with clear specifications, converters, help, etc. (although it seems to lack governance and contributing guides, see the WFDB org on GitHub). The data/signal file also supports up to 32-bit resolution with several different configurations (see the spec).
space: optional

# MEG has an additional entity available
electrodes__meg:
$ref: rules.files.raw.channels.electrodes
$ref: rules.files.raw.channels.electrodes__eeg
I am not too familiar with the BIDS standard, but does this here break backwards compatibility?
yes, the introduction of BIDS-EMG should not have consequences for BIDS-MEG
If I did things correctly, this will not change anything about BIDS-EEG or BIDS-MEG.
Previously there were rules for `electrodes` and `electrodes__meg` (which inherited from `electrodes`). Effectively I renamed `electrodes` to `electrodes__eeg` and inserted a new rule called `electrodes` that lacks the optional `space` entity (which isn't needed for EMG). So the new rule inheritance is:
`electrodes` -> `electrodes__eeg` -> `electrodes__meg`
- `electrodes` (emg)
- `electrodes__eeg` (eeg, ieeg): adds `space` entity
- `electrodes__meg` (meg): adds `processing` entity
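To illustrate the inheritance chain described above, here is a toy resolver for chained `$ref` rules. It is a sketch under several assumptions: the rule names are shortened (the real references look like `rules.files.raw.channels.electrodes`), the `subject`/`session` entries on the base rule are placeholders, and the `resolve` helper is hypothetical rather than the actual BIDS schema tooling.

```python
# Simplified stand-in for the schema rules; only the inheritance shape matters.
RULES = {
    "electrodes": {
        "entities": {"subject": "required", "session": "optional"},
    },
    "electrodes__eeg": {
        "$ref": "electrodes",
        "entities": {"space": "optional"},
    },
    "electrodes__meg": {
        "$ref": "electrodes__eeg",
        "entities": {"processing": "optional"},
    },
}


def resolve(name: str, rules: dict = RULES) -> dict:
    """Merge a rule's entities with those of every rule it inherits from."""
    rule = rules[name]
    entities = {}
    if "$ref" in rule:
        # Parent entities go in first so the child can add to or override them.
        entities.update(resolve(rule["$ref"], rules))
    entities.update(rule.get("entities", {}))
    return entities
```

With this layout, `resolve("electrodes")` has no `space` entity, `resolve("electrodes__eeg")` gains it, and `resolve("electrodes__meg")` carries both `space` and `processing`, which is the behavior the comment says is preserved for EEG and MEG.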
Co-authored-by: Jörn M. Horschig <[email protected]>
Responding to the call for community feedback, I would like to express three main comments on this BEP that I believe require careful consideration:
- New Modality (`_emg` suffix): I question the necessity and practical implications of introducing a dedicated `_emg` modality, particularly given that similar physiological signals (e.g., eye-tracking) are effectively managed under `_physio` in ongoing BEP020 efforts. Introducing new modality suffixes can fragment BIDS, and a clear justification for why `_physio` is insufficient is necessary.
- Multiple Formats & Format Policy: The BEP proposes several new formats (EDF, BDF, etc.) without adequately justifying their necessity or clearly discussing why currently-supported (tsv) or proposed alternatives (Parquet) cannot be applied to EMG data. Drawing from past experiences (e.g., DICOM vs. NIfTI), I suggest moving toward a unified, vendor-neutral format that supports agile analytics and has an entirely open software stack. Additionally, general policy recommendations on adding new formats should be discussed separately and not within modality-specific BEPs.
- Electrode Placement Pictures (`_photo.jpg`): While I fully support adding experimental setting photographs, this should perhaps be addressed more broadly rather than within modality-specific BEPs.
Given the significant overlap with ongoing discussions and proposals in BEP020 (#1128), I strongly recommend aligning and coordinating with that proposal to avoid redundant effort and ensure consistency across BIDS extensions. (cc @CPernet, @effigies, @Remi-Gau)
Should I submit a PR with proposed changes (similar to BEP038)—which may not happen very soon as I'm buried under BEP020 and BEP038, or would the authors and BIDS maintainers rather discuss this feedback first?
EDF, EDF+, BDF, and BDF+ are all open data formats with broad support in various programming languages for reading and writing the files. BDF and BDF+ formats store data samples using 3 bytes instead of 2 bytes as in EDF and EDF+ formats, allowing for greater resolution. EDF+/BDF+ accommodate more header metadata than EDF/BDF, and support storing event or annotation information in the file. Thus it is RECOMMENDED to use the BDF+ data format.
Future versions of BIDS may extend this list of supported file formats.
File formats for future consideration MUST have open access documentation, MUST have open source implementation for both reading and writing in at least two programming languages and SHOULD be widely supported in multiple software packages.
Other formats that may be considered in the future should have a clear added advantage over the existing formats and should have wide adoption in the BIDS community.
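The 2-byte versus 3-byte sample storage mentioned in the passage above can be made concrete with a short decoding sketch. It assumes what the EDF and BDF format descriptions specify, namely little-endian two's-complement integers; the function name is hypothetical, and a real reader would also need to parse the header and apply each channel's physical/digital scaling.

```python
def decode_samples(raw: bytes, bytes_per_sample: int) -> list[int]:
    """Decode little-endian signed integer samples (2 bytes for EDF, 3 for BDF)."""
    if len(raw) % bytes_per_sample != 0:
        raise ValueError("buffer length is not a multiple of the sample size")
    return [
        int.from_bytes(raw[i:i + bytes_per_sample], "little", signed=True)
        for i in range(0, len(raw), bytes_per_sample)
    ]
```

For example, `decode_samples(b"\xff\xff\x7f", 3)` yields the largest 24-bit value, 8388607, whereas the largest value a 2-byte EDF sample can hold is 32767.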
These lines raise two important concerns that should be reconsidered:
1. Addition of Multiple Data Formats:
Proposing multiple new data formats (EDF, EDF+, BDF, BDF+) significantly increases complexity and entry barriers, particularly for users less familiar with these device-oriented formats. Each format introduced to BIDS must have an explicit and clear rationale explaining its unique benefits, along with a thorough justification for why existing BIDS-supported formats (such as TSV or compressed TSV) are inadequate. Evidence linked by @neuromechanist above seems to indicate that compressed TSV could be a pretty acceptable alternative. Ideally, consensus should be sought around adopting a single format rather than introducing multiple parallel alternatives, especially in this case, where vendors have incentives to enter format wars.
The neuroimaging community has previously encountered substantial issues due to similar fragmentation, with the DICOM format serving as a key example. DICOM is open, vendor-supported, and widely produced by imaging devices, yet it suffers from fragmentation caused by vendor-specific tags, alternative vendor-proprietary formats (e.g., Philips PAR/REC for MRI), and mixed-purpose content. This fragmentation undermines DICOM’s effectiveness as a universal standard and complicates downstream processing workflows. Additionally, DICOM is notoriously inconvenient for agile data analytics, lacking straightforward memory-mapped access, practical data compression options, and efficient parallelization.
In contrast, adopting NIfTI—a single, vendor-neutral, modality-agnostic format—allowed the neuroimaging community to streamline analyses, facilitate easier data sharing, ensure long-term stability, and promote community-driven improvements. NIfTI was not the only alternative to DICOM available at BIDS' onset, but offered a good balance between limitations and simplicity, openness, support, and adoption. On the side of "measurement" series (employing @bendichter's nomenclature), @effigies proposed to adopt a single vendor-independent, data-science-friendly format—Parquet (bids-standard/bids-specification#1792). Unified, open-source, and analytics-friendly formats significantly enhance BIDS' usability, reduce complexity, and enable robust community-driven maintenance.
Additionally, while some community members have pointed out the existence of open software libraries to read these modality-specific formats, there is an important caveat: can the community guarantee that the entire software stack required to read these formats remains fully open and transparent? For instance, in the eye-tracking domain, the widely used EyeLink's EDF format requires proprietary vendor-supplied libraries installed separately—even when accessed through seemingly open interfaces such as pyedfread. Such hidden dependencies substantially increase entry barriers, compromise transparency, and undermine the openness and sustainability goals central to the BIDS standard.
Given this historical lesson and current experiences from analogous scenarios, it would be prudent to avoid replicating fragmentation with device-oriented formats. Instead, placing format conversion explicitly within dedicated converter software and advocating for a single, open, and analytics-friendly modality-agnostic format would best serve the community.
2. Future Format Inclusion Policy:
Although the authors have carefully attempted to frame their recommendations on future file format inclusions within the context of this specific BEP, these policy-oriented statements implicitly touch upon broader BIDS-wide considerations. Guidelines on how BIDS evaluates and incorporates new formats fall within the purview of the wider BIDS community rather than any individual BEP. To maintain clear boundaries, I recommend that the authors refocus strictly on the modality-specific technical issues here, leaving general format inclusion policies to dedicated, community-wide discussions elsewhere.
This doesn't mean that BEPs cannot change BIDS-wide elements; if that's necessary, then it should be possible. However, a BEP affecting BIDS across the board is likely to meet difficult issues along the way that make consensus harder. Should this BEP change or refine policy about addition of future formats, that should be done somewhere else and without scoping within the BEP at hand.
EDF/BDF are not new formats; they have been introduced in EEG-BIDS (EDF/EDF+/BDF/BDF+) and iEEG-BIDS (EDF).
On several occasions and in several conversations, including a couple with @yarikoptic, we discussed how headless compressed TSV (TSV.GZ) poses several problems, both technical and for FAIR use of data. I think, however, this can be discussed in another issue with a wider-reaching community.
IMO, specifications should provide an extensibility pathway. This extensibility should certainly be within the boundaries that the specification is defined. As you mentioned, this statement is under EMG-BIDS and is not meant to suggest any BIDS-wide elements and does not encourage anything outside the BIDS community.
Perhaps, the write-up should provide more clarification that this statement is only related to potential future format extensions for EMG-BIDS.
EDF/EDF+ are supported in iEEG and EEG, and BDF/BDF+ are supported in EEG. These are thus not new formats being introduced to BIDS. I don't know if this affects your argument.
As to using parquet, I believe it is well-suited to data where every column is named, each column has a data type, and values may be missing. For a 2D array of floats, its features may not be compelling over simpler formats. (I don't know if EDF/BDF are simpler formats.)
> I don't know if this affects your argument.
I guess it does partly. One aspect that should be considered is whether the software stacks of EDF/BDF/+ are fully open. Eye-trackers' EDF format (which is a different thing, I believe) requires a private library to access the data.
I think it also brings the broader discussion regarding BIDS' standpoint regarding formats. IMHO BIDS should be downstream-looking (e.g., parquet, NIfTI---meaning, formats that are designed to support processing and data science) as opposed to upstream-looking (e.g., DICOM---formats specifically designed to support devices' outputs that may not be so amenable to downstream processing). That said, I agree that would be mostly beyond the scope of this BEP (though it would require evaluation from the community before this BEP would move forward). Even if those formats were introduced previously, BIDS should try to get ahead of matters like this and make clear guidelines for BEP proponents (a bit like for the _photo files).
It does not affect the fact that these data must be represented with _physio today, and the argument that BEPs like this will void _physio of interest leaving eye-tracking as an island in the spec.
> As to using parquet, I believe it is well-suited to data where every column is named, each column has a data type, and values may be missing. For a 2D array of floats, its features may not be compelling over simpler formats. (I don't know if EDF/BDF are simpler formats.)
This comes back to the argument of downstream-looking vs. upstream-looking. EDF/BDF seem to clearly fall in the upstream-looking definition while parquet would fall in downstream-looking.
> IMO, specifications should provide an extensibility pathway.
Please see my argument above about DICOM. I've seen very few standards more extensible than DICOM, and that extensibility is what has constrained the range of uses it has today and precluded its wider application downstream.
Parquet (and tsv) are formats that a master's student with some knowledge of data science may know (or quickly learn, given their applicability in every data science application beyond neuroimaging).
> I think, however, this can be discussed in another issue with a wider-reaching community.
Exactly. My argument is that BIDS is currently giving poor support to BEP initiatives by avoiding this discussion. If we wait for this BEP (and other similar initiatives) to be accepted to then have the conversation, the discussion is going to be constrained within the range of options backward compatibility will allow. Instead, this should be first clarified within BIDS (not necessarily within the spec, perhaps this is more of a governance/guidelines issue).
I noticed #2055 discusses conditions/requirements to include formats. The recommendation we have in EMG-BIDS is almost the verbatim text from iEEG-BIDS spec, and hopefully would not be needed once #2055 is closed.
> I noticed #2055 discusses conditions/requirements to include formats. The recommendation we have in EMG-BIDS is almost the verbatim text from iEEG-BIDS spec, and hopefully would not be needed once #2055 is closed.
@neuromechanist I'm not arguing that EDF/BDF/+ do not meet the criteria stated in #2055 (depending on your interpretation of "widely adopted" and how you measure the requirement of future support and documentation). My argument is that by sticking to those formats, the BIDS community will miss an important opportunity to adopt a format that actually can hit a compute node in a cluster or an instance in the cloud. Parquet is supported by Apache, and its user base, active development, tooling, documentation, and likelihood of long-term maintenance cannot be compared to those of any of the purpose-specific formats we have in BIDS.
While I understand the argument that "EEG and MEG did this", I don't think we should decide based just on that. Those extensions were added in other circumstances and the landscape was different, so it would not be surprising if, once Parquet (or a comparable solution) were included in BIDS, many modalities across the board adopted it (in the same way, I think that if NIfTI could be replaced by an HDF5- or zarr-based format, the number of BIDS datasets using NIfTI would decline quickly over time).
Adopting a format that is not standard on the compute side will force analysis pipelines to start with a conversion into an "internal format" (or a standard format). When the community moves on to defining BIDS EMG Derivatives, this problem will be hit head-on, and it is likely that BIDS Derivatives will favor the general-purpose data science format over the application-specific format.
For me, the format is not the most critical point of this BEP, although I see this discussion as a missed opportunity to push into BIDS something downstream-looking that eases adoption by data scientists and consumption by analysis tooling. The proposal of EDF/BDF/+ (or any similar format for the matter at hand) is made from a perspective of protecting the original data (which I understand) and from the perspective of pipeline writers who don't want to adopt new formats. While I understand that logic, I think BIDS converters are critical to maximize the usability of data, which are written by devices in formats created with a different mindset and vision from those of BIDS.
{{ MACROS___make_filename_template(
"raw",
datatypes=["emg"],
suffixes=["emg", "events"])
}}

EMG device manufacturers use a variety of formats for storing raw data, and there is
no single standard that all researchers agree on. For BIDS, EMG data MUST be
stored in one of the following formats:
These lines propose introducing a new dedicated EMG modality (`_emg` suffix) without clearly justifying the necessity of such an addition. Currently, EMG data can be adequately represented using the existing `_physio` suffix, and the proposal does not (i) explicitly elaborate on why the existing `_physio` approach is insufficient or limiting, or (ii) explicitly state that EMG data shall not be stored within `_physio` files. The latter is particularly relevant because the current reading creates the problem that two researchers may encode the same data in two very different ways. Unfortunately, implementing the validation for such a constraint (that is, EMG data encoded as `_physio` raises a validation error) is really hard (if not impossible), which, for me, is a good reason to try to stick with `_physio`.
Introducing new modality suffixes, particularly ones not specifically representing brain recordings, sets a precedent that complicates the BIDS ecosystem and risks fragmenting the standard. A similar challenge was previously faced by BEP020 (eye tracking, #1128), where initially a separate modality-specific suffix was proposed but later abandoned in favor of extending the existing `_physio` suffix. Specifically, BEP020's current proposal uses the `_physio` suffix combined with setting the metadata field `"PhysioType": "eyetrack"`, unlocking structured sets of mandatory, recommended, and optional metadata fields, along with clearly defined data columns tailored explicitly to eye tracking.
An additional relevant feature proposed by BEP020 is the complementary `_physioevents` file. This new file type describes asynchronous, device-specific events that do not fit well into standard BIDS `_events` or `_stim` files. For example, eye trackers typically store messages about calibration procedures, status indicators, and device-specific annotations essential for correct interpretation. These `_physioevents` files are intentionally designed generically to support similar asynchronous events across different physiological modalities beyond eye tracking. This BEP works around this issue by just choosing some formats that have been deemed interesting after some (profound and comprehensive) discussion, which otherwise did not mention the limitations of current BIDS infrastructure. That way, instead of lifting general limitations of BIDS, the use of idiosyncratic formats resolves problems just for EMG, by creating a separate realm within the specification. For BIDS, this development model is unsustainable and IMHO should be avoided.
Given the extensive discussion and careful consideration behind BEP020, particularly the decision to separate device-specific complexity into dedicated metadata and events files while maintaining a single modality-agnostic data format, I strongly recommend aligning this EMG proposal similarly. It would be prudent to extend `_physio` by defining `"PhysioType": "emg"`, along with EMG-specific metadata fields and data columns. Furthermore, prioritizing the completion and community acceptance of BEP020 first would provide clearer guidance, avoid redundancy, and ensure consistent handling across similar physiological modalities.
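To make the suggestion concrete, here is a minimal sketch of what EMG stored the `_physio` way could look like: a headerless, gzip-compressed TSV plus a JSON sidecar. `SamplingFrequency`, `StartTime`, and `Columns` are existing `_physio` sidecar fields; `"PhysioType": "emg"` is only the value proposed in this comment (it is not part of BIDS), and the function names, column names, and sample values are hypothetical.

```python
import csv
import gzip
import io


def encode_physio(samples, columns, sampling_frequency, start_time=0.0):
    """Return (tsv_gz_bytes, sidecar_dict) for a _physio-style recording."""
    text = io.StringIO()
    # _physio TSV files carry no header row; column names live in the sidecar.
    writer = csv.writer(text, delimiter="\t", lineterminator="\n")
    writer.writerows(samples)
    payload = gzip.compress(text.getvalue().encode("utf-8"))
    sidecar = {
        "SamplingFrequency": sampling_frequency,
        "StartTime": start_time,
        "Columns": list(columns),
        # Assumption: value suggested in this review, not defined by BIDS.
        "PhysioType": "emg",
    }
    return payload, sidecar


def decode_physio(payload, sidecar):
    """Rebuild one dict per sample using the column names from the sidecar."""
    text = gzip.decompress(payload).decode("utf-8")
    rows = csv.reader(io.StringIO(text), delimiter="\t")
    return [dict(zip(sidecar["Columns"], map(float, row))) for row in rows]


# Hypothetical two-channel recording sampled at 2 kHz.
payload, sidecar = encode_physio(
    samples=[[0.1, -0.2], [0.3, 0.4]],
    columns=["emg_biceps", "emg_triceps"],
    sampling_frequency=2000.0,
)
decoded = decode_physio(payload, sidecar)
```

The point of the sketch is only that the data file stays modality-agnostic, while everything EMG-specific would sit in the sidecar.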
}
```

## Photos of the electrode positions (`*_photo.<extension>`)
`_photo.*` is permitted in MEG, EEG, iEEG, or microscopy. I think it would be reasonable to make a proposal to generally support this in all datatypes, but I don't think EMG explicitly adding support here makes that job any easier or harder.
Exactly, I think here the BEP authors are resolving a more general problem of BIDS, and they are following the precedent (MEG, EEG, iEEG). It would be more effective to address this generally than within each specific effort.
In other words, I'm not criticizing this proposal; I'm pointing out something we (all) should anticipate more broadly before other BEPs have to make explicit mention of experimental settings' pictures.
Thanks very much @oesteban for your thorough review, I greatly appreciate it.
Our proposal is not just for a new modality, but for a new data type as well. We followed the current specification format adopted by EEG-, iEEG-, and MEG-BIDS. Neither EEG, iEEG, nor MEG provides any justification for why they should be their own modalities/datatypes, nor, AFAIK, does Physio provide any justification or clarification of when to use it. This can be a broader conversation as to what the thresholds are for having a specific datatype and/or modality rather than an umbrella, which could end up in a new BEP. As to why EMG should have its own modality and data type, and not fall under Physio, there have been discussions at #1371, as well as in-person meetings. Some that I remember off the top of my head are:
EDF/+ is a widely used and adopted data standard for physiological recordings. It also includes some necessary metadata such as channel names, sampling frequency, signal range, recording date, etc. The specifications as well as converters are open (see the discussion above for more details). BDF/+ is a simple extension of EDF in which the only change is that the data is recorded at 24-bit resolution, rather than EDF's 16-bit resolution.
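As a rough illustration of that 16-bit vs. 24-bit difference, here is a minimal sketch that decodes raw sample bytes in both widths. It assumes samples are stored as little-endian two's-complement integers and deliberately ignores the file header and the physical/digital scaling; `decode_samples` and the constants are hypothetical names, not part of any EDF/BDF library.

```python
EDF_BYTES_PER_SAMPLE = 2  # 16-bit samples
BDF_BYTES_PER_SAMPLE = 3  # 24-bit samples


def decode_samples(raw, bytes_per_sample):
    """Decode little-endian two's-complement integers from a byte string."""
    if len(raw) % bytes_per_sample:
        raise ValueError("raw length is not a multiple of the sample width")
    return [
        int.from_bytes(raw[i:i + bytes_per_sample], "little", signed=True)
        for i in range(0, len(raw), bytes_per_sample)
    ]


# Largest and smallest digital value representable in each width.
edf_extremes = decode_samples(b"\xff\x7f\x00\x80", EDF_BYTES_PER_SAMPLE)
bdf_extremes = decode_samples(b"\xff\xff\x7f\x00\x00\x80", BDF_BYTES_PER_SAMPLE)
```

Here `edf_extremes` is `[32767, -32768]` and `bdf_extremes` is `[8388607, -8388608]`, i.e. the extra byte gives 256 times more digital levels over the same physical range.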
Agreed. Photos are an efficient way to convey to a human reader how the system is set up and placed. However, they pose potential ethical risks and may not be as precise, accurate, and machine-readable as the sensor placement description in
As I mentioned above, I don't advocate for having explicit explanations within the specs. However, the policy about what can derive a new datatype should be agreed upon before BEPs start sprawling the datatype level. While I certainly do not disqualify EMG as a neural signal, I think (i)EEG and MEG are brain signals, while EMG is generally not. To me, it makes sense those brain measurements have their own datatype directories and all other neuroscience-relevant data go within those directories (or
Exactly. I'm just arguing that we can't advance on this BEP (and any other BEP proposing new datatype folders) until we have had this conversation. Conversely, our approach in BEP020 does not require this conversation because it works on the foundation of
Likewise, under the umbrella of BEP020, we had the very same conversation, but the outcome was different because the people involved in the conversation were different. Since the same conversation is being had in different contexts in parallel, this signifies a point where BIDS requires a general policy to be defined so that BEPs do not diverge and are consistent.
Agreed. Two comments on this:
Resolving the problem specifically for EMG (or for eye tracking, or for other non-brain recordings) perpetuates the issue as you first stated it and increases the fragmentation of the general spec.
Eye-tracking is also high-dimensional and dense. TSV is definitely not the solution (current specs disallow it for _physio, btw), but TSVGZ does fit the bill. The argument that compression is not transparent, when contrasted with binary formats such as EDF/BDF does not hold for me. If we are going to use a binary format, then I'd advocate for something like Parquet (don't know much about it, but totally trust @effigies that it is a really good option). This does not mention something that BEP020 does solve - when devices generate more than just data recordings (e.g., when they generate signals and status messages, etc.). EDF and BDF address this with the + version, which mixes up data and metadata together (something BIDS definitely would like to avoid). Instead, BEP020's
I did not criticize this part of the proposal and I think it is extremely valuable. My point is that all these specific metadata can be encoded nicely (and implemented in the BIDS Validator) following the approach of BEP020 and without discontinuing
While these signals are neural, it doesn't seem to me EMG records brain signals. This is why I see it best suited within
Yes, it is scoped within EMG, but there is language stating what formats could be added and what are the requirements. IMHO that language does not fit this (nor any other) BEP (with the exception of a specific BEP to establish these policies across the spec).
Sure, I'm not attacking the format---if you all experts decided in favor of them after such a comprehensive conversation as the one above, I'm absolutely convinced that the four EDF/BDF/+ are excellent formats. Please refer to my point on upstream-looking vs. downstream-looking formats above (#1998 (comment)). Please also note the above comment regarding metadata (intertwined within a single file in the case of the "plus" versions of EDF and BDF).
Like above---from my ignorance, the proposal of |
Hi all, contributing my thoughts to the file format discussion for BIDS-EMG. While my primary research hasn't been solely focused on EMG, I bring experience working with data analysis across several related modalities (EEG, ECG, EOG, MEG, fMRI, and currently working in an fNIRS/ExG company). This gives me a fairly broad view of common practices and data handling needs in these domains, also from the perspective of other users, including those who are not tech-savvy as most of us are.
A core principle of BIDS is enhancing data sharing for the purpose of reuse and analysis. Therefore, the practical usability of the chosen format within the target community seems crucial. How easily can researchers integrate BIDS-compliant EMG data into their existing analysis workflows?
This brings me to the suggestion of compressed .tsv. From my perspective, this format doesn't seem to have established traction within the EMG research community or widespread support in commonly used analysis tools. In previous BIDS extensions (like BIDS-EEG), the selected formats (e.g., EDF, BrainVision) were largely chosen based on existing community adoption, open specifications, and tool support – prioritizing practicality. Introducing a less common format like compressed .tsv would necessitate an extra data conversion step for many users. This requires developing and maintaining specific conversion tools, which can be a barrier, especially for researchers who aren't primarily software developers and rely on established toolboxes.
Conversely, focusing on file formats already prevalent in the EMG community, particularly those that are open and supported by major software packages (like FieldTrip, MNE, etc.), appears more aligned with BIDS' goal of reducing friction in data sharing and analysis.
While I appreciate the need to consider future-proof formats (and I have argued for adding the BV format elsewhere), the primary standard should arguably reflect what the community currently uses effectively. Therefore, I think we should prioritize the currently proposed data formats with demonstrable, widespread use and robust tool support within the EMG field to maximize the immediate utility and adoption of BIDS-EMG.
@oesteban, I moved the conversation regarding data type and modality to #2108, with a summary of what was discussed here. Please consider expanding the discussion there. I believe that this discussion is very important (and overdue), and deserves independent attention. I hope that the discussion results in clear guidelines and a policy that help us move toward transparent and unambiguous data sharing 😊.
Thanks! I'll make sure to bring this thread to the upcoming BIDS maintainers meeting in the context of BEP020 and this one :) Let's continue that conversation there. |
Thanks all for the lively discussion. I'm going to try to summarize what I see as the points of contention, in hopes that it will move the discussion forward. One point regarding photos seems to amount to "what you're doing here is fine, but we should have a broader discussion about photos too", so I won't comment further here. The other two points of contention:
This is a very early WIP implementation to add EMG support. CIs are not expected to pass yet.
cc @neuromechanist @jwelzel @larsoner @arnodelorme @robertoostenveld feel free to push directly to this branch, I'll add you as repo collaborators on my fork
Note
We meet regularly to discuss this BEP
Next meeting: 18 Dec 2024 on https://ucsd.zoom.us/j/96433382377
Communication channel on github repo / matrix / slack / discord : #1371