Skip to content

Refactor _handle_events_reading to allow extracting annotation information stand-alone #1389

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 21 commits into from
Apr 16, 2025

Conversation

matthiasdold
Copy link
Contributor

@matthiasdold matthiasdold commented Apr 3, 2025

PR Description

Aim:
The _handle_events_reading in read.py reads onsets, duration and descriptions/event_ids from the events.tsv, then filters the columns for n\a values etc. and finally sets annotations to the raw object based on these filtered version of the events.tsv data.

    annot_from_raw = raw.annotations.copy()
    annot_from_events = mne.Annotations(onset=ons, duration=durs, description=descrs)
    raw.set_annotations(annot_from_events)

On default, this will encode information from three columns within the events.tsv.

In order to extract additional information from BIDS dataset within MOABB it would be necessary to ensure that all annotations that end up in the raw and whatever is extracted from the events.tsv are aligned. To allow for this, it would be necessary to have any filtering that happens in L526-602 accessible standalone (without the need to provide a raw).

Proposition:
Simply have the first part of the _handle_events_reading as its own function which returns the filtered version of the onset, description, duration and event_ids. Assuming that these columns would form a primary key within the events.tsv, we could always map additional meta info to the annotations in the raw unambiguously.

Merge checklist

Maintainer, please confirm the following before merging.
If applicable:

  • All comments are resolved
  • This is not your own PR
  • All CIs are happy
  • PR title starts with [MRG]
  • whats_new.rst is updated
  • New contributors have been added to CITATION.cff
  • PR description includes phrase "closes <#issue-number>"

Copy link

welcome bot commented Apr 3, 2025

Hello! 👋 Thanks for opening your first pull request here!
Please read the contributor guide, and please follow the steps outlined in the "Instructions for first-time contributors" section therein. ❤️ We will try to get back to you soon. 🚴🏽‍♂️

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @matthiasdold to make this function public, it would need to get a more extensive docstr (numpydoc), and preferably also be used in a public example.

You may (or not) also need to add it here in some way: https://github.com/mne-tools/mne-bids/blob/main/mne_bids/__init__.py

There are also some test failures here that I haven't looked into yet.

@matthiasdold
Copy link
Contributor Author

matthiasdold commented Apr 4, 2025

Hi @matthiasdold to make this function public, it would need to get a more extensive docstr (numpydoc), and preferably also be used in a public example.

You may (or not) also need to add it here in some way: https://github.com/mne-tools/mne-bids/blob/main/mne_bids/__init__.py

There are also some test failures here that I haven't looked into yet.

Hi @sappelhoff, thanks for checking this out.
I actually think it could be kept private / _events_file_to_annotation_kwargs function as in the first commit. The reason why I was later putting it to public was that I could not get the docs to build and making it public + introducing it to the __init__.py at least had sphynx find it.

Unfortunately, I still cannot get the docs to build, neither locally nor here on the server. Which is very puzzling to me, as the change really is just separating the existing functions... I will have another try later this afternoon, but any hints would be much appreciated.

Nevermind the above, you comment explains everything actually. I just did not know how Sphinx works properly.

It is working now, but coverage goes down at bit. Would you want me to implement a test for this explicitly?

Also, I had to add the html-noplot back to the Makefile under /doc to build locally, as suggested in the CONTRIBUTING.md.

Copy link

codecov bot commented Apr 4, 2025

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 97.49%. Comparing base (5d4320f) to head (6559e55).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1389      +/-   ##
==========================================
+ Coverage   97.48%   97.49%   +0.01%     
==========================================
  Files          40       40              
  Lines        9023     9060      +37     
==========================================
+ Hits         8796     8833      +37     
  Misses        227      227              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it would be great if we could cover any differences introduced via this PR.

So you ended up leaving the function private 🤔 I think that if you want to use this in another library, it may be worth making it public. For that you could take one example function in mne-bids that is public, and see where it has to be declared, and then do the same for your function.

Otherwise, you always take the risk that is inherent in using private functions of another software package: backward incompatible changes may come at any time and without notice 🤔

If you decide that you do NOT want to make the function public and that you DO want to take the risk of using a private function, then you would at least have to adjust your changelog entry, as we usually do not talk about changes to private functions so explicitly. You could write something a bit more generic, like "refactoring of events filtering", under code health or so.

WDYT @drammock @hoechenberger ?

@drammock
Copy link
Member

drammock commented Apr 7, 2025

WDYT @drammock

I wasn't super familiar with the internals of what MNE-BIDS did in terms of reading events.tsv prior to looking at this PR. My feeling (from a quick skim and quick think) is that:

  1. there should be a public func that reads events.tsv files.
  2. I'm conflicted about whether it should return an actual annotations object, a dict suitable for passing to mne.Annotations, or a dataframe. I guess the Annotations and the DataFrame can be quite easily constructed from the dict, so I guess the dict is most general/flexible.
  3. for users who want an events array, they can use mne.events_from_annotations and we should mention that in the docstring.
  4. The changelog should only mention this new public function; users won't care / don't need to know that it was part of a private function that was refactored into a separate thing.

@matthiasdold
Copy link
Contributor Author

matthiasdold commented Apr 8, 2025

2. I'm conflicted about whether it should return an actual annotations object, a dict suitable for passing to `mne.Annotations`, or a dataframe. I guess the Annotations and the DataFrame can be quite easily constructed from the dict, so I guess the dict is most general/flexible.

Operating on a (pandas) dataframe would also lead to a more readable code, IMHO, as we can use standard filtering instead of _drop. Although just a subset of the functionality, the dataframe is what I am using in the meantime for moabb.

I didn't go the dataframe route as I wanted to be minimally invasive. Happy to provide a dataframe version as well.

@sappelhoff
Copy link
Member

Thanks for your view, Dan.

I agree with your points 1, 3 and 4 -- and regarding point 2, I think returning a dict is the way to go. We do not have pandas as a required dependency of mne-bids, and I don't think the current "feature" is enough to warrant it.

@PierreGtch
Copy link
Contributor

@matthiasdold I think you need double backquotes for code samples. This might be why your doc build fails
https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html

@matthiasdold
Copy link
Contributor Author

matthiasdold commented Apr 9, 2025

@matthiasdold I think you need double backquotes for code samples. This might be why your doc build fails https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html

Thx for the suggestion @PierreGtch - unfortunately this seems not to be the issue. As for the :func: part, all other lines only had single backquotes, so I would not expect this to be a porblem there (the error message states that mne_bids.events_file_to_annotations_kwargs cannot be found, however the import works for the pytest... )

I am a bit clueless here...

@PierreGtch
Copy link
Contributor

@matthiasdold your issue is not anymore with building the doc but before, when the ci is trying to pull the code
It is probably unrelated to you

@sappelhoff sappelhoff mentioned this pull request Apr 10, 2025
7 tasks
@sappelhoff
Copy link
Member

yes, something is fishy with circleci. I just opened #1391 with tiny changes to see if I get a circleci issue there as well. If I don't, then maybe the failures ARE related to the PR here, but it looks almost like they aren't.

@drammock
Copy link
Member

something is fishy with circleci.

the error looks totally related to me:
https://app.circleci.com/pipelines/github/mne-tools/mne-bids/6995/workflows/6d22be6e-fce2-43b0-a5d2-a0712a9fd896/jobs/9247?invite=true#step-108-0_142

/home/circleci/project/doc/whats_new.rst:43: WARNING: py:func reference target not found: mne_bids.events_file_to_annotation_kwargs [ref.func]

same error here:
https://github.com/mne-tools/mne-bids/actions/runs/14379592851/job/40319969519?pr=1389#step:6:257

/home/runner/work/mne-bids/mne-bids/doc/whats_new.rst:43: WARNING: py:func reference target not found: mne_bids.events_file_to_annotation_kwargs [ref.func]

@drammock
Copy link
Member

add the new public function name to doc/api.rst

@sappelhoff
Copy link
Member

the error looks totally related to me:

I was talking about this one:

https://app.circleci.com/pipelines/github/mne-tools/mne-bids/6993/workflows/1783535a-8a54-4b94-ac2b-ace4331f6e3f/jobs/9245

image

@sappelhoff
Copy link
Member

yet #1391 is unaffected, so it must be something about this PR.

@sappelhoff
Copy link
Member

ok fine, the errors that circleci now provides all seem meaningful and related :-) no idea what I was after in #1389 (comment)

@matthiasdold this should be possible for you to address. I am +1 for merging once everything is green. thanks a lot!

mne_bids/read.py Outdated
Notes
-----
The function handles the following cases:
- If the `trial_type` column is available, it uses it for event descriptions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add blank line before bullet list starts

@drammock
Copy link
Member

here's the rendered docstring:
https://output.circle-artifacts.com/output/job/1e8c4f31-cedf-4fcd-a9ef-94de47d8437b/artifacts/0/dev/generated/mne_bids.events_file_to_annotation_kwargs.html

see how the notes section bullet list isn't formatted right, and the formatting in the Returns section is inconsistent.

@matthiasdold
Copy link
Contributor Author

Thanks a lot @drammock for the detailed explanation! I should get it sorted out now. My first time working with circleci (as you of course must have guessed), so I was not aware that I could look at the artifacts. Thanks for your patients @drammock and @sappelhoff 😅

Copy link
Member

@drammock drammock left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for the thorough test. Left a few nitpick comments that you can feel free to ignore or implement.

Comment on lines +1520 to +1528
assert (ev_kwargs_filtered["onset"] == dext_f["onset"].astype(float).values).all()
assert (
ev_kwargs_filtered["duration"]
== dext_f["duration"].replace("n/a", "0.0").astype(float).values
).all()
assert (ev_kwargs_filtered["description"] == dext_f["trial_type"].values).all()
assert (
ev_kwargs_filtered["duration"][0] == 0.0
) # now idx=0, as first row is filtered out
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

idem. Could also consider pd.testing.assert_frame_equal but that would require converting the dict to dataframe first.

Copy link
Contributor Author

@matthiasdold matthiasdold Apr 16, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we are using a dict for the return type, I would just stick to comparing the iterables here. Implemented all other suggestions though.

@matthiasdold
Copy link
Contributor Author

Once mne-tools/mne-python#13213 is implemented in mne, we could also add the additional metadata to the Annotations object. This could be part of this PR or a separate one. Let me know what you prefer @drammock and @sappelhoff

Copy link
Member

@sappelhoff sappelhoff left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @matthiasdold!

@sappelhoff sappelhoff merged commit 486597a into mne-tools:main Apr 16, 2025
25 checks passed
Copy link

welcome bot commented Apr 16, 2025

🎉 Congrats on merging your first pull request! 🥳 Looking forward to seeing more from you in the future! 💪

@sappelhoff
Copy link
Member

This could be part of this PR or a separate one.

a separate one, please! :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants