Skip to content

Conversation

@scott-huberty
Copy link
Member

@scott-huberty scott-huberty commented Jan 10, 2025

This PR extends our dataset fetcher to download the EEG files for other task collected by EEGEYENET, the anti-saccade task.

This task contains some 300 additional EEG-ET recordings.

if acceptable, I'll create a reader for the anti saccade data in a separate PR

EDIT: The list of files are extracted from here

This dataset is organized slightly different than the dots task. I can't tell yet if there are 300 subjects with 1 run each... or perhaps ~50 subjects with about 6 runs each. I've defaulted to the former, but if it turns out that the latter is true I will update this csv file.

@scott-huberty scott-huberty changed the title Antisaccade ENH: Extend dataset fetcher for the Antisaccade task Jan 10, 2025
@christian-oreilly
Copy link
Member

Looks good to me. Thanks, @scott-huberty, for adding this. We should probably start thinking about upping our game in terms of continuous integration. Maybe adding a test that runs this function would be a first step? If you'd rather not add this to this PR, I'll create a separate issue about adding CI to this project and some initial testing.

@scott-huberty
Copy link
Member Author

Good idea @christian-oreilly - bc actually 888ffb8 would have added a regression...

I'm re-requesting a review based on the new code I've added since 05df081

pathlib.Path
Path to the downloaded file.
"""
task = _get_task_from_subject_id(subject)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since get_subject_runs API now has a parameter task="DOT". We need to actually pass task="AS" in cases where the filename is an anti-saccade file, e.g. fetch_eegeyenet(subject="BZ2").

I created little helper function for this. For some reason it feels a little hacky, but it works.

So I just want to make sure everyone agrees with this approach.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. I think what is not optimal is for the task to be part of the subject ID (you cannot have the same subject doing two tasks), but that was decided by EEGEyeNet, so it is fine to use it I think...

archive_name=f"{subject}_DOTS{run}_EEG.mat",
folder_name=f"EEGEYENET-Data/dots/{subject}",
archive_name=f"{subject}_{task}{run}_EEG.mat",
folder_name=f"EEGEYENET-Data/{task}/{subject}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should it be f"EEGEYENET-Data/{task.lower()}/{subject}" to be consistent with the previous code, or the case doesn't matter here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh thanks for bringing that up. It does matter, but I'm wondering if it is worth the change to keep the case consistent across the API. i.e., since we use "DOTS" and "AS" in the code, maybe we be consistent and name the folder "DOTS / "AS"?

The next time you download the data you need to be aware of this (so that you don't keep a copy in both "dots" and "DOTS" directories). WDYT? worth the change or no?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, fine with me. I was just not sure if that path was use for reading an existing repository (in which case it would crash) or to write up the data (which would be fine). From what you say, it seems to be the later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes good point - I think it is fine since we typically do..

fpath = eoglearn.datasets.fetch_eegeyenet()
raw = eoglearn.io.read_raw_eegeyenet(fpath)

Which should handle the change from "dots" to "DOTS" for us.

There might be somewhere in paper_2024 that is affected (if we hardcoded "EEGEYENET/data/dots" somewhere, but from a quick git grep "dots", it doesn't seem like this is the case.

pathlib.Path
Path to the downloaded file.
"""
task = _get_task_from_subject_id(subject)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems fine to me. I think what is not optimal is for the task to be part of the subject ID (you cannot have the same subject doing two tasks), but that was decided by EEGEyeNet, so it is fine to use it I think...

@scott-huberty scott-huberty merged commit 03ec4ac into lina-usc:main Jan 18, 2025
5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants