-
Notifications
You must be signed in to change notification settings - Fork 3
ENH: Extend dataset fetcher for the Antisaccade task #21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -1,4 +1,5 @@ | ||
| from pathlib import Path | ||
| from typing import Literal | ||
|
|
||
| import pandas as pd | ||
|
|
||
|
|
@@ -14,23 +15,31 @@ def _get_params(subject, run): | |
| row = df.loc[(df.subject == subject.upper()) & (df.run == int(run))] | ||
| assert len(row) == 1 | ||
| row = row.T.squeeze() | ||
| task = row["task"] | ||
| return dict( | ||
| url=row["url"], | ||
| archive_name=f"{subject}_DOTS{run}_EEG.mat", | ||
| folder_name=f"EEGEYENET-Data/dots/{subject}", | ||
| archive_name=f"{subject}_{task}{run}_EEG.mat", | ||
| folder_name=f"EEGEYENET-Data/{task}/{subject}", | ||
| hash=row["hash"], | ||
| dataset_name="EEGEYENET") | ||
|
|
||
|
|
||
| def get_subjects_runs(): | ||
| def get_subjects_runs(task: Literal["DOTS", "AS"] = "DOTS"): | ||
| """Get dictionary of {subject: [lists of runs]}. | ||
|
|
||
| Parameters | ||
| ---------- | ||
| task : | ||
| Which EEGEYENET task task to extract the subject ID's and runs for. Can be | ||
| ``"DOTS"``, or ``"AS"`` (antisaccade). Defaults to ``'DOTS'``. | ||
|
|
||
| Returns | ||
| ------- | ||
| dict | ||
| Dictionary of subjects with the runs as values. | ||
| """ | ||
| df = _get_urls_df() | ||
| df = df.loc[df["task"] == task].copy() | ||
| return {subject: df.run.values[df.subject == subject] | ||
| for subject in df.subject.unique()} | ||
|
|
||
|
|
@@ -54,13 +63,14 @@ def fetch_eegeyenet(subject="EP10", run=1, fetch_dataset_kwargs=None): | |
| pathlib.Path | ||
| Path to the downloaded file. | ||
| """ | ||
| task = _get_task_from_subject_id(subject) | ||
|
Member
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Since I created little helper function for this. For some reason it feels a little hacky, but it works. So I just want to make sure everyone agrees with this approach.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Seems fine to me. I think what is not optimal is for the task to be part of the subject ID (you cannot have the same subject doing two tasks), but that was decided by EEGEyeNet, so it is fine to use it I think... |
||
| if not fetch_dataset_kwargs: | ||
| fetch_dataset_kwargs = dict() | ||
| run = int(run) | ||
| runs = get_subjects_runs() | ||
| runs = get_subjects_runs(task=task) | ||
| if subject not in runs or run not in runs[subject]: | ||
| raise ValueError("subject and run not available. See " | ||
| "get_subjects_runs() for information on " | ||
| raise ValueError(f"subject {subject} and run {run} not available. " | ||
| "See get_subjects_runs() for information on " | ||
| "available subjects and runs.") | ||
|
|
||
| fetch_dataset_kwargs["dataset_params"] = _get_params(subject, run) | ||
|
|
@@ -72,5 +82,14 @@ def fetch_eegeyenet(subject="EP10", run=1, fetch_dataset_kwargs=None): | |
| if not fpath.exists(): | ||
| fetch_dataset_kwargs["force_update"] = True | ||
| _fetch_dataset(fetch_dataset_kwargs=fetch_dataset_kwargs) | ||
|
|
||
| return fpath | ||
|
|
||
|
|
||
| def _get_task_from_subject_id(subject): | ||
| if subject.startswith("EP"): | ||
| return "DOTS" | ||
| if subject.startswith(("A", "B")): | ||
| return "AS" | ||
| raise ValueError( | ||
| f"Can't determine task for {subject}. Is this subject in eegeyenet_urls.csv?" | ||
| ) | ||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should it be
f"EEGEYENET-Data/{task.lower()}/{subject}"to be consistent with the previous code, or the case doesn't matter here?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh thanks for bringing that up. It does matter, but I'm wondering if it is worth the change to keep the case consistent across the API. i.e., since we use "DOTS" and "AS" in the code, maybe we be consistent and name the folder "DOTS / "AS"?
The next time you download the data you need to be aware of this (so that you don't keep a copy in both "dots" and "DOTS" directories). WDYT? worth the change or no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, fine with me. I was just not sure if that path was use for reading an existing repository (in which case it would crash) or to write up the data (which would be fine). From what you say, it seems to be the later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes good point - I think it is fine since we typically do..
Which should handle the change from "dots" to "DOTS" for us.
There might be somewhere in
paper_2024that is affected (if we hardcoded "EEGEYENET/data/dots" somewhere, but from a quickgit grep "dots", it doesn't seem like this is the case.