Merged
33 changes: 26 additions & 7 deletions eoglearn/datasets/eegeyenet.py
@@ -1,4 +1,5 @@
from pathlib import Path
+from typing import Literal

import pandas as pd

@@ -14,23 +15,31 @@ def _get_params(subject, run):
row = df.loc[(df.subject == subject.upper()) & (df.run == int(run))]
assert len(row) == 1
row = row.T.squeeze()
+task = row["task"]
return dict(
url=row["url"],
-archive_name=f"{subject}_DOTS{run}_EEG.mat",
-folder_name=f"EEGEYENET-Data/dots/{subject}",
+archive_name=f"{subject}_{task}{run}_EEG.mat",
+folder_name=f"EEGEYENET-Data/{task}/{subject}",
Member

Should it be f"EEGEYENET-Data/{task.lower()}/{subject}" to be consistent with the previous code, or does the case not matter here?

Member Author

Oh, thanks for bringing that up. It does matter, but I'm wondering whether it is worth the change to keep the case consistent across the API. I.e., since we use "DOTS" and "AS" in the code, maybe we should be consistent and name the folders "DOTS" / "AS"?

The next time you download the data, you need to be aware of this (so that you don't keep a copy in both "dots" and "DOTS" directories). WDYT? Worth the change or not?
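A minimal sketch of how one might spot a leftover lowercase task folder after this rename, assuming the data sit under `<root>/EEGEYENET-Data/<task>/` as in the new `folder_name` f-string. `find_legacy_task_dirs` and `root` are hypothetical and not part of eoglearn:

```python
from pathlib import Path

# Upper-case task names used by the new folder_name f-string.
TASKS = ("DOTS", "AS")


def find_legacy_task_dirs(root: Path) -> list[Path]:
    """Return task folders under root/EEGEYENET-Data not named in upper case.

    Hypothetical helper, not part of eoglearn. ``root`` is assumed to be the
    directory that contains the EEGEYENET-Data folder.
    """
    data_dir = Path(root) / "EEGEYENET-Data"
    if not data_dir.is_dir():
        return []
    legacy = []
    for entry in sorted(data_dir.iterdir()):
        # iterdir() reports names as stored on disk, so a leftover "dots"
        # folder is reported even on case-insensitive file systems.
        if (entry.is_dir() and entry.name.upper() in TASKS
                and entry.name not in TASKS):
            legacy.append(entry)
    return legacy
```

Anything it returns (e.g. a `dots` folder) is a copy laid out under the old naming that the new paths will no longer point to.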

Member

Yes, fine with me. I was just not sure whether that path was used for reading an existing repository (in which case it would crash) or for writing the data (which would be fine). From what you say, it seems to be the latter.

Member Author

Yes, good point. I think it is fine since we typically do:

fpath = eoglearn.datasets.fetch_eegeyenet()
raw = eoglearn.io.read_raw_eegeyenet(fpath)

This should handle the change from "dots" to "DOTS" for us.

There might be some place in paper_2024 that is affected (if we hardcoded "EEGEYENET/data/dots" somewhere), but from a quick git grep "dots" it doesn't seem like this is the case.

hash=row["hash"],
dataset_name="EEGEYENET")


-def get_subjects_runs():
+def get_subjects_runs(task: Literal["DOTS", "AS"] = "DOTS"):
"""Get dictionary of {subject: [lists of runs]}.

+Parameters
+----------
+task :
+Which EEGEYENET task to extract the subject IDs and runs for. Can be
+``"DOTS"`` or ``"AS"`` (antisaccade). Defaults to ``"DOTS"``.
+
Returns
-------
dict
Dictionary of subjects with the runs as values.
"""
df = _get_urls_df()
+df = df.loc[df["task"] == task].copy()
return {subject: df.run.values[df.subject == subject]
for subject in df.subject.unique()}
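A minimal sketch of the task filter in get_subjects_runs(), written in plain Python instead of pandas so it runs on its own. The rows are illustrative, not the contents of eegeyenet_urls.csv, and `subjects_runs` is a hypothetical name:

```python
from collections import defaultdict
from typing import Literal

# Illustrative rows only; in eoglearn the table is read by _get_urls_df().
ROWS = [
    {"subject": "EP10", "run": 1, "task": "DOTS"},
    {"subject": "EP10", "run": 2, "task": "DOTS"},
    {"subject": "BZ2", "run": 1, "task": "AS"},
]


def subjects_runs(rows: list[dict],
                  task: Literal["DOTS", "AS"] = "DOTS") -> dict[str, list[int]]:
    """Group runs by subject, keeping only rows of the requested task."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for row in rows:
        if row["task"] == task:  # same role as df.loc[df["task"] == task]
            grouped[row["subject"]].append(row["run"])
    return dict(grouped)


print(subjects_runs(ROWS))             # {'EP10': [1, 2]}
print(subjects_runs(ROWS, task="AS"))  # {'BZ2': [1]}
```

With the default `task="DOTS"`, antisaccade subjects simply do not appear in the result.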

@@ -54,13 +63,14 @@ def fetch_eegeyenet(subject="EP10", run=1, fetch_dataset_kwargs=None):
pathlib.Path
Path to the downloaded file.
"""
+task = _get_task_from_subject_id(subject)
Member Author

Since the get_subjects_runs API now has a parameter task="DOTS" (the default), we need to actually pass task="AS" in cases where the file is an antisaccade file, e.g. fetch_eegeyenet(subject="BZ2").

I created a little helper function for this. For some reason it feels a little hacky, but it works.

So I just want to make sure everyone agrees with this approach.

Member

Seems fine to me. What is not optimal, I think, is having the task be part of the subject ID (you cannot have the same subject doing two tasks), but that was decided by EEGEyeNet, so I think it is fine to use it.
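To illustrate why the task has to be passed, here is a sketch with toy tables: once the lookup is restricted to one task, an antisaccade subject such as "BZ2" is only found when task="AS" is given. `RUNS_BY_TASK` and `check_available` are hypothetical stand-ins for get_subjects_runs() and the check in fetch_eegeyenet, and the IDs and runs are examples, not the real listing:

```python
# Illustrative per-task tables; eoglearn builds these with get_subjects_runs(task=...).
RUNS_BY_TASK = {
    "DOTS": {"EP10": [1]},
    "AS": {"BZ2": [1]},
}


def check_available(subject: str, run, task: str = "DOTS") -> None:
    """Raise ValueError if (subject, run) is not listed for ``task``."""
    run = int(run)
    runs = RUNS_BY_TASK[task]  # only the requested task is searched
    if subject not in runs or run not in runs[subject]:
        raise ValueError(f"subject {subject} and run {run} not available. "
                         "See get_subjects_runs() for information on "
                         "available subjects and runs.")


check_available("BZ2", 1, task="AS")  # found
try:
    check_available("BZ2", 1)  # default task="DOTS" does not list BZ2
except ValueError as err:
    print(err)
```

The second call fails exactly as fetch_eegeyenet(subject="BZ2") would if the task were not derived from the subject ID first.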

if not fetch_dataset_kwargs:
fetch_dataset_kwargs = dict()
run = int(run)
-runs = get_subjects_runs()
+runs = get_subjects_runs(task=task)
if subject not in runs or run not in runs[subject]:
-raise ValueError("subject and run not available. See "
-"get_subjects_runs() for information on "
+raise ValueError(f"subject {subject} and run {run} not available. "
+"See get_subjects_runs() for information on "
"available subjects and runs.")

fetch_dataset_kwargs["dataset_params"] = _get_params(subject, run)
@@ -72,5 +82,14 @@ def fetch_eegeyenet(subject="EP10", run=1, fetch_dataset_kwargs=None):
if not fpath.exists():
fetch_dataset_kwargs["force_update"] = True
_fetch_dataset(fetch_dataset_kwargs=fetch_dataset_kwargs)

return fpath


+def _get_task_from_subject_id(subject):
+if subject.startswith("EP"):
+return "DOTS"
+if subject.startswith(("A", "B")):
+return "AS"
+raise ValueError(
+f"Can't determine task for {subject}. Is this subject in eegeyenet_urls.csv?"
+)
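For reference, a sketch of how a subject ID resolves to the download paths once the task is known. It copies the prefix rule of `_get_task_from_subject_id` and the `folder_name` / `archive_name` f-strings from `_get_params`. Note that `_get_params` actually reads the task from the CSV row, so this assumes the prefix rule and the CSV agree. `task_from_subject_id` and `expected_paths` are hypothetical names:

```python
def task_from_subject_id(subject: str) -> str:
    """Map an EEGEyeNet subject ID to its task, by ID prefix."""
    if subject.startswith("EP"):
        return "DOTS"
    if subject.startswith(("A", "B")):
        return "AS"
    raise ValueError(f"Can't determine task for {subject}.")


def expected_paths(subject: str, run: int) -> tuple[str, str]:
    """Return (folder_name, archive_name) as built in _get_params."""
    task = task_from_subject_id(subject)
    folder_name = f"EEGEYENET-Data/{task}/{subject}"
    archive_name = f"{subject}_{task}{run}_EEG.mat"
    return folder_name, archive_name


print(expected_paths("EP10", 1))  # ('EEGEYENET-Data/DOTS/EP10', 'EP10_DOTS1_EEG.mat')
print(expected_paths("BZ2", 1))   # ('EEGEYENET-Data/AS/BZ2', 'BZ2_AS1_EEG.mat')
```

The "EP" check is the only rule for DOTS; every other known ID is expected to start with "A" or "B", and anything else raises.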