Merged

21 commits
- `cffc2e5` added testing section (ReboreExplore, Sep 15, 2025)
- `9a8d992` added additional faq (ReboreExplore, Sep 15, 2025)
- `9201d4d` added print in run_all_tests (ReboreExplore, Sep 15, 2025)
- `bad3610` added log of stimulus computer status (ReboreExplore, Sep 15, 2025)
- `1e4d625` minor changes in README and delete about.md (ReboreExplore, Sep 17, 2025)
- `28d7103` minor changes in doc due to changing project config file (ReboreExplore, Sep 17, 2025)
- `a91288d` minor adjustments in doc due to changed project config (ReboreExplore, Sep 17, 2025)
- `80f9436` refactored code to include new configs (ReboreExplore, Sep 17, 2025)
- `ce8d1f5` added testing section (ReboreExplore, Sep 15, 2025)
- `ea183ca` added additional faq (ReboreExplore, Sep 15, 2025)
- `e5fe9d9` added print in run_all_tests (ReboreExplore, Sep 15, 2025)
- `e00f8c7` added log of stimulus computer status (ReboreExplore, Sep 15, 2025)
- `2da3e45` Merge branch 'main' into config-files-readjust (ReboreExplore, Sep 17, 2025)
- `df340a8` fixed typos with config files refactoring (ReboreExplore, Sep 17, 2025)
- `b2f6057` fixed minor typos (ReboreExplore, Sep 17, 2025)
- `9503a7a` fixed typos (ReboreExplore, Sep 17, 2025)
- `62e4264` changed global_variables - project_stim_root -> project_other_root (ReboreExplore, Sep 20, 2025)
- `cf1bf98` recfactored stimulus to other (ReboreExplore, Sep 20, 2025)
- `2615cb7` emptied the pid field in the project toml file (ReboreExplore, Sep 20, 2025)
- `5389547` changed BidsConfig and FileSelection fields in the project toml file (ReboreExplore, Sep 20, 2025)
- `0cc634b` fixed minor mistakes (ReboreExplore, Sep 20, 2025)
6 changes: 4 additions & 2 deletions README.md
@@ -20,7 +20,9 @@ This package automates the conversion of EEG recordings (xdf files) to BIDS (Bra
git clone https://github.com/s-ccs/LSLAutoBIDS.git
```
### **Step 2: Install the package**
Go to the cloned directory and install the package using pip.
```
cd LSLAutoBIDS
pip3 install lslautobids
```
It is advised to install the package in a separate environment (e.g. using `conda` or `virtualenv`).
@@ -39,13 +41,13 @@ The package requires the recorded XDF data to be organized in a specific directo


- The `projects` root location is the root directory where all the eeg raw recordings (say `.xdf` files) are stored e.g. `projects/sub-A/ses-001/eeg/sub-A_ses-001_task-foo.xdf`.
- The (optional) `project_other` root location is the directory where the experiment files (e.g. `.py`, `.oxexp`) and behavioral files (e.g. eye-tracking recordings, lab notebook, participant forms, etc.) are stored.
- The `bids` root location is the directory where the converted BIDS data is stored, along with source data and code files which we want to version control using `Datalad`.

> [!IMPORTANT]
> Please follow the BIDS data organization guidelines for storing the neuroimaging data when running this package. The BIDS conversion relies on the recommended directory/file structure. You can only change the location of the root directories according to your preference, and you must strictly follow the naming convention for the project and subject subdirectories.

Here you will find the recommended directory structure for storing the project data (recorded, other and converted data) in the [data_organization](docs/data_organization.md) file.


### **Step 4: Generate the configuration files**
3 changes: 0 additions & 3 deletions docs/about.md

This file was deleted.

16 changes: 8 additions & 8 deletions docs/data_organization.md
@@ -1,22 +1,22 @@
# How the data is organized

In this project, we are using a sample xdf file along with the corresponding other files to demonstrate how the data inside the `projectname` folder is organized. This data should be organized in a specific way:

### Recommended Project Organization Structure

For convenience, we have provided a recommended project organization structure for the root directories to organize the data better.


> [!IMPORTANT]
> The recommended directory structure is not generated automatically. The user needs to create the directories and store the recorded and other data in them before running the conversion.

The dataset (both recorded and converted) is stored in the parent `data` directory. The `data` directory has three subdirectories under which the entire project is stored. The recommended directory structure is as follows:
```
data
├── bids # Converted BIDS data
├── projectname1
├── projectname2
├── project_other # Experimental/Behavioral files
├── projectname1
├── projectname2
├── projects
@@ -26,7 +26,7 @@ data

```

Here `./data/projects/`, `./data/project_other/`, `./data/bids/` are the root project directories. Each of these root directories contains a project-name directory, and each project directory contains a subdirectory for each subject.


## Projects Folder
@@ -52,7 +52,7 @@ Filename Convention for the raw data files:
- **tasklabel** - `duration, mscoco, ...`
- **runlabel** - `001, 002, 003, ...` (must be an integer)
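The naming convention above can be checked mechanically. The following sketch is illustrative only (it is not part of the package) and assumes the full `sub-<label>_ses-<label>_task-<label>_run-<integer>_eeg.xdf` form:

```python
import re

# Illustrative pattern for the raw-data naming convention described above:
# sub-<label>_ses-<label>_task-<label>_run-<integer>_eeg.xdf
XDF_NAME = re.compile(
    r"sub-(?P<sub>[A-Za-z0-9]+)"
    r"_ses-(?P<ses>[A-Za-z0-9]+)"
    r"_task-(?P<task>[A-Za-z0-9]+)"
    r"_run-(?P<run>\d+)"
    r"_eeg\.xdf$"
)

def parse_xdf_name(filename: str):
    """Return the sub/ses/task/run labels, or None if the name is non-conformant."""
    m = XDF_NAME.match(filename)
    return m.groupdict() if m else None

print(parse_xdf_name("sub-001_ses-001_task-Duration_run-001_eeg.xdf"))
# {'sub': '001', 'ses': '001', 'task': 'Duration', 'run': '001'}
```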

## Project Other Folder

This folder contains the experimental and behavioral files which we also store in the dataverse. The folder structure should be as follows:

@@ -66,15 +66,15 @@
└── behavioral_files (lab notebook, CSV, EDF file, etc.)

- **projectname** - any descriptive name for the project
- **experiment** - contains the experimental files for the project. Eg: showOther.m, showOther.py
- **data** - contains the behavioral files for the corresponding subject. Eg: experimentalParameters.csv, eyetrackingdata.edf, results.tsv.


You can get the filename convention for the data files [here](https://bids-standard.github.io/bids-starter-kit/folders_and_files/files.html#modalities).

## BIDS Folder

This folder contains the converted BIDS data files and other files we want to version control using `Datalad`. Since we are storing the entire dataset in the dataverse, we also store the raw xdf files and the associated other/behavioral files in the dataverse. The folder structure is as follows:
```
└── bids
└──projectname/
@@ -90,7 +90,7 @@
├── sub-001_ses-001_task-Duration_run-001_eeg.eeg
.........
└── beh
└──behavioral files (other files)
└── misc
└── experimental files (This needs to be stored in zip format)
└── sourcedata
```
51 changes: 39 additions & 12 deletions docs/developers_documentation.md
@@ -5,7 +5,7 @@

LSLAutoBIDS is a Python tool series designed to automate the following tasks sequentially:
- Convert recorded XDF files to BIDS format
- Integrate the EEG data with non-EEG data (e.g., behavioral, other) for the complete dataset
- Datalad integration for version control for the integrated dataset
- Upload the dataset to Dataverse
- Provide a command-line interface for cloning, configuring, and running the conversion process
@@ -17,7 +17,7 @@
- DataLad integration for version control
- Dataverse integration for data sharing
- Configurable project management
- Support for behavioral data (non-EEG files) in addition to EEG data
- Comprehensive logging and validation for BIDS compliance


@@ -55,6 +55,9 @@
- [2. Logging Configuration (`config_logger.py`)](#2-logging-configuration-config_loggerpy)
- [3. Utility Functions (`utils.py`)](#3-utility-functions-utilspy)

- [Testing](#testing)
- [Running Tests](#running-tests)


## Architecture - TODO

@@ -84,7 +87,7 @@ The configuration system manages Dataverse and project-specific settings using
#### 1. Dataverse and Project Root Configuration (`gen_dv_config.py`)

This module generates a global configuration file for Dataverse and project root directories. This is a one-time setup per system. This file is stored in `~/.config/lslautobids/autobids_config.yaml` and contains:
- Paths for BIDS, projects, and project_other directories: This allows users to specify where their EEG data, behavioral data, and converted BIDS data are stored on their system. These paths should be relative to the home/user directory of your system and given as strings.

- Dataverse connection details: Base URL, API key, and parent dataverse name for uploading datasets. Base URL is the URL of the dataverse server (e.g. https://darus.uni-stuttgart.de), API key is your personal API token for authentication (found in your dataverse account settings), and parent dataverse name is the name of the dataverse under which datasets will be created (this can be found in the URL when you are in the dataverses page just after 'dataverse/'). For example, if the URL is `https://darus.uni-stuttgart.de/dataverse/simtech_pn7_computational_cognitive_science`, then the parent dataverse name is `simtech_pn7_computational_cognitive_science`.
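Putting the pieces above together, the global configuration file might look like the following sketch. The exact key names are generated by `gen_dv_config.py`, so treat the field names here as illustrative assumptions and consult the generated file for the real layout:

```yaml
# Illustrative sketch of ~/.config/lslautobids/autobids_config.yaml
# (key names are assumptions; the generated file is authoritative)
bids_root: "data/bids"                  # relative to the home/user directory
project_root: "data/projects"
project_other_root: "data/project_other"
dataverse:
  base_url: "https://darus.uni-stuttgart.de"
  api_key: "xxxx-xxxx-xxxx"             # personal API token from your account settings
  parent_dataverse: "simtech_pn7_computational_cognitive_science"
```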

@@ -189,15 +192,15 @@ The pipeline is designed to ensure:

2. EEG recordings are converted to BIDS format using MNE and validated against the BIDS standard.

3. Behavioral and experimental metadata (called 'other' files in the context of this project) are included and checked against project expectations.

4. Project metadata is populated (`dataset_description.json`). This is required as part of the BIDS standard.

5. The dataset is registered in Dataverse and optionally pushed/uploaded automatically.
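For step 4, a minimal `dataset_description.json` could look like the sketch below. `Name` and `BIDSVersion` are the fields required by the BIDS specification; the values here are placeholders, not what the package actually writes:

```json
{
  "Name": "projectname1",
  "BIDSVersion": "1.8.0",
  "DatasetType": "raw",
  "Authors": ["Jane Doe"]
}
```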

#### 1. Entry Point (`bids_process_and_upload()`)

- Reads the project configuration (`<project_name>_config.toml`) to check whether other (non-EEG) files were used (`otherFilesUsed: true`).

- Iterates over each processed file and extracts identifiers. For example, for a file named `sub-001_ses-001_task-Default_run-001_eeg.xdf`, it extracts:

@@ -246,7 +249,7 @@ This function handles the core conversion of XDF files to BIDS format and cons

- Load `.xdf` with `create_raw_xdf()` (see the corresponding section below).

- Apply anonymization (daysback_min + anonymizationNumber from project TOML config).

- Write EEG data into BIDS folder via `write_raw_bids()`.

@@ -261,7 +264,7 @@
- 0: BIDS Conversion done but validation failure

#### 3. Copy Source Files (`copy_source_files_to_bids()`)
This function ensures that the original source files (EEG and other/behavioral files) are also part of our dataset. These files can't be directly converted to BIDS format, but we give the user the option to include them in the BIDS directory structure in a pseudo-BIDS format for completeness.

- Copies the .xdf into the following structure:
`<BIDS_ROOT>/sourcedata/sub-XXX/ses-YYY/sub-XXX_ses-YYY_task-Name_run-ZZZ_eeg.xdf`
@@ -270,13 +273,13 @@

- If a file already exists, logs a message and skips copying.

If `otherFilesUsed = true` in the project config file:

1. Behavioral files are copied via `_copy_behavioral_files()`.

- Validates required files against the TOML config (`OtherFilesInfo`). In this config we add the extensions of the expected other files. For example, in our test project we use an EyeLink 1000 Plus eye tracker, which generates `.edf` and `.csv` files, so we add these extensions as required other files. We also have mandatory lab notebook and participant info files in `.tsv` format.
- Renames files to include sub-XXX_ses-YYY_ prefix if missing.
- Deletes the files in the project_other directory that are not listed in `OtherFilesInfo` in the project config file. It doesn't delete from the source directory, only from our BIDS dataset.

2. Experimental files are copied via `_copy_experiment_files().`

@@ -285,7 +288,7 @@
- Compresses into experiment.tar.gz.
- Removes the uncompressed folder.
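The compress-then-remove steps above can be sketched with the standard library. This is an illustrative sketch under assumed paths, not the package's actual implementation; `shutil.make_archive` with the `gztar` format appends the `.tar.gz` suffix itself:

```python
import shutil
import tempfile
from pathlib import Path

# Illustrative sketch: build a throwaway "experiment" folder, compress it
# into experiment.tar.gz, then remove the uncompressed folder.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = Path(tmp) / "experiment"
    exp_dir.mkdir()
    (exp_dir / "showStimulus.py").write_text("# experiment script\n")

    # make_archive appends ".tar.gz" to the base name for the gztar format
    archive = shutil.make_archive(
        str(Path(tmp) / "experiment"), "gztar", root_dir=tmp, base_dir="experiment"
    )
    shutil.rmtree(exp_dir)  # remove the uncompressed folder, as described above

    print(Path(archive).name)  # experiment.tar.gz
```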

There is a flag in the `lslautobids run` command called `--redo_other_pc` which, when specified, forces overwriting of existing other/behavioral and experiment files in the BIDS dataset. This is useful if there are updates or corrections to the other/behavioral data that need to be reflected in the BIDS dataset.
> **Reviewer (Member):** potentially rename this to `redo_other_folder`? Doesn't need to be on another PC
>
> **Author (Collaborator):** Sure!
#### 4. Create Raw XDF (`create_raw_xdf()`)
This function reads the XDF file and creates an MNE Raw object. It performs the following steps:
@@ -364,7 +367,7 @@ This module handles the creation of a new dataset in Dataverse using the `pyData
#### 2. Linking DataLad to Dataverse (`link_datalad_dataverse.py`)
This module links the local DataLad dataset to the remote Dataverse dataset as a sibling. The function performs the following steps:
1. It first checks if the Dataverse is already created in the previous runs or it is just created in the current run (flag==0). If flag==0, it proceeds to link the DataLad dataset to Dataverse.
2. It runs the command `datalad add-sibling-dataverse dataverse_base_url doi_id`. This command adds the Dataverse as a sibling to the local DataLad dataset, allowing for synchronization and data management between the two. For lslautobids, we currently only allow depositing data to Dataverse. In future versions, we will also add user-controlled options for adding other siblings like GitHub, GitLab, OpenNeuro, AWS, etc.

We chose Dataverse as it serves as both a repository and a data sharing platform, making it suitable for our needs. It also integrates well with DataLad and allows sharing datasets with collaborators or the public.

@@ -402,3 +405,27 @@ This module contains various utility functions used across the application.
3. `write_toml_file`: Writes a dictionary to a TOML file.


## Testing

The testing framework uses `pytest` to validate the functionality of the core components.

- The tests are located in the `tests/` directory and cover various modules including configuration generation, file processing, BIDS conversion, DataLad integration, and Dataverse interaction. (Work in progress)

- The test directory contains:
  - `test_utils`: Directory containing utility functions needed across multiple test files.
  - `testcases`: Directory containing all the tests in a directory structure - `test_<test_name>`.
    - Each `test_<test_name>` directory contains a `data` folder with sample data for that test and a `test_<test_name>.py` file with the actual test cases.
  - `run_all_tests.py`: A script to run all the tests in the `testcases` directory sequentially.

Tests will be added continuously as new features are added and existing features are updated.

### Running Tests

To run the tests, execute the following from the repository root:
```
python tests/run_all_tests.py
```

These tests ensure that each component functions as expected and that the overall pipeline works seamlessly. They will also be triggered automatically on each push or PR to the main repository using GitHub Actions.
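As a sketch of what a test case in the layout above could look like, here is a hypothetical `tests/testcases/test_prefix_rename/test_prefix_rename.py`. The directory, file, and function names are illustrative, and the helper is a toy re-implementation of the renaming rule described earlier, not the package's code:

```python
# tests/testcases/test_prefix_rename/test_prefix_rename.py  (hypothetical path)
# Minimal pytest-style test module; the real test cases may differ.

def ensure_bids_prefix(filename: str, sub: str, ses: str) -> str:
    """Toy re-implementation of the sub-XXX_ses-YYY_ renaming rule, for illustration."""
    prefix = f"sub-{sub}_ses-{ses}_"
    return filename if filename.startswith(prefix) else prefix + filename

def test_prefix_added_when_missing():
    assert ensure_bids_prefix("results.tsv", "001", "001") == "sub-001_ses-001_results.tsv"

def test_prefix_not_duplicated():
    assert ensure_bids_prefix("sub-001_ses-001_results.tsv", "001", "001") == "sub-001_ses-001_results.tsv"
```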

## Miscellaneous Points
- To date, only EEG data is supported for BIDS conversion. Other modalities such as eye-tracking are not yet supported in the BIDS format. Hence, LSLAutoBIDS relies on semi-BIDS data structures for those data and uses user-definable regular expressions to match expected data files. A planned future feature is to give users more flexibility, especially in naming/sorting non-standard files. Currently, the user can only specify the expected file extensions for other/behavioral data; matching files are automatically renamed to include the `sub-XXX_ses-YYY_` prefix if it is missing and copied into a pseudo-BIDS folder structure such as `<BIDS_ROOT>/sourcedata/sub-XXX/ses-YYY/` or `<BIDS_ROOT>/misc/experiment.tar.gz`.
