Home
The main aim of the Computational Testing for Automated Preprocessing (CTAP) toolbox is to regularise and streamline EEG preprocessing, in order to minimise human subjectivity and error and to facilitate easy batch processing for experts and novices alike. This main aim breaks down into two separate but complementary aims:
- batch processing using EEGLAB functions and
- testing and comparison of automated methodologies.
The CTAP toolbox provides two main functionalities to achieve these aims:
- the core supports scripted specification of an EEGLAB analysis pipeline and provides tools for running the pipe, making the workflow transparent and easy to control. Automatic output of ‘quality control’ logs and images helps to keep track of what is going on.
- the testing module uses synthetic data to generate ground-truth-controlled tests of preprocessing methods, with the capability to generate new synthetic data that match the parameters of the lab’s own data. This allows experimenters to select the best methods for their purpose, and developers to flexibly test and benchmark their novel methods.
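To make the idea of a ground-truth-controlled test concrete, here is a toy sketch in Python rather than Matlab. It is not CTAP's testing module: the synthetic data generator, the variance-based bad-channel detector and the scoring function are hypothetical stand-ins. Because the bad channels are injected on purpose (with 8x the amplitude of the clean channels), the detector's output can be scored against a known truth.

```python
import math
import random
import statistics


def make_synthetic_eeg(n_channels=32, n_samples=2000, bad=(3, 17), seed=0):
    """Return (data, truth): Gaussian-noise channels, with a known subset made 'bad'."""
    rng = random.Random(seed)
    data = []
    for ch in range(n_channels):
        sd = 80.0 if ch in bad else 10.0  # bad channels get 8x the amplitude
        data.append([rng.gauss(0.0, sd) for _ in range(n_samples)])
    return data, set(bad)


def log_variance(channel):
    mean = sum(channel) / len(channel)
    return math.log(sum((x - mean) ** 2 for x in channel) / len(channel))


def detect_bad_channels(data, z_thresh=5.0):
    """Toy detector: flag channels whose log-variance is a robust (median/MAD) outlier."""
    logvar = [log_variance(ch) for ch in data]
    med = statistics.median(logvar)
    mad = 1.4826 * statistics.median([abs(v - med) for v in logvar])
    return {i for i, v in enumerate(logvar) if (v - med) / mad > z_thresh}


def score(detected, truth):
    """Precision and recall of the detections against the ground truth."""
    tp = len(detected & truth)
    precision = tp / len(detected) if detected else 1.0
    recall = tp / len(truth) if truth else 1.0
    return precision, recall


data, truth = make_synthetic_eeg()
detected = detect_bad_channels(data)
print("detected:", sorted(detected), "precision/recall:", score(detected, truth))
```

Swapping in a different detector while keeping the generator fixed is what makes methods comparable under identical, known conditions.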
Clone the GitHub repository to your machine using
git clone https://github.com/bwrc/ctap.git <dest dir>
Add the whole directory, including its subdirectories, to your Matlab path. EEGLAB also needs to be on your Matlab path.
A minimalistic working example can be found in
<dest dir>/ctap/templates/minimalistic_example/
Copy the cfg_minimal.m and pipebatch_minimal.m files and use them as a starting point for your own pipe.
More examples are available under <dest dir>/ctap/templates/.
The easiest way to get started with CTAP is to use the two template files provided: cfg_gettingstarted.m and pipebatch_gettingstarted.m. The cfg_gettingstarted.m file shows how to properly create the Cfg struct needed by CTAP. The pipebatch_gettingstarted.m file shows how to specify an analysis pipe and how to execute it. In this simple case, the measurement configuration file is generated automatically from a directory of EEG files.
Terminology:
- CTAP: this software package / repository
- measurement configuration (MC): the measurement configuration file contains a list of all test subjects, measurements and other information needed to find the data files and to analyze them properly. This file can be generated automatically from a directory of EEG data files, or it can be created by hand by the experienced user.
- configuration struct Cfg: a Matlab struct that contains all settings that alter the behavior of CTAP, e.g. parameters to be passed to functions, path names to use for output, etc.
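A minimal sketch of generating a measurement configuration from a directory of EEG files, written in Python and assuming one file per measurement and a hypothetical `<subject>_<measurement>` file-naming convention. The field names (`casename`, `subject`, `measurement`, `file`) and the accepted file extensions are assumptions for illustration; CTAP's own MC format and generator may differ.

```python
import tempfile
from pathlib import Path

# Assumption: which file extensions count as EEG data files.
EEG_EXTENSIONS = {".set", ".bdf", ".edf"}


def mc_from_directory(data_dir):
    """Build a minimal measurement-configuration table from a directory of EEG files.

    Hypothetical naming convention: '<subject>_<measurement>.<ext>',
    e.g. 'S01_taskA.bdf' -> subject 'S01', measurement 'taskA'.
    """
    rows = []
    for path in sorted(Path(data_dir).iterdir()):
        if not path.is_file() or path.suffix.lower() not in EEG_EXTENSIONS:
            continue
        subject, _, measurement = path.stem.partition("_")
        rows.append({
            "casename": path.stem,  # unique id used to select measurements later
            "subject": subject,
            "measurement": measurement or "default",
            "file": str(path),
        })
    return rows


# Usage: a throwaway directory with two EEG files and one unrelated file.
with tempfile.TemporaryDirectory() as demo_dir:
    for name in ["S02_taskA.bdf", "S01_taskA.bdf", "notes.txt"]:
        (Path(demo_dir) / name).touch()
    rows = mc_from_directory(demo_dir)

for row in rows:
    print(row["casename"], row["subject"], row["measurement"])
```

An experienced user would replace this automatic table with a hand-made one when subjects, sessions or file locations do not follow a simple naming rule.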
An EEG analysis project typically has the following components:
- raw data
- processed (intermediate) data
- extracted features
- generic analysis code (preprocessing, feature extraction, statistical analysis)
- project-specific scripts and setup files
These components should be kept apart (different folders / repositories) in order to allow:
- easy removal of intermediate analysis results to free up disk space
- easy copying/syncing of part of a project, e.g. for working at home
- code developed in project A to benefit project B directly
A typical EEG analysis workflow consists of the following steps:
- structured data storage during collection
- data import into the EEG struct: signal selection, event import
- detection of bad channels
- detection of bad signal segments/epochs
- ICA
- artefact removal using ICs
- feature extraction
- feature export/storage
- statistical analysis
The analysis code should follow these design principles:
- modular design: analysis steps should be implemented as standalone functions (no pop-ups or GUI elements) that can be applied in any (reasonable) order
- ease of scripting: a unified scripting interface for the analysis functions (e.g. input: EEG struct + varargin; output: EEG struct + the varargin values actually used + other important parameters)
- no GUI (at least for now)
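The unified interface can be sketched as follows, in Python rather than Matlab: a dict stands in for the EEG struct and keyword arguments stand in for varargin. The step `epoch_fixed_length` is hypothetical; it defines its own defaults, rejects unknown parameters, checks that its input is continuous data, and returns both the new dataset and the parameter values it actually used.

```python
def epoch_fixed_length(eeg, **kwargs):
    """Hypothetical analysis step: chop continuous data into fixed-length epochs.

    Input: EEG dict + keyword arguments (cf. varargin).
    Output: new EEG dict + the parameter values actually used.
    Data type requirement: continuous data only.
    """
    defaults = {"length": 256}  # epoch length in samples
    unknown = set(kwargs) - set(defaults)
    if unknown:
        raise TypeError(f"unknown parameters: {sorted(unknown)}")
    params = {**defaults, **kwargs}

    # Check data adequacy in code, not only in the documentation.
    if eeg["data_type"] != "continuous":
        raise ValueError(f"continuous data required, got {eeg['data_type']!r}")

    n = params["length"]
    n_samples = len(eeg["data"][0])
    # Non-overlapping epochs; a trailing remainder shorter than n is dropped.
    epochs = [
        [channel[start:start + n] for channel in eeg["data"]]
        for start in range(0, n_samples - n + 1, n)
    ]
    out = dict(eeg, data=epochs, data_type="epoched")  # caller's dict is untouched
    out["history"] = eeg.get("history", []) + [("epoch_fixed_length", params)]
    return out, params


eeg = {"data": [[float(i) for i in range(1000)], [0.0] * 1000], "data_type": "continuous"}
epoched, used = epoch_fixed_length(eeg, length=200)
print(len(epoched["data"]), used)  # prints: 5 {'length': 200}
```

Because every step has the same call shape and returns the parameters it used, steps can be chained in any order by a script and their settings logged without extra bookkeeping.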
Challenges, with proposed solutions where available:
- three forms of data: continuous, discontinuous and epoched. (Discontinuous data arise when segments of data are discarded. Epoched data are formed when the data are chopped into pieces, e.g. for ERP or feature extraction.) Proposed solution: state the data type requirement in the documentation of each analysis function, and also check data adequacy within the code.
- some analysis steps take hours to compute (e.g. ICA for a 128-channel dataset of 30 min duration). Proposed solution: save intermediate results to minimise unnecessary recomputation. How can these saves be made automatic, flexible and not too disk-consuming?
- configuration depends on the project, task/protocol, subject and the feature to be extracted: how can the necessary configurations be created without excessive manual work? Proposed solution: define a base template for the project configuration. Create task/protocol- and feature-specific configurations by updating the base template. Load subject-specific updates from a main information file/database. Some tools are needed to make comparing configurations easy. Analysis functions should define the defaults and report the actual values used as output.
- the components of the pipeline will change constantly
- branching the analysis: how to avoid mixing datasets computed using different configurations?
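One possible way to handle the configuration challenges, sketched in Python under assumptions: configurations are nested dicts, task- and subject-specific updates are merged recursively into a base template, `diff_configs` lists the settings that differ between two configurations, and a short hash of the final configuration (`config_fingerprint`) could be used to name an output branch so that datasets computed with different configurations are not mixed. All function names and configuration keys are hypothetical, not CTAP's.

```python
import hashlib
import json
from copy import deepcopy


def deep_update(base, update):
    """Return a copy of `base` with `update` merged in recursively; `update` wins."""
    merged = deepcopy(base)
    for key, value in update.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_update(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def diff_configs(a, b, prefix=""):
    """List the dotted keys whose values differ between two nested configurations."""
    diffs = []
    for key in sorted(set(a) | set(b)):
        va, vb = a.get(key), b.get(key)
        if isinstance(va, dict) and isinstance(vb, dict):
            diffs += diff_configs(va, vb, f"{prefix}{key}.")
        elif va != vb:
            diffs.append(f"{prefix}{key}")
    return diffs


def config_fingerprint(cfg, n=8):
    """Short hash of a configuration, independent of key order (must be JSON-serialisable)."""
    blob = json.dumps(cfg, sort_keys=True).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()[:n]


# Hypothetical keys: base template for the project, then task and subject updates.
base = {"detect_bad_channels": {"method": "variance", "z_thresh": 3.0},
        "ica": {"method": "runica"}}
task_update = {"epoch": {"length_s": 2.0}}
subject_update = {"detect_bad_channels": {"z_thresh": 4.0}}

cfg_task = deep_update(base, task_update)
cfg_subject = deep_update(cfg_task, subject_update)
print(diff_configs(cfg_task, cfg_subject))  # prints: ['detect_bad_channels.z_thresh']
print(config_fingerprint(cfg_task), config_fingerprint(cfg_subject))
```

Naming output folders after the fingerprint is only one option; it trades readable folder names for a guarantee that two different configurations never write to the same place.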
Analysis steps will be grouped into sets ("chunks") of one or more individual steps. Each step set has a name (id string). The scripting system is configured by declaring:
- the steps belonging to each step set (including their order) and the name of the step set
- the order of the step sets in the whole workflow
- the raw data location
- any non-default varargins for the analysis functions
Each step set produces an intermediate processed EEG dataset. For example, one step set might be called "prepro" and contain {bad_channel_rejection, bad_segment_rejection}.
The user specifies which measurements to process (as a list of casenames) and which step sets to run. The system then looks up the location of the source data in the configuration.
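The step-set mechanism can be sketched in Python as below. The step functions, the `STEP_SETS`/`PIPE` names, `load_raw` and the in-memory `store` (standing in for the intermediate files on disk) are all hypothetical; the point is that each step set reads either the raw data or the saved output of the previous step set in the pipe, and saves exactly one intermediate dataset per measurement.

```python
# Hypothetical step functions: each takes an EEG dict and returns a new one.
def bad_channel_rejection(eeg):
    return dict(eeg, history=eeg["history"] + ["bad_channel_rejection"])


def bad_segment_rejection(eeg):
    return dict(eeg, history=eeg["history"] + ["bad_segment_rejection"])


def run_ica(eeg):
    return dict(eeg, history=eeg["history"] + ["run_ica"])


# Step sets (name -> ordered steps) and the order of the step sets in the pipe.
STEP_SETS = {
    "prepro": [bad_channel_rejection, bad_segment_rejection],
    "ica": [run_ica],
}
PIPE = ["prepro", "ica"]


def source_for(stepset):
    """Input of a step set: the raw data, or the output of the previous step set."""
    i = PIPE.index(stepset)
    return "raw" if i == 0 else PIPE[i - 1]


def run_pipe(casenames, stepsets, store, load_raw):
    """Run the selected step sets for the selected measurements.

    `store` stands in for the intermediate-data directory, keyed by (step set, casename).
    """
    for case in casenames:
        for name in stepsets:
            src = source_for(name)
            eeg = load_raw(case) if src == "raw" else store[(src, case)]
            for step in STEP_SETS[name]:
                eeg = step(eeg)
            store[(name, case)] = eeg  # one intermediate dataset per step set
    return store


def load_raw(casename):
    """Stand-in for reading a raw data file located via the measurement configuration."""
    return {"casename": casename, "history": []}


store = {}
run_pipe(["S01_taskA", "S02_taskA"], ["prepro"], store, load_raw)
run_pipe(["S01_taskA"], ["ica"], store, load_raw)  # reuses the saved "prepro" output
print(store[("ica", "S01_taskA")]["history"])
# prints: ['bad_channel_rejection', 'bad_segment_rejection', 'run_ica']
```

Running "ica" on its own reuses the saved "prepro" output, so a single step set can be rerun or debugged for one measurement without recomputing the whole pipe.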
Why this approach:
- saving intermediate data after every analysis step is often unnecessary and duplicates the data too much (at least for high-density EEG)
- running the whole analysis in one go is impractical: it takes very long and does not allow efficient debugging of problematic measurements
- setting up the analysis workflow by manipulating input and output file locations separately for each step can be very frustrating (depending on the implementation)
- the analysis workflow usually changes constantly as new things come up, so reconfiguring should be easy