Home

Computational Testing for Automated Preprocessing (CTAP)

WIKI is in progress - delete me when done.

What is CTAP?

The main aim of the Computational Testing for Automated Preprocessing (CTAP) toolbox is to regularise and streamline EEG preprocessing, to minimise human subjectivity and error and facilitate easy batch processing for experts and novices alike. The main aim breaks down into two separate but complementary aims:

batch processing using EEGLAB functions and
testing and comparison of automated methodologies.

The CTAP toolbox provides two main functionalities to achieve these aims:

the core supports scripted specification of an EEGLAB analysis pipeline and provides tools for running the pipe, making the workflow transparent and easy to control. Automatically generated quality-control logs and images help to keep track of the processing.
the testing module uses synthetic data to generate ground-truth-controlled tests of preprocessing methods, and can generate new synthetic data matching the parameters of the lab’s own data. This allows experimenters to select the best methods for their purpose, and developers to flexibly test and benchmark novel methods.

Installation

Clone the GitHub repository to your machine using

git clone https://github.com/bwrc/ctap.git <dest dir>

Add the whole directory to your Matlab path. You also need to have EEGLAB added to your Matlab path.
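
A minimal sketch of the path setup in Matlab. The two folder locations are hypothetical placeholders, not paths defined by CTAP; `addpath` and `genpath` are standard Matlab functions.

```matlab
% Hypothetical install locations: replace with your own paths.
ctap_root   = fullfile(getenv('HOME'), 'work', 'ctap');    % the <dest dir> used for git clone
eeglab_root = fullfile(getenv('HOME'), 'work', 'eeglab');  % your EEGLAB folder

% Add CTAP and all of its subdirectories to the Matlab path.
if exist(ctap_root, 'dir') == 7
    addpath(genpath(ctap_root));
else
    warning('CTAP directory not found: %s', ctap_root);
end

% Add the EEGLAB root folder (assumption: EEGLAB adds its own
% subfolders when it is started, so genpath is not used here).
if exist(eeglab_root, 'dir') == 7
    addpath(eeglab_root);
else
    warning('EEGLAB directory not found: %s', eeglab_root);
end
```

Putting these lines in a `startup.m` file is one way to make the setup persistent across sessions.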

Getting started

A minimalistic working example can be found in

<dest dir>/ctap/templates/minimalistic_example/

Copy the cfg_minimal.m and pipebatch_minimal.m files and use them as a starting point for your own pipe.

More examples are available under <dest dir>/ctap/templates/.

The easiest way to use CTAP is to start from the two template files provided: cfg_gettingstarted.m and pipebatch_gettingstarted.m. The cfg_gettingstarted.m file shows how to create the Cfg struct needed by CTAP. The pipebatch_gettingstarted.m file shows how to specify an analysis pipe and how to execute it. In this simple case, the measurement configuration file is generated automatically from a directory of EEG files.

Terminology

CTAP: this software package / repository

measurement configuration (MC): a file that lists all test subjects, measurements and other information needed to find the data files and to analyze them properly. This file can be generated automatically from a directory of EEG data files, or it can be written by hand by an experienced user.

Configuration struct Cfg: A Matlab struct that contains all configurations that alter the behavior of CTAP, e.g. parameters to be passed to functions, path names to use for output, etc.

Philosophy

Components of a typical analysis system

raw data
processed (intermediate) data
extracted features
generic analysis code (preprocessing, feature extraction, statistical analysis)
project specific scripts & setup files

These components should be kept apart (different folders / repositories) in order to allow:

easy removal of intermediate analysis steps to free up disk space
easy copying/syncing of a partial project, e.g. for working at home
code developed in project A to directly benefit project B

Example workflow for high-density EEG

structured data storage during collection
data import into EEG struct: signal selection, event import
detection of bad channels
detection of bad signal segments/epochs
ICA
artefact removal using ICs
feature extraction
feature export/storage
statistical analysis

Design principles

modular design: analysis steps should be implemented as standalone functions (no pop-ups or GUI elements) that can be applied in any (reasonable) order
ease of scripting: unified scripting interface for the analysis functions (e.g. input: EEG struct + varargin, output: EEG struct + values of varargin used + other important parameters)
no GUI (at least for now)
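
The unified scripting interface described above can be sketched as follows. This is an illustration only: `demo_remove_baseline`, its `method` parameter and the toy EEG struct are hypothetical and not part of CTAP or EEGLAB. The sketch assumes Matlab R2016b or later, which allows local functions at the end of a script; on older versions the function would go in its own file.

```matlab
% Toy stand-in for an EEGLAB EEG struct (channels x samples).
EEG = struct('data', [1 2 3; 4 5 9], 'srate', 100);

% Call the step with one non-default name/value pair.
[EEG, params] = demo_remove_baseline(EEG, 'method', 'mean');
disp(params);   % the parameter values actually used

function [EEG, params] = demo_remove_baseline(EEG, varargin)
% Hypothetical analysis step: subtract a per-channel baseline.
% Input:  EEG struct + name/value pairs
% Output: EEG struct + the parameter values actually used
    p = inputParser;
    addParameter(p, 'method', 'median');  % the default is defined by the function
    parse(p, varargin{:});
    params = p.Results;

    % Check data adequacy: this step expects continuous (2-D) data.
    if ndims(EEG.data) ~= 2
        error('demo_remove_baseline:dataType', ...
              'Continuous (2-D) data required.');
    end

    switch params.method
        case 'mean'
            baseline = mean(EEG.data, 2);
        case 'median'
            baseline = median(EEG.data, 2);
        otherwise
            error('demo_remove_baseline:method', ...
                  'Unknown method: %s', params.method);
    end
    EEG.data = bsxfun(@minus, EEG.data, baseline);
end
```

Because every step shares the same signature, steps can be stored as function handles and chained by a generic runner without any step-specific glue code.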

Main challenges

three forms of data: continuous, discontinuous, epoched. (Discontinuous data emerges as segments of data are discarded. Epoched data is formed when chopping the data into pieces, e.g. for ERP or feature extraction.) Proposed solution: state the data type requirement in the documentation of each analysis function, and also check data adequacy within the code.
some analysis steps will take hours to compute (e.g. ICA for a 128-channel dataset of 30 min duration). Proposed solution: intermediate saving of results to minimize unnecessary re-computation. How to make these saves automated, flexible and not too disk-consuming?
configuration depends on project, task/protocol, subject and feature to be extracted: how to create the necessary configurations without excessive manual work? Proposed solution: define a base template for the configuration of the project. Create task/protocol-specific and feature-specific configuration files by updating the base template. Load subject-specific updates from a main information file/database. Some tools are needed to make comparing configurations easy. Analysis functions should define the defaults and report the actual values used as output.
the components of the pipeline will be changing constantly
branching the analysis: how to avoid mixing datasets computed using different configurations?
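
The base-template idea from the configuration challenge above can be sketched with plain Matlab structs. All field names and values here are hypothetical and are not fields that CTAP defines; the point is only the layering and the comparison against the base.

```matlab
% Base template for the whole project (hypothetical fields).
base = struct('srate', 512, 'reference', 'average', 'out_dir', 'results');

% Task-specific and subject-specific updates (hypothetical).
task_cfg    = struct('out_dir', 'results_oddball', 'epoch_len_s', 1.0);
subject_cfg = struct('reference', 'mastoids');

% Apply the updates in order: later layers override earlier ones.
cfg = base;
layers = {task_cfg, subject_cfg};
for i = 1:numel(layers)
    names = fieldnames(layers{i});
    for k = 1:numel(names)
        cfg.(names{k}) = layers{i}.(names{k});
    end
end

% Compare configurations: report fields that are new or differ from the base.
names = fieldnames(cfg);
for k = 1:numel(names)
    if ~isfield(base, names{k}) || ~isequal(base.(names{k}), cfg.(names{k}))
        fprintf('changed relative to base: %s\n', names{k});
    end
end
```

This shallow merge replaces whole fields; nested configuration structs would need a recursive variant.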

One possible pipeline design could be:

Analysis steps are collected into sets ("chunks") of one or more individual steps. Each step set has a name (id string). The scripting system is configured by declaring:

steps belonging to each "step set" (including order) and name of the step set
order of the "step sets" in the whole workflow
raw data location
any non-default varargins for the analysis functions

Each step set produces an intermediate processed EEG dataset. An example: one step set might be called "prepro" and it could contain {bad_channel_rejection, bad_segment_rejection}.

The user specifies which measurements to process (giving a list of casenames) and which step sets to run. The system then determines from the configuration where to find the source data.
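
The step set design above can be sketched as follows. This is not CTAP's actual API: the toy EEG struct, the three step functions, the step set names and the output file naming are all hypothetical, chosen only to show step sets being declared, selected and run, with one intermediate save per step set.

```matlab
% Toy EEG struct and toy analysis steps (all hypothetical).
EEG = struct('data', [1 2 3; 4 5 9]);

demean  = @(E) setfield(E, 'data', bsxfun(@minus, E.data, mean(E.data, 2)));
rectify = @(E) setfield(E, 'data', abs(E.data));
scale   = @(E) setfield(E, 'data', E.data * 2);

% Declare the step sets: an id string plus an ordered list of steps.
stepSets(1).id    = 'prepro';
stepSets(1).steps = {demean, rectify};
stepSets(2).id    = 'features';
stepSets(2).steps = {scale};

runSets = {'prepro', 'features'};  % which step sets the user wants to run
out_dir = tempdir;                 % where intermediate datasets are saved

for s = 1:numel(stepSets)
    if ~ismember(stepSets(s).id, runSets)
        continue;                  % skip step sets that were not requested
    end
    for k = 1:numel(stepSets(s).steps)
        step = stepSets(s).steps{k};
        EEG = step(EEG);           % every step maps an EEG struct to an EEG struct
    end
    % Save one intermediate dataset per step set, not per step.
    save(fullfile(out_dir, ['demo_' stepSets(s).id '.mat']), 'EEG');
end
```

In a fuller version, a step set that is run on its own would first load the file saved by the preceding step set instead of relying on the EEG variable still being in memory.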

Why like this:

saving intermediate data after every analysis step is often unnecessary and replicates the data too much (at least for high density EEG)
running the whole analysis in one go is impractical: it takes very long and does not allow efficient debugging of problematic measurements
setting the analysis workflow by manipulating input and output file locations separately for each step can be very frustrating (depending on the implementation)
the analysis workflow usually changes constantly as new things come up so reconfiguring should be easy

Basics

Configuration

Formats

Implementation details and extending
