CTAP system overview
The CTAP pipeline specification defines which analysis steps are performed and in which order. Examples of analysis steps are filtering and bad channel detection. Analysis steps can be grouped into sets of steps, referred to as step sets. An intermediate save is made after each step set, which makes it possible to run the whole pipe in smaller chunks. Below is a sample definition of a very small analysis pipe with two step sets:
i = 1; %stepSet 1
stepSet(i).funH = { @CTAP_load_data,...
                    @CTAP_load_chanlocs,...
                    @CTAP_tidy_chanlocs,...
                    @CTAP_reref_data,...
                    @CTAP_blink2event};
stepSet(i).id = [num2str(i) '_load_WCST'];
stepSet(i).srcID = '';
i = i+1; %stepSet 2
stepSet(i).funH = { @CTAP_filter_data};
stepSet(i).id = [num2str(i) '_filter'];
stepSet(i).srcID = '';
The pipeline is specified as a struct array in which each element corresponds to one step set. Each element holds its analysis steps as a cell array of function handles in the field .funH. The field .id is a unique identifier for the step set, used e.g. when storing intermediate results on disk. The field .srcID defines which step set provides the source data for the step set at hand. Passing an empty string selects the default behaviour, which is to use the preceding step set in the struct array. Setting .srcID explicitly can be used to temporarily bypass the predefined step set order, e.g. for debugging purposes.
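The default source resolution described above can be sketched in a few lines. This is a hypothetical illustration rather than CTAP's own code: the step set ids, the variable resolvedSrc and the placeholder label used for the first step set are all made up for the example.

```matlab
% Hypothetical sketch of srcID resolution; not taken from the CTAP source.
stepSet = struct('id',    {'1_load', '2_filter', '3_peek'}, ...
                 'srcID', {'',       '',         '1_load'});

resolvedSrc = cell(1, numel(stepSet));
for k = 1:numel(stepSet)
    if ~isempty(stepSet(k).srcID)
        % An explicit srcID overrides the predefined order
        resolvedSrc{k} = stepSet(k).srcID;
    elseif k > 1
        % Empty srcID: default to the preceding step set
        resolvedSrc{k} = stepSet(k-1).id;
    else
        % The first step set has no predecessor (placeholder label)
        resolvedSrc{k} = '<measurement data>';
    end
    fprintf('%s reads from %s\n', stepSet(k).id, resolvedSrc{k});
end
```

In this sketch the third step set skips the filtered data and reads directly from the first step set, which is the kind of temporary bypass the .srcID field allows.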
Each step set saves the processed EEG by default. Expert users can, however, set the field .save to false, e.g. to isolate processing steps that do not change the underlying data but only calculate something. In subsequent step sets, the pipeline looper will then try to load data from the most recent step set for which .save is true. Therefore .save must be true for at least one earlier step set before the pipe can make any progress in preprocessing. Using this functionality is not recommended unless strongly warranted.
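The rule "load from the most recent step set for which .save is true" can be illustrated with plain MATLAB. This is a sketch under assumptions, not the looper's actual implementation: the saveFlags values are invented, and srcIdx is a hypothetical helper variable.

```matlab
% Hypothetical sketch; the .save values are invented for illustration.
saveFlags = [true, false, false, true];   % .save for step sets 1..4

srcIdx = zeros(1, numel(saveFlags));      % 0 means "no earlier step set"
for k = 2:numel(saveFlags)
    % Most recent earlier step set whose result was written to disk.
    % This works here because step set 1 is saved; with no saved
    % predecessor, find() would return empty and the pipe could not proceed.
    srcIdx(k) = find(saveFlags(1:k-1), 1, 'last');
end
disp(srcIdx)
```

Here step sets 2 to 4 all load the output of step set 1, because step sets 2 and 3 never write their result to disk.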
The specified pipe must also be given a parameter configuration, both for the whole pipe and for each analysis function. Default parameters are provided for most cases, but it is usually better to fine-tune the behaviour by choosing your own parameters. Parameters are passed to functions from a struct. It is usually practical to store the struct in a separate m-file. Typical minimal contents of this file might be:
function [Cfg, FP] = cfg_minimal(dataRoot, branchID)
%% Analysis output (data, quality control) storage location
Cfg.env.paths.analysisRoot = fullfile(dataRoot,'ctap',branchID);
%% Channel specifications
Cfg.eeg.chanlocs = fullfile(dataRoot,'channel_locations_acticap_32.ced');
Cfg.eeg.reference = {'TP10' 'TP9'}; % EEG reference channels to use
%% Configure analysis functions
% Load data
FP.load_data = struct('type', 'neurone');
% Amplitude thresholding from continuous data (bad segments)
FP.detect_bad_segments = struct('amplitudeTh', [-100, 100]); %in muV
The file contains a function that takes in a path (dataRoot) and an identification string (branchID) and outputs general configurations (Cfg) as well as analysis function parameters (FP). The input dataRoot specifies the path under which all CTAP output should be stored. The branchID string is used to separate analysis branches from each other. The Cfg fields specified above are obligatory and must always be set for CTAP to work. The function parameter struct FP contains function names as fields, with the parameter specifications under each function name. FP field names are matched to CTAP functions by function name, i.e. each field FP.<function name> should have a counterpart CTAP_<function name>() in the specified pipe. The tunable parameters are the variable input arguments of the CTAP_*() function (see the documentation in the functions themselves).
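The naming convention between FP fields and pipe functions can be made concrete with a short sketch. It only builds the expected function names from the field names; how CTAP itself performs the matching is not shown here, and funNames is a hypothetical variable.

```matlab
% Parameter struct as in the minimal configuration above
FP.load_data = struct('type', 'neurone');
FP.detect_bad_segments = struct('amplitudeTh', [-100, 100]);

% Derive the function name that each FP field is expected to match
funNames = strcat('CTAP_', fieldnames(FP));
disp(funNames)
```

A listing like this can be compared by eye against the function handles in the pipe to spot a misspelled FP field.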
For expert users, n sets of parameters can be specified for a single function, which is then called n times in the pipe. For example, CTAP_detect_bad_comps() might be required twice, once for blink detection and once for detecting components with an abnormal spectrum. Passing a cell array to struct() creates the required multi-row struct, so the relevant parameter specification is:
% Detect bad ICA components
out.detect_bad_comps = struct('method', {'blink_template' 'abnormal_spectra'});
If additional parameters are required for each method, they may be specified in a similar way, even if they do not apply to every method: an extra parameter is simply ignored during execution of a method it does not relate to. For example (here cmpSpcMethod specifies the method used to compute the spectrum for 'abnormal_spectra', and so must be in the same cell array position as the method it belongs to):
% Detect bad ICA components
out.detect_bad_comps = struct(...
    'method', {'blink_template' 'abnormal_spectra'},...
    'cmpSpcMethod', {'' 'fft'});
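The multi-row behaviour comes from how MATLAB's struct() treats cell array values, which the following sketch demonstrates. The variables P, Q and R and the field names flag and channels are made up for illustration and are not CTAP parameters.

```matlab
% Cell array values create one struct element ("row") per cell
P = struct('method',       {'blink_template', 'abnormal_spectra'}, ...
           'cmpSpcMethod', {'',               'fft'});
disp(numel(P))              % 2
disp(P(2).cmpSpcMethod)     % fft

% A non-cell value is copied to every element
Q = struct('method', {'blink_template', 'abnormal_spectra'}, 'flag', true);
disp([Q.flag])

% To store a cell array as the value itself, wrap it in another cell
R = struct('channels', {{'TP10', 'TP9'}});
disp(numel(R))              % 1
```

The last case matters whenever a parameter value is itself a cell array: without the extra braces, struct() would create one row per cell instead of a single row holding the whole cell array.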
Data is input to the pipe as a measurement configuration struct. This meta-data may be read from a fixed format spreadsheet file using:
read_measinfo_spreadsheet([path_to_spreadsheet])
Or it may be autogenerated from a directory of data files using:
read_measinfo_fromfiles([path_to_spreadsheet], [various], [filter], [arguments])
The former approach allows careful and extensive specification of the meta-data for each data file. The latter approach is faster. The measurement configuration struct is fully documented elsewhere in the wiki.
The configuration struct and the parameter struct are checked, finalised and integrated by cfg_ctap_functions().
This function also adds the various required sub-directories to the configuration and creates them.
cfg_ctap_functions() checks that required fields are present, e.g. the reference channel location, and the stepSet sub-fields, e.g. .id and .srcID. It will discard the step set with .id = 'test' if it is present but not requested. Thus the user can keep a handy 'test' step set in the pipe without commenting and uncommenting it every time.
cfg_ctap_functions() adds the specified parameters to the Cfg.ctap field, creating fields for every stepSet function, including empty fields if no parameter was set. It checks the validity of parameter assignments, including where a function was called multiple times, ensuring that enough parameter 'rows' exist for the number of function calls.
If only a single parameter row is specified but there are multiple calls to the function, then the same parameters are applied to every call. A problem arises only when there are n > 1 parameter rows but the number of calls to the function is not n. In that case, no default assignment can be assumed and the pipe will not execute.
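The row-count rule above can be expressed as a small check. This is a sketch of the logic as described, not the code of cfg_ctap_functions(); the variables params, nCalls and nRows and the error identifier are hypothetical.

```matlab
% Hypothetical sketch of the parameter-row check described above
params = struct('method', {'blink_template'});   % one parameter row
nCalls = 2;                                      % function appears twice in the pipe

nRows = numel(params);
if nRows == 1
    % A single row is reused for every call
    params = repmat(params, 1, nCalls);
elseif nRows ~= nCalls
    % n > 1 rows without n calls: no sensible default exists
    error('sketch:paramMismatch', ...
          '%d parameter rows given for %d calls', nRows, nCalls);
end
fprintf('%d parameter rows for %d calls\n', numel(params), nCalls);
```

With two rows and two calls the struct is left unchanged; with two rows and three calls the sketch stops with an error, mirroring the case in which the pipe will not execute.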
CTAP_pipeline_looper() performs the main work of executing the pipeline functions on data and parameters defined.
CTAP_pipeline_looper() runs some checks to ensure that file loading locations are well specified when the user has requested only the latter part of a pipe, e.g. steps 2 onwards.
CTAP_pipeline_looper() is also set to skip steps for which outputs already exist, i.e. if a pipe completed half the steps for a file and was then interrupted, the next run of CTAP_pipeline_looper() will not overwrite these previously saved files. The user must therefore explicitly set the parameter overwrite to true if the steps are to be run again.
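The skip-if-present behaviour amounts to a simple existence test combined with the overwrite flag. The sketch below shows only that decision; the file name, the .set extension and the variable runStep are assumptions made for the example, not details taken from the looper.

```matlab
% Hypothetical sketch of the skip decision; not the looper's own code
overwrite = false;               % mirrors the looper parameter of the same name
outFile = [tempname, '.set'];    % stand-in output path; no such file exists yet

if exist(outFile, 'file') == 2 && ~overwrite
    runStep = false;             % output already on disk: skip the step set
else
    runStep = true;              % nothing saved yet, or overwrite requested
end
fprintf('Run step set: %d\n', runStep);
```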
CTAP_pipeline_looper() extracts the parameters relevant to the next function to be called. If there are multiple rows of parameters for the same function name, e.g. CTAP_myfunc(), CTAP_pipeline_looper() pops the top of this stack so that every time CTAP_myfunc() is called it gets a new set of parameters (at least insofar as the specified parameters are different).
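The stack-like handling of parameter rows can be sketched as follows. This is an illustration of the idea only: rows, usedMethods and the loop are hypothetical and do not reproduce the looper's implementation.

```matlab
% Hypothetical sketch: each call consumes the first remaining parameter row
rows = struct('method', {'blink_template', 'abnormal_spectra'});
nCalls = 2;

usedMethods = cell(1, nCalls);
for c = 1:nCalls
    current = rows(1);     % top of the stack
    rows(1) = [];          % pop it so the next call sees the next row
    usedMethods{c} = current.method;
    fprintf('Call %d uses method %s\n', c, current.method);
end
```

The order of the rows therefore has to follow the order in which the function appears in the pipe.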
Every function in the pipe creates a record of its execution and parameters in the struct EEG.CTAP.history. Only functions following the template for CTAP_*() will do this independently. Those provided at the time of writing will collate the exact set of parameters used in the function call, including defaults which are only known at the ctapeeg_*() level.
If a function is called from the pipe without creating a history entry, CTAP_pipeline_looper() creates the entry itself, using the stepset id, function name, and extracted parameter arguments. This is a non-preferred way to create history entries because extracted parameter arguments miss any default parameters that might be specified lower in the stack.
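A fallback history entry of this kind could look roughly like the sketch below. The field names stepSetID, fun and args, and the parameter lowCutOff, are invented for illustration; the real layout of EEG.CTAP.history may differ.

```matlab
% Hypothetical field names; the real layout of EEG.CTAP.history may differ
EEG = struct();
EEG.CTAP.history = struct('stepSetID', {}, 'fun', {}, 'args', {});

args = struct('lowCutOff', 1);   % made-up parameter, as extracted by the looper
entry = struct('stepSetID', '2_filter', ...
               'fun', 'CTAP_filter_data', ...
               'args', args);

% Append the entry; defaults set deeper in the call stack are not captured
EEG.CTAP.history(end+1) = entry;
disp(EEG.CTAP.history(end))
```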
If a step set fails, processing of the entire file fails at that point, and the pipe loads the next file and proceeds. The failed file, as it stands at the failed step set, can be saved to disk if the CTAP_pipeline_looper() parameter trackfail is set to true. Such files are automatically deleted if the pipe is run successfully at a later time.
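The fail-and-continue behaviour follows a common try/catch pattern, sketched here with a simulated error. The measurement names, the error identifier and the variables done and failed are invented; the sketch does not save or delete any files.

```matlab
% Hypothetical sketch of per-file failure handling
files = {'subj01', 'subj02', 'subj03'};   % invented measurement names
done = {};
failed = {};

for f = 1:numel(files)
    try
        if strcmp(files{f}, 'subj02')
            % Simulate a step set failing for one file
            error('sketch:stepSetFailed', 'Simulated failure in a step set');
        end
        done{end+1} = files{f}; %#ok<SAGROW>
    catch err
        % Record the failure and move on to the next file
        fprintf('%s failed: %s\n', files{f}, err.message);
        failed{end+1} = files{f}; %#ok<SAGROW>
    end
end
```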
A directory is always created for each step set, even if no data is saved therein.
When the pipe finishes, all bad data rejections are collated in a single log file.