giavasis2015-QAP-paper/qap_methods.tex at master · preprocessed-connectomes-project/giavasis2015-QAP-paper · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
\section{Methods}
\label{sec:2}

\subsection{Quality Measures}
\label{sec:measures}

The QAP toolbox includes a variety of metrics that have been proposed in the literature for measuring spatial and temporal aspects of structural and functional neuroimaging data. The goal has been to make the toolbox comprehensive even though many of the measures may be highly correlated. Measures from the literature that are explicitly defined for phantom data and are not appropriate for in vivo data, such as signal-to-noise-fluctuation ratio (also known as temporal signal to noise ratio) \citep{Friedman2006}, have been excluded along with measures such as noise distribution analysis (QI2) that are computationally expensive with marginal sensitivity to quality \citep{mortamet2009}.  QAP currently only includes measures for structural and functional MRI data, measures for other imaging modalities like diffusion MRI will be added in the future.

\subsubsection{Measures of spatial quality}
\label{sec:spat_qual}

\paragraph{Contrast-to-Noise Ratio (CNR)}
\label{sec:CNR}
CNR can be defined in many different ways depending on the purpose of the images being collected. Since structural MRI data is most commonly used for morphometric measurements and calculating tissue specific maps for downstream processing,  QAP focuses on the contrast between white matter and grey matter. CNR is therefore calculated as the difference between the mean white matter signal ($\widebar{WM}$)  and the mean gray matter signal ($\widebar{GM}$) divided by the standard deviation of the image background ($\sigma_{b}$) (see Eqn. \ref{eqn:CNR}) \citep{magnotta2006}.

\begin{equation}
\label{eqn:CNR}
CNR = \frac{\widebar{WM} - \widebar{GM}} {\sigma_{b}}
\end{equation}

CNR should correspond to the ability to discern anatomical features from the image and provides a measure of how easily the image can be segmented. It is sensitive to the imaging parameters used to acquire the data, as well as, the amount of thermal noise, artifacts, and head motion present in the image. CNR is only calculated for structural MRI data, and the greater this value is, the better.

\paragraph{Entropy Focus Criterion (EFC)}
\label{sec:EFC}
EFC is the Shannon entropy present across voxel intensities, which is maximized when the voxel intensity histogram is spread evenly across all intensities and is minimized when voxels all have the same intensity.  Head-motion induced image blurring and ghosting cause background voxels that would otherwise be zero to have a brighter intensity. The resulting spread of the voxel intensity histogram will result in greater entropy. Hence, this measure can be used to approximate the degree of motion-related artifacts in an image \citep{atkinson1997}. EFC is computed using equation \ref{eqn:EFC}:

\begin{equation}
\label{eqn:EFC}
EFC = - \sum_{n=1}^{N} \frac{V_n}{V_{max}} \ln{\Bigg(\frac{V_n}{V_{max}}\Bigg)} \\
\end{equation}

\noindent with $N$ being the number of image voxels, and $V_n$ being the intensity of the $n^{th}$ voxel. $V_{max}$ is proportional to the standard deviation of voxel intensities and is defined in equation \ref{eqn:EFC_max}.\\

\begin{equation}
\label{eqn:EFC_max}
V_{max} = \sqrt{\sum_{n=1}^{N} V_{n}^{2}} \\
\end{equation}

The maximum value of EFC is determined by the number of voxels in the image (equation \ref{eqn:EFC}), therefore we divide EFC by the maximum to make the result comparable across images of different sizes.

\begin{equation}
\label{eqn:EFCnorm}
EFC_{max} = \sqrt{N}\ln{(\sqrt{N})} \\
\end{equation}

EFC is calculated for both anatomical and functional data, and the closer to zero this number is, the better.

\paragraph{Foreground-to-Background Energy Ratio (FBER)}
\label{sec:FBER}
Foreground-to-Background Energy Ratio, or FBER, is the ratio between the energy (normalized variance) of the signal in voxels in the head and the energy of the signal in the background (the air). This provides a measure of how much of the signal is contributed by motion-related or scanner-related artifacts. In an ideal brain image all of the energy should be contained in the foreground. Head motion, thermal noise, ghosting, and other artifacts will increase the energy in an image's background and will reduce this value. FBER is calculated for both anatomical and functional data, and larger values of FBER are better.

\begin{equation}
FBER = \frac{\displaystyle\frac{1}{|F|}\sum_{f \in F} V_f^{2}}{\displaystyle\frac{1}{|B|}\sum_{b \in B} V_b^{2}}
\end{equation}

where $F$ is the set of voxels in foreground and $B$ is the set of voxels in background.

\paragraph{Full-Width Half Maximum (FWHM)}
\label{sec:FWHM}

FWHM is a measure of the the spatial smoothness - the degree of spatial correlation - in the imaging data. The spatial smoothness in a particular direction (x, y, or z) is estimated from the ratio of the variance of the first spatial difference image along the direction to the variance of the unperturbed image. The FWHM in each of the three directions are estimated and then combined by the geometric mean using AFNI's \texttt{3dFWHMx} tool. Since this value varies by voxel size, it is normalized by the geometric mean of the voxel dimensions to render it comparable across different acquisition parameters. Head motion and technical factors will blur the spatial details of an image and increase the spatial smoothness. Since image smoothness is bound by the voxel size, the minimum value of this value should be 1, and values closer to this minimum are better.

\paragraph{Percentage of artifact voxels (Qi1)}
\label{sec:Qi1}

Since thermal noise should not exhibit spatial correlation, any spatial structure in the background is assumed to reflect artifacts such as ghosts and RF banding. The percentage of artifact voxels, or the Quality Index (QI1, \cite{mortamet2009}), is a measure of the proportion of voxels in the background that contain artifacts to the number of voxels in the background (the air). Artifacts in background regions that are not proximal to the brain are not considered critical to image quality and are excluded from the calculation.

Artifact voxels are identified from the background using the following procedure \cite{mortamet2009}:

\begin{enumerate}
    \item A background mask is calculated from the inverse of a head mask and further constrained to voxels that are superior to an oblique plane that connects the bottom of the forehead to the nape of the neck
    \item The mode of background voxel intensities is used as a threshold to exclude low intensity noise values from consideration
    \item A modified morphological opening operation that consists of an erosion using a 3D cross followed by a dilation is applied to the remaining voxels to remove unconnected voxels (i.e. those that are not adjacent to other supra-threshold voxels)
    \item Remaining voxels are labeled as belonging to artifact
\end{enumerate}

QI1 is the percentange of background voxels ($V_B$) that are classified as artifacts ($V_A$), (\ref{eqn:QI1}).

\begin{equation}
\label{eqn:QI1}
QI1 = \frac{|V_A|}{|V_B|} \\
\end{equation}

QI1 is calculated only on structural data, and the closer this number is to zero, the better.

\paragraph{Signal-to-Noise Ratio (SNR)}
\label{sec:SNR}
Signal-to-Noise Ratio is a ubiquitous measure of how well image features of interest are differentiable from obscuring variation. The definition of signal and noise can vary based on the intent for the images being evaluated, but for structural MRI it is defined as the mean of a homogeneous region of image over the variation in a region of the background. To simplify the automatic calculation of this measure, it is calculated in QAP as the ratio between the mean of the gray matter signal ($\widebar{GM}$) and the standard deviation of the signal intensity of all background voxels ($\sigma_{b}$)  (\cite{magnotta2006}):
\begin{equation}
SNR = \frac{\widebar{GM}} {\sigma_{b}}
\end{equation}
SNR is calculated for both anatomical and functional data, and the greater this number is, the better.

\paragraph{Ghost-to-Signal Ratio (GSR)}
\label{sec:GSR}
Ghosts in MRI images can arise from a variety of technical sources, such as gradient calibration and sequence parameters, as well as patient motion. For fMRI data, the ghosts appear in the phase encoding direction making them easy to locate and measure. GSR is the difference between the mean voxel intensity of background regions where ghosts are likely to occur ($\widebar{V_G}$) and the mean voxel intensity of the remainder of background voxels ($\widebar{V_B}$), divided by the mean voxel intensity from within the brain ($\widebar{V_F}$) (\citep{giannelli2010}):

\begin{equation}
GSR_{j} = \frac{\widebar{V_G} - \widebar{V_B}}{\widebar{V_F}}
\end{equation}

GSR is only calculated on functional data, and the closer this value is to zero this value is, the better.

\subsubsection{Measures of temporal quality}
\label{sec:temp_qual}
\todo{add change to these metrics, inclusion of Mean/Std.Dev, etc.}

\paragraph{Mean Standardized DVARS}
\label{sec:dvars}
Head motion and other scanner instabilities may induce large global intensity variations (spikes) in fMRI time series. DVARS measures these variations from the spatial standard deviation of frame-to-frame difference images  \citep{power2012}. DVARS in its original form is effected by the temporal autocorrelation in the data, which makes it impossible to compare its values meaningfully between different acquisition strategies and sites. Standardized DVARS accounts for this autocorrelation, resulting in a more absolute measure of DVARS that is comparable between datasets (\citep{Nichols2013}):

\begin{equation}
DVARS = \frac{1}{P}\sum_{p}^P{\frac{\sqrt{\frac{1}{N} \sum_{n} (V_{n,p} - V_{n,p - 1})^2}}{\sqrt{\frac{1}{N} \sum_{n} 2(1 - \rho_{n})\sigma_{n}^2}}}
\end{equation}
$V_{n,p}$ being the intensity of the $p^{th}$ observation of the $n^{th}$ voxel, $N$ being the total number of voxels and $P$ being the number of observations (TRs), $\sigma_{n}^2$ is temporal variance, and $\rho_{n}$ is the one-lag temporal auto-correlation.  The closer this value is to zero, the better.

\paragraph{Outlier Detection}
\label{sec:13}
Similar to DVARS, the goal of outlier detection, defined by AFNI's \texttt{3dToutcount}, is to quantify the presence of spikes in the fMRI time series. This measure is defined as the fraction of outlier voxels per volume averaged across the fMRI time series. Voxels are determined to be outliers if their intensity exceeds a threshold defined by:
\begin{equation} \label{outlier_eqn}
\sqrt{\frac{\pi}{2}} \Phi^{-1}\left(\frac{0.001}{P}\right)  MAD
\end{equation}
where $\Phi^{-1}$ is the inverse of the reversed Gaussian cumulative distribution function and $P$ is the total number of observations (time points). The mean absolute deviation ($MAD$) of the voxel time series is calculated from:
\begin{equation} \label{MAD_eqn}
MAD = {med}_p(\lvert V_{n,p} - \widetilde{V_n} \rvert )
\end{equation}
where ${med}_p$ is the median operator over observations, $\widetilde{V_n}$ is the median observation of voxel $n$, and $V_{n,p}$ is the $p^{th}$ observation of voxel ${n}$. The closer this value is to zero, the better.

\paragraph{Median Quality Index (MQI)}
\label{sec:15}
The Median Quality Index, as defined by AFNI's \texttt{3dTqual} tool, is a multivariate analog of the previously described outlier detection. A quality index for each fMRI volume is calculated from one minus its spatial Spearman's rank correlation with the median volume:
\begin{equation} \label{QI_eqn}
    {QI}_p=1-\frac{1}{N-1}\sum_{n=1}^{N}\frac{(R_{n,p}-\widebar{R_p})(\widetilde{R_{n}}-\widebar{\widetilde{R}})}{\sigma_{R_p}\sigma_{\widetilde{R}}},
\end{equation}
where $R_{n,p}$ is the rank of the $n^{th}$ voxel in the $p^{th}$ volume, $\widebar{R_p}$ is the mean rank of all voxels in the $p^{th}$ volume, $\widetilde{R_n}$ is the rank of the $n^{th}$ voxel in the median volume, $\widebar{\widetilde{R}}$ is the mean rank of voxels in the median volume, and $\sigma_{R_p}$ and $\sigma_{\widetilde{R}}$ are the standard deviation of the ranks of voxels in the $p^{th}$ and median volumes respectively.  MQI ($\widetilde{QI}$) is the median quality index across the volumes of the fMRI dataset. This value varies between zero and two, and the closer to zero this value is, the better.

\paragraph{Quality Index (QI) Percent Outliers}
This is the percent of total volumes whose quality index (described in \ref{sec:15}) are statistical outliers as determined by the criterion:
\begin{equation} \label{QI_percent}
    {QI}_p\ge \widetilde{QI} \pm MAD({QI})
\end{equation}
where $\widetilde{QI}$ is the median quality index (described in \ref{sec:15}) and $MAD({QI})$ is the median absolute deviation of the quality indices of all volumes in the dataset and is calculated similar to Eqn. \ref{MAD_eqn}. The fewer the number of outliers (the smaller the percentage), the better.

\paragraph{Global Correlation (GCOR)}
\label{sec:14}
Certain types of nuisance variation in fMRI data will increase the correlation between voxel time series. This is particularly true for head motion and physiological signals (respiration and heart beat) and can be exaggerated by their interaction with other imaging parameters (e.g. parallel imaging, slice gap, multi-band imaging) \citep{saad2013}. GCOR ($\gamma$) quantifies this effect by the average correlation between every pair of in-brain voxels in the fMRI data:
\begin{equation}
    \gamma=\frac{1}{N^2}\sum_{m=1}^{N}\sum_{n=1}^{N}\rho_{m,n}
\end{equation}
where $N$ is the number of voxels and $\rho_{m,n}$ is the Pearson's correlation between the $P$ length time series for the $n^{th}$ and $m^{th}$ voxels $V_n$ and $V_m$ respectively, each having mean $\widebar{V_n}$ ($\widebar{V_m}$) and standard deviation $\sigma_{n}$ ($\sigma_{m}$):
\begin{equation}
    \frac{1}{P-1}\sum_{p=1}^{P}\frac{(V_{n,p}-\widebar{V_n})(V_{m,p}-\widebar{V_m})}{\sigma_{n}\sigma_{m}}.
\end{equation}
Lower values of GCOR are preferred.


\paragraph{Mean Root Mean Square Deviation (MeanRMSD)}
\label{sec:16}
Root mean square deviation (RMSD) is a measure of frame-to-frame motion in an fMRI time series that summarizes the three translations (x, y, z) roll ($\alpha$), pitch ($\beta$), and yaw ($\gamma$) estimated from the linear co-registration between each volume of the fMRI time series and a reference volume \citep{Jenkinson99FD}. A $4\times4$ transform matrix $T_t$ that maps volume $t$ to the reference volume can be calculated from these six parameters using:
\begin{equation}
    T_t = \begin{bmatrix}
          1 & 0 & 0 & x\\[-3mm]
          0 & 1 & 0 & y\\[-3mm]
          0 & 0 & 1 & z\\[-3mm]
          0 & 0 & 0 & 1
     \end{bmatrix}
     \begin{bmatrix}
          1 & 0 & 0 & 0\\[-3mm]
          0 & \cos\alpha & \sin\alpha & 0\\[-3mm]
          0 & -\sin\alpha & \cos\alpha & 0\\[-3mm]
          0 & 0 & 0 & 1
     \end{bmatrix}
     \begin{bmatrix}
          \cos\beta & 0 & \sin\beta & 0\\[-3mm]
          0 & 1 & 0 & 0 \\[-3mm]
          -\sin\beta & 0 & \cos\beta & 0\\[-3mm]
          0 & 0  & 0 & 1
     \end{bmatrix}
     \begin{bmatrix}
          \cos\gamma & \sin\gamma & 0 & 0\\[-3mm]
          -\sin\gamma & \cos\gamma & 0 & 0\\[-3mm]
          0 & 0 & 1 & 0\\[-3mm]
          0 & 0 & 0 & 1
     \end{bmatrix}.
\end{equation} Using these transforms, the distance $M_{t,t+1}$ between volume $t$ and volume $t+1$ can be decomposed into a $3\times3$ matrix $A$ and $3\times1$ vector $b$:
\begin{equation}
        T_{t+1}T_{t}^{-1}-I = \begin{bmatrix}
           & A & & b \\[-3mm]
          0 & 0 & 0 & 0
     \end{bmatrix}.
\end{equation} With these and the center of the volume  $c$, the root mean square deviation between the two frames is calculated from \citep[using the notation in][]{Yan2013}:
\begin{equation}
    RMSD = \sqrt{\frac{1}{5}R^2Tr[A^{T}A]+(b+Ac)^T(b+Ac)},
\end{equation}
where $R$ is the radius of the head, which is assumed to be 80mm. MeanRMSD is the mean RMSD calculated for each consecutive pair of volumes in an fMRI series. We chose MeanRMSD over other measures of frame-to-frame motion \citep[e.g.][]{power2012,VanDijk2012} due to our previous observations that it is a more accurate estimate of the motion seen at the voxel level \citep{Yan2013}. The closer to zero this measure is, the better.


\subsection{The QAP Python Toolbox}
\label{sec:tooblox}

The processing procedures required by QAP, such as image segmentation, registration, and mask generation are all accomplished using components from AFNI \citep{cox1996}. These tools are pipelined together with QAP-specific python modules using Nipype \citep{gorgolewski_2016_50186} to simplify the processing of very large datasets using high performance computing system such as multicore workstations and clusters. Other benefits of Nipype include provenance tracking and a mechanism that enables a pipeline to be restarted after a change in configuration or error and only recompute the affected pipeline steps. The amount of processing required for calculating the metrics has been minimized so that they focus on data quality rather than the quality of the algorithms used to perform the processing. But since some processing (e.g., segmentation, alignment, and masking) is unavoidable we recommend that the same algorithms be used on all data for which QA measures will be compared.

\subsubsection{System requirements and installation}

QAP can run on any system that supports Python along with the AFNI neuroimaging toolset. This currently includes most *nix platforms and Mac OS X. QAP can run on fairly modest workstations and also supports parallel execution on multicore workstations and Sun Grid Engine \citep{Gentzsch2001}, Condor \citep{condor2005}, PBS \citep{pbs2001}, and Slurm \citep{slurm2002} based clusters. QAP can be installed using standard python package installation tools (e.g. pip) and is available preinstalled on a free-to-use Amazon Machine Image. Extensive documentation on installing and using QAP are available at its webpage\footnote{\url{http://preprocessed-connectomes-project.github.io/quality-assessment-protocol/\#installing-the-qap-package}}.
\ccnotes{If it is ready for that, we could mention the docker image here}


\ccnotes{here a sentence to connect to the next section: why we need a pipeline to produce the intermediate results on which these measures are computed}

Additionally, a ``Resource of QAP measures'' calculated for the ABIDE and CoRR datasets is available on the Preprocessed Connectomes Project's main page (link here?) as a spreadsheet matching the participant IDs with their scans' appropriate quality measures. \ccnotes{OE: isnt this last paragraph a result itself?}


\subsubsection{Pipeline execution}

In Nipype terminology, the QAP pipelines are composed of nodes that represent different AFNI tools or Python functions that implement the required image processing steps and calculate the quality measures. Nodes are connected together by their respective input and output files, most of which are intermediary files that can be deleted after the calculation has completed without affecting the outputs. Nodes are executed in a working directory, which contains log files that provide more details on the processing that occurred and any intermediary files that are produced generated. Once execution has completed the final results are copied to an output directory, and the working directories can be deleted. If QAP is restarted and the working directory already exists (e.g. has not been deleted), it will use any already-computed intermediaries that are in the directory, rather than recomputing them. This enables a warm-restart capability to allow the user to recover from an error or reconfiguration without having to recompute all of the processing steps.

QAP has been designed to avoid unnecessary dependencies between different datatypes; estimating QAP on functional data does not require any information from an anatomical scan acquired during the same session and likewise temporal and spatial quality assessment on the functional data are kept separate. Although this may limit some of the processing that can be performed, it was done to avoid contaminating the estimated quality of functional data with poor quality anatomical data. Applying this criterion allowed the QAP to be broken into three different pipelines: \texttt{qap\_anatomical\_spatial.py} for calculating spatial measures on anatomical scans, \texttt{qap\_functional\_spatial.py} for calculating spatial measures on functional (EPI) scans, and \texttt{qap\_functional\_temporal.py} for calculating temporal measures on functional scans.

Each pipeline script requires a dataset list and a configuration file in the easy-to-construct Python YAML format. The same dataset list can be used for all three scripts and contains file paths to the input data to be assessed. At the very least, this list must contain paths to the raw anatomical or functional data, but may also contain intermediary files that will be used to bypass their corresponding pipeline nodes. In this way, the user can override any of the processing steps in the pipeline with precomputed data from their preferred implementations. QAP includes scripts which can auto-generate dataset lists from a directory or data arranged in the BIDS format \citep{gorgolewski_brain_2016}. Currently QAP only supports data in compressed or uncompressed NIfTI files \citep{nifti2004}. The configuration file contains a small collection of settings for the pipeline, such as the location of the working and output directories, configuring parallel execution, and the path to needed template files.

Once a script is executed, a pipeline builder is invoked to dynamically construct a Nipype pipeline using information in the dataset list and configuration files. A resource pool is populated with all of the files specified in the data list along with any files that are found in the working directory. The pipeline is then constructed from the bottom up by first looking for a needed intermediary in the resource pool, if it is not found then the pipeline node that generates the file is inserted, and then its inputs are searched for, and so on. The resulting pipeline is submitted to a Nipype execution scheduler that tracks the data dependencies between pipeline steps to identify those that can be run in parallel. By combining the processing of multiple datasets into a single pipeline, QAP takes advantage of parallelizability available both between processing steps performed on different datasets and within steps performed on the same dataset to maximize throughput. The scheduler takes into account estimates of the amount of memory and processors in use and estimates of the resources required by ready-to-run pipeline steps when making scheduling decisions to avoid overloading the system's resources.

When using a cluster, QAP separates the data list into jobs that are submitted to the scheduler such that a separate instance of the QAP script is run on each cluster node. This ensures that all of the processing for a dataset occurs on the same node, minimizing the data transfer between nodes to improve computation speed. When running on a cluster we recommend to use data storage that is local to each cluster node, rather than a network share, for the working directory. This will reduces delays due to network transfers and competition between cluster nodes for access to the same resource. A disadvantage of this is that it makes it harder to use the "warm restart" functionality of QAP, but we have found that the pipelines are quick enough that this is not that high of a penalty compared to the aforementioned network costs. Once a pipeline is completed the output statistics can be copied back to shared storage.

The output of the QAP scripts is a CSV file per dataset that includes the calculated measures for that dataset. A separate script is provided to combine  CSVs across datasets.

\subsubsection{Pipeline steps - Anatomical}
\label{sec:21}
The anatomical spatial pipeline uses AFNI's \texttt{3drefit} to deoblique the raw anatomical images, and AFNI's \texttt{3dresample} to reorient the deobliqued image to RPI orientation. This resulting image is used throughout the QAP spatial anatomical metric calculations as the source of raw anatomical data.

The pipeline uses \texttt{3dSkullStrip} and \texttt{3dCalc} to remove the skull from the deobliqued, reoriented anatomical scan, providing a brain-only image to be used in tissue segmentation, performed using \texttt{3dSeg}. This generates gray matter, white matter, and CSF masks for use in the calculation of CNR and Cortical Contrast.

A whole-head mask is created from the deobliqued, reoriented anatomical scan using \texttt{3dCalc}, with an appropriate threshold selected by \texttt{3dClipLevel}. A sequence of six dilations and erosions are performed using \texttt{3dmask\_tool} in order to fill in gaps within the mask.

Because some of the spatial metrics rely on the signal intensity in the background of the image, motion artifacts present in the area near and under the participant's mouth and nose can introduce error. In the process of generating an appropriate foreground and background mask of the anatomical data, AFNI's \texttt{3dAllineate} is used to perform linear anatomical registration to a user-specified template image.

The resulting affine matrix is used to calculate and draw a mask segment covering the area of background in the anatomical image directly in front of the participant's mouth and chin. \texttt{3dcalc} is used to combine the whole-head mask with this slice. An inversion of this mask results in the background mask, which is used in the FBER, Qi1, SNR and CNR spatial measures. A "skull-only" mask is also generated by subtracting the slice mask from the whole-head mask, which is used in the FBER calculation.

\subsubsection{Pipeline steps - Functional}
The functional pipelines start with using AFNI's \texttt{3drefit} and \texttt{3dresample} to deoblique and reorient the functional images. Motion correction is then run using \texttt{3dvolreg} to obtain the coordinate transformation, which is used to calculate the Root Mean Square Deviation metric for the functional temporal pipeline.

A functional brain mask is generated using \texttt{3dAutomask}. This mask is used in conjunction with the deobliqued, reoriented functional timeseries to calculate the other functional temporal metrics Outliers, Quality, and Global Correlation (GCOR). This mask is then inverted using \texttt{3dcalc} and used to calculate Out-of-Brain (OOB) Outliers, a measure of intensity spikes residing outside of the brain. Outliers and OOB Outliers are calculated using AFNI's \texttt{3dToutcount}, and Quality is calculated using \texttt{3dTqual}. The mean and standard deviation of each temporal metric is also calculated and reported, along with the median and inter-quartile range (IQR).

The timeseries is also averaged into the one-volume mean functional image for spatial metrics using AFNI's \texttt{3dTstat}. This mean functional image is used with the functional brain mask to calculate EFC, FWHM, and Ghost-to-Signal Ratio. The background mask, which is an inversion of the functional brain mask, is used with the mean functional image and the original brain mask to calculate FBER.

\subsubsection{Graphical Reports}
\label{sec:21b}

Since no automated quality assessment is perfect, QAP generates a series of graphical reports to optimize the process of eyeballing the images and quickly detect outliers across the different metrics. Two categories of graphical reports are produced: individual reports (one per subject in the data pool) and group reports (showing the distribution of subjects across measures). The individual reports are structured as follows: 1) a section documenting the measures included in the corresponding report, along with an identifier of the subject and description of the number of sessions and runs included; 2) mosaic views of axial slices of images of interest; and 3) violin plots showing the distributions of measures for all subjects, indicating the location of the corresponding subject in each distribution. For the anatomical spatial protocol, the image of interest for the mosaic view is the original T1-weighted image. For the functional reports, the temporal image of interest is the the tSNR map (average of the SNR map of each timepoint) and the spatial image of interest is the averaged signal along time.

\ccnotes{OE: comments on this section\\
1. I would split out a whole data section, including this description of data would be better in a table, where the numbers of subjects per modality (func, anat) are easily identified, along with proper citation (papers or links)\\
2. I would remove the experiments from here (what we evaluated) and place in a section where it is more clear that this is what we are looking at after extracting the metrics.}

\ccnotes{??? says three expert raters here, but says four raters down below in Statistical Analysis !!}

The Preprocessed Connectome’s Project Quality Assessment Protocol was used to calculate spatial and temporal quality measures on the (now 1,101 structural and 1,163 functional \ccnotes{what does this mean??}) 1,113 structural and functional MRI datasets from the ABIDE dataset and the (now 3,112 structural and 4,611 functional scans) 3,357 structural and 5,094 functional scans from the CoRR dataset. For the ABIDE data, quality measures were compared to the quality scores determined from visual inspection by four expert raters to evaluate their predictive value. For both the ABIDE and CoRR datasets, the redundancy between quality measures was evaluated from their correlation matrix. Finally, the test-retest reliability of quality measures derived from CoRR was assessed using intra-class correlation.

\subsection{Methods for assessing the quality of ABIDE data}
\label{sec:23}

\subsection{Statistical Analysis}
\label{sec:24}

\subsubsection{Correlations between measures}
\label{sec:25}
Pearson’s correlation coefficients were computed to assess the relationship between measures for each dataset separately, ABIDE and CoRR, and were summarized into table 1 and figure 2. Note that we considered the first scan of the first session for each participant to compute correlations in CoRR since CoRR consists of multiple scans of multiple sessions for each subject.

\subsubsection{Test-retest of the measures (for CoRR)}
\label{sec:26}
We were able to evaluate the test-retest reliability for each measure using CoRR dataset since CoRR has multiple scans of multiple sessions for each subject. The intra-class correlations (ICC: \cite{Shrout79}) have been computed for each site by
\begin{equation}
ICC = \frac{MS_b - MS_w} {MS_b + MS_w}
\end{equation}
where $MS_b$ is the between-subject mean square and $MS_w$ is the within-subject mean square for each measure.


%%% ?? says 4 manual inspection raters here, but 3 up above in the beginning of Methods!
\subsubsection{Relationship of measures with hand assessments}
\label{sec:27}
Manually applied structural data quality sources are available for ABIDE. Hence, we evaluated the relationship of measures with hand assessments. Four individual reviewers scored each image. When three or more reviewers agreed that the quality of the image is okay, we considered the image as a ``success'' (or pass). Then, we applied the logistic regression to hand assessments against quality measures.

\subsubsection{Relationship of measures with scanning parameters}
The associations between quality measures and scanning parameters, with respect to all measures of interest, were based on mixed effects models analyses (Diggle et al., 2013). Scanning parameters include type of scanner, slice gap, slice acquisition, flip angle, TE, TR, slice thickness, voxel area, duration of scan $N$ (timepoints). Scanner, slice gap and slice acquisition were considered factors with multi-level. Random site effects were included in the models since observations are nested in sites.