The code in this repository is what we used to fit the VDGLM to the 1200-subject release of the Human Connectome Project (HCP) on the UCI High Performance Cluster (HPC). Note that this code is not intended to be production-level code and will require considerable hacking to run.
Our data files and a description of our preprocessing pipelines can be found on our Open Science Foundation page.
Our analyses begin with data from the Minimal Preprocessing Pipeline. We run
our analysis on 875 subjects that had low head motion. From the original
grayordinate data, we extracted the time series for 333 surface regions of
interest (ROIs) based on the Gordon et al. atlas. Additionally, we
performed scrubbing and motion correction. The data from the working memory
task are stored in the file tc_WM_875subj.mat. This file contains the
following fields:
* motionX: estimated mean head motion for each subject and run
* readme: short readme
* subjs: subject ids
* tasks: run ids
* tc: timecourse for each subject and run. We will use run 1 (tc(:,1)) for analysis
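For example, a minimal loading sketch in MATLAB (assuming the file has been downloaded from our OSF page and is on the MATLAB path; field names as listed above):

```matlab
% Load the working-memory time courses and pull out run 1 for every subject.
data    = load('tc_WM_875subj.mat');   % fields: motionX, readme, subjs, tasks, tc
tc_run1 = data.tc(:, 1);               % run 1 (tc(:,1)) is the run used for analysis
```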
The files design_WM_LR.mat and combdesign_WM_LR.mat contain the
corresponding uncombined (10 conditions) and combined (4 conditions) design
matrices, respectively. Each file contains the fields:
* conditions: condition labels
* rst: a structure that contains variables describing the time series. rst.mtx is the design matrix.
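A similar sketch for the design files (again assuming they are on the path; field names follow the list above):

```matlab
% Load the uncombined (10-condition) and combined (4-condition) designs.
design     = load('design_WM_LR.mat');      % fields: conditions, rst
combdesign = load('combdesign_WM_LR.mat');
X          = design.rst.mtx;                % the design matrix itself
```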
The main files we use for analysis are tc_WM_875subj.mat, which contains the imaging time course, and design_WM_LR.mat and combdesign_WM_LR.mat, which contain the uncombined design matrix (10 conditions) and combined design matrix (4 conditions), respectively.
This code will run the analyses. The directory structure of the code is as follows:
- code: main directory
  - optimization: contains VDGLM objective function, gradients, and constraint
  - plotting: functions for plotting results
  - stats: functions for computing statistics
  - batchmode: functions for fitting models in batch mode
  - utils: utility functions
  - HPC_scripts: scripts for running jobs on the HPC (these must be located on the HPC)
I would recommend running code on a desktop machine to save the hassle of sending code to and from the HPC and compiling MATLAB code.
First download the data and design from our OSF webpage. All code can be found in the code directory and its subdirectories.
You will have to edit directory paths for the following files:
* add_all_paths.m: change the path to a NIFTI installation
* batchmode/set_analysis_options.m: change the paths to the time course and the
design matrix
* batchmode/set_results_directory.m: specify the paths where you would like to save
model results, generated images, and ROI brain files
* plotting/wb_global_variables.bash: set MAIN_FILE_DIRECTORY to the directory where
you store ROI brain files, and set IMAGE_DIRECTORY to the same image directory
specified in set_results_directory.m
Next run the function add_all_paths.m, which will add the important
code subdirectories to your path.
Run the following functions in order (a call-order sketch follows this list):
* batchmode/run_subjects_in_parallel: this file will perform one analysis
for each subject in parallel. We use whsim=26 in the paper. Use the parameter
jobtype to indicate whether you want to run the standard VDGLM and GLM analyses
(the analyze jobtype) or simulate data from a null model and then run the analyses
(the null jobtype).
* batchmode/combine_results: this file will join all the results into group level
files.
* stats/compute_cohens_d: this function will compute group-level Cohen's d for
all contrasts specified in the paper. Requires that the results for the analyze
jobtype have been computed and combined.
* stats/compute_color_bounds: this function will compute the appropriate color
bounds and create color bars for the Cohen's d values just computed (colorbars
are saved in the image directory). Requires that the results for the analyze
jobtype have been computed and combined.
* plotting/plot_simulations: this will plot the scatter plot and any figures
used to check our analysis (model predictions, area plots, and other non-brain
images). Requires that the results for the analyze jobtype have been computed
and combined.
* plot_null_best: this function will plot the VDGLM preference for the
real data compared to the preference for the simulated data. Requires that the
null jobtype results have been computed and combined.
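The call-order sketch below summarizes the pipeline; the bare calls are assumptions made for illustration, so check each function's header for its exact arguments:

```matlab
% Assumed order of operations; argument lists omitted (see each file's header).
run_subjects_in_parallel   % batchmode: one fit per subject (whsim = 26; analyze or null jobtype)
combine_results            % batchmode: join single-subject results into group-level files
compute_cohens_d           % stats: group-level Cohen's d for the paper's contrasts
compute_color_bounds       % stats: color bounds and colorbars for the Cohen's d maps
plot_simulations           % plotting: scatter plot and model-check figures
plot_null_best             % plotting: VDGLM preference, real vs. null-simulated data
```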
Brain images can be plotted in a semi-automated way or manually. All the files this section instructs you to create are already included in the repo for the analyses we ran in the paper.
We use bash scripts to format each image for plotting. Given a scene file, we
can automatically print images using wb_command -show-scene; however, this does
not work on Windows. Thus, on Windows, perform the following steps:
1) Download Workbench from the HCP. Follow the instructions provided with Workbench
to use it via the command line.
2) Download Cygwin so that you can run the bash scripts. All the following
commands will be run in Cygwin
3) Strip the carriage returns from the line endings so that bash on Windows can run the scripts:
cd plotting
sed -i 's/\r$//' format_wb_scripts_for_windows.bash
4) There are a few important files that control how images are formatted:
* wb_cont_format(_mdl2).bash: these scripts format images for
wb_view on a non-fixed scale (_mdl2 is the GLM)
* wb_cont_format_fixed(_mdl2).bash: these scripts format images
for wb_view on a fixed scale (_mdl2 is the GLM)
These files can be run as follows: source wb_cont_format.bash 0.2 formats an
image with a non-fixed color scheme and a Cohen's d threshold of 0.2.
Each script reads the CIfTI data from a file of the form
cohensd_whs{x}_whmodel{y}{str}.dscalar.nii where x indicates the simulation
number and y indicates the model (1=VDGLM, 2=GLM). The way the file is
formatted for visualization is determined by the script. The output produced by
each script is a workbench spec file with naming convention
contrasts_{str}.spec, where str is one of:
* '': indicates the VDGLM results with a non-fixed scale
* 'fixed': indicates the VDGLM results with a fixed scale
* 'mdl2': indicates the GLM results with a non-fixed scale
* 'fixed_mdl2': indicates the GLM results with a fixed scale
To visualize results (e.g., contrasts.spec) type the following:
wb_view contrasts.spec. Images can be saved using File/Capture Image.
The cifti files used to create the spec files
are stored in the directory ROI2NIfTI/files.
If there is a problem running the bash scripts for formatting images,
you can also manually create each image. To do this, open the files:
* ROI2NIfTI/files/cohensd_whs{x}_whmodel{y}{str}.dscalar.nii
* ROI2NIfTI/files/S1200.R.pial_MSMAll.32k_fs_LR.surf.gii
* ROI2NIfTI/files/S1200.L.pial_MSMAll.32k_fs_LR.surf.gii
* ROI2NIfTI/files/Gordon333.32k_fs_LR.dlabel.nii
and set the viewing options manually (the wrench icon in the Workbench viewer).
For non-fixed images, we use the
following parameters:
* PALETTE_MODE="MODE_AUTO_SCALE_PERCENTAGE"
* PALETTE_NAME="FSL"
* POS_MIN=4.00
* POS_MAX=96.00
* NEG_MIN=2.00
* NEG_MAX=98.00
* DISP_NEG=true
* DISP_POS=true
* DISP_ZERO=0
* THRESH_TYPE="THRESHOLD_TYPE_NORMAL"
* THRESH_TEST="THRESHOLD_TEST_SHOW_OUTSIDE"
For fixed images, we use the following parameters:
* PALETTE_MODE="MODE_USER_SCALE"
* PALETTE_NAME="FSL"
* POS_MIN_USER=0.00
* NEG_MAX_USER=0.00
* DISP_NEG=true
* DISP_POS=true
* DISP_ZERO=0
* THRESH_TYPE="THRESHOLD_TYPE_NORMAL"
* THRESH_TEST="THRESHOLD_TEST_SHOW_OUTSIDE"
Images can be saved using File/Capture Image.
To plot the effect size image, visualize the files:
* ROI2NIfTI/files/S1200.R.pial_MSMAll.32k_fs_LR.surf.gii
* ROI2NIfTI/files/S1200.L.pial_MSMAll.32k_fs_LR.surf.gii
* ROI2NIfTI/files/Gordon333.32k_fs_LR.dlabel.nii
* ROI2NIfTI/files/effects_cohensd_small_whs26_whmodel1.dscalar.nii
* ROI2NIfTI/files/effects_cohensd_medium_whs26_whmodel1.dscalar.nii
* ROI2NIfTI/files/effects_cohensd_large_whs26_whmodel1.dscalar.nii
Use the JET256 colormap on a fixed scale with min of 1 and max of 3.
Before trying any examples, make sure to set the paths by running
add_all_paths.m.
Here is the example output from running the function
set_analysis_options.m with the parameters whsim=26, isHPC=0,
dotest=0, and LOG = log4m.getLogger('test_log.txt').
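A call along the following lines should produce the output below (the argument order is an assumption; check the function's header for its exact signature):

```matlab
% Assumed signature: set_analysis_options(whsim, isHPC, dotest, LOG)
LOG  = log4m.getLogger('test_log.txt');
opts = set_analysis_options(26, 0, 0, LOG);
```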
opts =
struct with fields:
whsim: 26
whs: 2
K: 10
seed: 1
Tremove: 10
doconstrained: 1
prewhiten: 1
var_log_transform: 0
TukN: 5
multivariate: 0
roifile: '/Users/Garren/Dropbox/FMRI/restingstatedata/tc_WM_875subj.mat'
designfile: '/Users/Garren/Dropbox/FMRI/Projects/varianceGLM/onsetfiles/behav_Result_WM_LR.mat'
combdesignfile: '/Users/Garren/Dropbox/FMRI/Projects/varianceGLM/onsetfiles/behav_Result_WM_LR_comb.mat'
runmodels: {4×1 cell}
output_directory: '/Users/Garren/Dropbox/FMRI/Projects/varianceGLM/Results/batch_analyses/single_jobs'
addmotion: 1
opts.runmodels contains information on the 4 models we are going to fit.
opts.runmodels{m}{1} contains the mean design columns for model m,
opts.runmodels{m}{2} contains the variance design columns, and
opts.runmodels{m}{3} contains the model description, such that
opts.runmodels{m}{3}{1} contains the model name and
opts.runmodels{m}{3}{2} contains the name of the model that model m
depends on (if applicable).
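For example, a sketch of how to inspect these entries (cell layout as described above; the variable names on the left are just for illustration):

```matlab
m          = 1;                         % e.g., the 'Var+Mean' model
meancols   = opts.runmodels{m}{1};      % design columns used for the mean
varcols    = opts.runmodels{m}{2};      % design columns used for the variance
name       = opts.runmodels{m}{3}{1};   % model name, e.g. 'Var+Mean'
depends_on = opts.runmodels{m}{3}{2};   % prerequisite model name ('' if none)
```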
The variance models are hard-coded to depend on the output of a corresponding mean model in order to speed up fitting. This design is motivated by how prewhitening works. The prewhitening procedure is as follows:
- Fit GLM0
- prewhiten (using residuals of GLM0)
- Fit GLM1 on prewhitened data
This code adds the additional VDGLM fitting step so that the whole procedure is now:
- Fit GLM0
- prewhiten (using residuals of GLM0)
- Fit GLM1 on prewhitened data
- Fit VDGLM0
Encoding dependencies allows us to fit the mean models using OLS (which is faster than our optimization procedure). Model dependencies are hard-coded in the third entry of each element of runmodels. That entry is a cell of length two, with the model name in the first element and the name of the model that must run before the current model in the second element; e.g., {'Var+Mean', 'Mean'} indicates that the 'Mean' model must have been run before 'Var+Mean'. No dependency is indicated by an empty string.
To fit subject 1 in test mode (test mode only supports subjects 1-5), run the following command:
analyzedata_batch_v2(26, 1, 0, 1, 1, 'INFO', '')
The command line output will be:
INFO:Running Simulation 26
INFO: whsim: 26.000000
INFO: whs: 2
INFO: K: 10.000000
INFO: seed: 1
INFO: Tremove: 10
INFO: doconstrained: 1
INFO: prewhiten: 1
INFO: var_log_transform: 0
INFO: TukN: 5.000000
INFO: multivariate: 0
INFO: roifile: /Users/Garren/Dropbox/FMRI/restingstatedata/tc_WM_875subj.mat
INFO: designfile: /Users/Garren/Dropbox/FMRI/Projects/varianceGLM/onsetfiles/behav_Result_WM_LR.mat
INFO: combdesignfile: /Users/Garren/Dropbox/FMRI/Projects/varianceGLM/onsetfiles/behav_Result_WM_LR_comb.mat
INFO: output_directory: /Users/Garren/Dropbox/FMRI/Projects/varianceGLM/Results/batch_analyses/single_jobs_test
INFO: addmotion: 1
INFO: runmodels:
INFO: model: Var+Mean, depends on: Mean
INFO: model: Mean, depends on:
INFO: model: Var, depends on: Intercept
INFO: model: Intercept, depends on:
INFO:Loading data: /Users/Garren/Dropbox/FMRI/restingstatedata/tc_WM_875subj.mat
INFO:Converting data and transforming to z-scores
INFO:Creating convolved design matrix for all experimental variables
INFO: dotest = 1
INFO:Creating design for model 1 of 4 (Var+Mean)
INFO:Creating design for model 2 of 4 (Mean)
INFO:Creating design for model 3 of 4 (Var)
INFO:Creating design for model 4 of 4 (Intercept)
INFO:Fitting models ..... working on subject 1 of [1:1]
INFO:Total Run Time = 8.70
INFO:saving file /Users/Garren/Dropbox/FMRI/Projects/varianceGLM/Results/batch_analyses/single_jobs_test/paramswm_whs26_batch_1_to_1_test
If we load an output file (e.g., paramswm_whs26_batch_1_to_1_test), the
following variables will be loaded:
* allbicm [NS x R x P]: contains the BIC scores for NS subjects, R regions, and P parameters
* allllsm [NS x R x P]: contains the log likelihood scores
* bestmodelBIC [NS x R]: contains the best model for each subject and region computed by BIC. The number indexes into the cell array models (e.g., bestmodelBIC(s,r) = 1 corresponds to models{1})
* bestmodelCV [NS x R]: contains the best model for each subject and region computed by out-of-sample log likelihood
* M: number of models
* K: number of cross-validation folds
* savedwhsim: indicates which analysis was just saved
* models: a cell array containing information about each model. The fields in models are:
  * code: lists which model we are running and which columns are in the mean and variance design
  * meaneffects: lists the mean effects in text
  * vareffects: lists the variance effects in text
  * description: text description of the model
  * meancols: lists which columns of the full design are part of the mean design
  * varcols: lists which columns of the full design are part of the variance design
  * designlabels: labels of the design matrix
  * inits: initial values for the optimization
  * paramlabels: cell of parameter descriptions
  * whsim: analysis number
  * ismeanparam: index into which parameters correspond to the mean
  * isvarparam: index into which parameters correspond to the variance
  * isinterceptparam: index into which parameters are intercepts
  * allparams: all parameter values [NS x P x R]
  * allpredsm: all mean predictions [T x S x R]
  * allpredsv: all variance predictions [T x S x R]
For example:
models{1}
ans =
struct with fields:
code: {{1×4 cell} {1×2 cell} 'Var+Mean+Fix+Ins'}
meaneffects: {'Intercept' '2-back' 'Instruction' 'Fixation'}
vareffects: {'Intercept' '2-back'}
description: 'Var+Mean+Fix+Ins'
meancols: [1 3 4 5]
varcols: [1 3]
designlabels: {'Intercept' '2-back' 'Instruction' 'Fixation'}
inits: [0 0.1000 0.1000 0.1000 1 0.1000]
paramlabels: {6×1 cell}
whsim: 2
ismeanparam: [1 1 1 1 0 0]
isvarparam: [0 0 0 0 1 1]
isinterceptparam: [1 0 0 0 1 0]
allparams: [142×6×299 double]
allpredsm: [395×142×299 single]
allpredsv: [395×142×299 single]
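A minimal sketch for loading and inspecting these results (file name from the test run above; it assumes you are in the single_jobs_test output directory and that bestmodelBIC is a numeric [NS x R] matrix of model indices, per the list above):

```matlab
res = load('paramswm_whs26_batch_1_to_1_test');   % MATLAB finds the .mat file
res.models{1}.description                         % 'Var+Mean+Fix+Ins'
size(res.models{1}.allparams)                     % [NS x P x R], e.g. 142 x 6 x 299 in this run
mode(res.bestmodelBIC(:))                         % most frequently BIC-preferred model index
```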
The directories on my local machine are:
- Results: contains all results
  - single_analyses: results for analyses run all at once
  - batch_analyses: results for analyses run in batch
    - single_jobs: contains results from single jobs
    - combined: contains results combined from all single-job results
    - null_single_jobs: contains results from single null sampling jobs
    - null_combined: contains results combined from all single null sampling jobs
- images: contains figures and images
  - contrasts: volumetric analysis contrasts
  - wb_contrasts: workbench contrasts (CIfTI)
  - wb_effects: workbench effect indicators
- ROI2NIfTI: contains nii files and functions for converting between vectors and nii
  - files: nii files used to create images in images\contrasts and images\wb_contrasts
  - effects: nii effect indicators used to create images in images\wb_effects
- dicm2nii: contains code for converting between dicm and nii
Most of these directories are excluded from the git repo to keep its size
manageable. You will need to create stand-in directories for some of the
output to be saved: the Results and images directories and their
subdirectories must exist for the functions to run.
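A hypothetical snippet for creating the stand-in directories (directory names per the list above; the root path is an assumption, so adjust it to match what you set in set_results_directory.m):

```matlab
% Create the output directories expected by the analysis and plotting code.
root = pwd;                                           % assumed root; point this at your results root
dirs = {fullfile('Results','batch_analyses','single_jobs'), ...
        fullfile('Results','batch_analyses','combined'), ...
        fullfile('Results','batch_analyses','null_single_jobs'), ...
        fullfile('Results','batch_analyses','null_combined'), ...
        fullfile('images','contrasts'), ...
        fullfile('images','wb_contrasts'), ...
        fullfile('images','wb_effects')};
for i = 1:numel(dirs)
    d = fullfile(root, dirs{i});
    if ~exist(d, 'dir'), mkdir(d); end
end
```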
The code can be downloaded to the HPC by cloning this git repo. Then send the data to the HPC with the following command:
source code/batchmode/send_data_to_HPC.bash
Compile the applications from the code/batchmode directory using the following commands:
mcc -m analyzedata_batch.m -I /data/users/ggaut/VDGLM/code/batchmode -I /data/users/ggaut/VDGLM/code/utils -I /data/users/ggaut/VDGLM/code/optimization
mcc -m null_sample_hypothesis_test.m -I /data/users/ggaut/VDGLM/code/batchmode -I /data/users/ggaut/VDGLM/code/utils -I /data/users/ggaut/VDGLM/code/optimization
The main functions for submitting jobs on the HPC are
code/HPC_scripts/submit_all_jobs.bash and
code/HPC_scripts/single_job_submit.bash. To submit jobs, cd to the directory
/data/users/ggaut/VDGLM/code/batchmode and type the command:
source submit_all_jobs.bash [whsim] [dotest] [isHPC] [start_sub] [increment] [end_sub] [loglevel] [logfile] single_job_submit.bash
This command will run the analysis starting at subject start_sub and ending
at subject end_sub, incrementing by increment subjects. E.g., 1 1 5 will
run subjects 1, 2, 3, 4, and 5, and 1 2 5 will run subjects 1, 3, and 5. The
other options are:
- whsim: the simulation number (26 for CIfTI with prewhitening)
- dotest: are we running in test mode (0/1)
- isHPC: are we running on the HPC (0/1)
- loglevel: one of {ALL, TRACE, DEBUG, INFO, WARN, ERROR, FATAL, OFF}
- logfile: where to save the log. This option is deprecated since logging to file has been turned off in the code.
The HPC Results directory is contained in /pub/ggaut/VDGLM. The Results
structure will be the same as the directory structure on my local machine.
To copy results from the HPC to my machine, use the rsync command. For
example:
rsync -av [email protected]:/pub/ggaut/VDGLM/Results/batch_analyses/combined .
will copy the combined directory on the HPC to the current directory. The results structure on the HPC mirrors the local Results layout described above.
The function combine_results.m will combine all results from the directory
Results\batch_analyses\single_jobs into single files and store them in
Results\batch_analyses\combined.
behav_Result_WM_LR.mat contains the uncombined design (10 conditions) and
behav_Result_WM_LR_comb.mat contains the combined design (4 conditions).