Code repository for the paper "Across-subject ensemble-learning alleviates the need for large samples for fMRI decoding" by Himanshu Aggarwal, Liza Al-Shikhley and Bertrand Thirion.
A preprint version of this work can be found here: https://hal.science/hal-04636523
Clone this repository:

```bash
git clone git@github.com:man-shu/ensemble-fmri.git
cd ensemble-fmri
```

Download the downsampled 3 mm event-wise GLM effect-size maps of the fMRI datasets. It is only ~7 GB in size. Yes, it is indeed that small:

```bash
wget https://zenodo.org/records/12204275/files/data.zip
unzip data.zip -d data
```

Create a new conda environment with the dependencies:
```bash
conda env create -f env/ensemblepy.yml
conda activate ensemblepy
```

We used a conditional permutation importance method to calculate the importance scores, as provided in this package and explained in this paper. I have modified the code to work with the ensemble setting. To install the package, run:
```bash
git clone git@github.com:man-shu/Variable_Importance.git
cd Variable_Importance/BBI_package
pip install .
cd ../..
```
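For intuition, the snippet below sketches plain (non-conditional) permutation importance with scikit-learn on synthetic data: shuffle one feature at a time and measure the drop in test accuracy. The BBI package installed above implements the conditional variant actually used here, so treat this only as an illustration of the general idea.

```python
# Minimal sketch of (non-conditional) permutation importance with scikit-learn.
# The BBI package above implements the conditional variant used in the paper;
# this toy example only illustrates the general idea on synthetic data.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LinearSVC(dual="auto").fit(X_train, y_train)

# Shuffle each feature in turn and record the drop in test accuracy.
result = permutation_importance(clf, X_test, y_test, n_repeats=30, random_state=0)
print(result.importances_mean)
```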
We also compared the ensemble approach with the conventional Graph Convolutional Network (GCN) classifier. The GCN implementation used is inspired by this tutorial. It needs the following dependencies:

```bash
pip install torch torch-geometric
```
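For orientation, here is a minimal two-layer GCN classifier built with `torch-geometric`'s `GCNConv`. This is only a generic sketch in the spirit of that tutorial; the exact architecture and graph construction are in `scripts/compare_with_gcn.py`.

```python
# Generic two-layer GCN classifier with torch-geometric (a sketch, not the exact
# architecture used in scripts/compare_with_gcn.py).
import torch
import torch.nn.functional as F
from torch_geometric.nn import GCNConv, global_mean_pool


class SimpleGCN(torch.nn.Module):
    def __init__(self, n_features, n_classes, hidden=64):
        super().__init__()
        self.conv1 = GCNConv(n_features, hidden)
        self.conv2 = GCNConv(hidden, hidden)
        self.lin = torch.nn.Linear(hidden, n_classes)

    def forward(self, x, edge_index, batch):
        # Two rounds of graph convolution over the connectivity graph,
        # then average node embeddings into one vector per sample.
        x = F.relu(self.conv1(x, edge_index))
        x = F.relu(self.conv2(x, edge_index))
        x = global_mean_pool(x, batch)
        return self.lin(x)


# Example forward pass on a toy graph with 10 nodes and 5 features per node.
x = torch.randn(10, 5)
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 4]], dtype=torch.long)
batch = torch.zeros(10, dtype=torch.long)
model = SimpleGCN(n_features=5, n_classes=3)
logits = model(x, edge_index, batch)
```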
All the scripts expect event-wise GLM effect-size maps in the `data` directory. The data structure should be as follows:

```
data/neuromod
└── 3mm
    ├── sub-01.nii.gz
    ├── sub-01_labels.csv
    ├── sub-01_runs.csv
    ├── sub-02.nii.gz
    ├── sub-02_labels.csv
    ├── sub-02_runs.csv
    ├── sub-03.nii.gz
    ├── sub-03_labels.csv
    ├── sub-03_runs.csv
    ├── sub-05.nii.gz
    ├── sub-05_labels.csv
    └── sub-05_runs.csv
```
- Under `data`, you need to create a sub-directory with the name of your dataset, e.g. `neuromod` in this case.
- This sub-directory should contain another sub-directory named after the resolution of the data, e.g. `3mm` in this case. Note that we downsampled all our data to 3 mm resolution because it is computationally expensive to work with full-resolution data.
- The `3mm` directory should contain the effect-size maps for each subject, in NIfTI format. The filenames should follow the format `<subject_id>.nii.gz`, e.g. `sub-01.nii.gz` in this case. The subject_id itself can be anything as long as it is unique for each file. Each volume in this 4D NIfTI file should correspond to an event in the task, with all runs concatenated in this one file.
- The `3mm` directory should also contain the labels for each subject, in a CSV file named `<subject_id>_labels.csv`, e.g. `sub-01_labels.csv` in this case. The CSV file should have one column without any header, containing the label for each event/volume in the NIfTI file (a sanity-check sketch follows this list).
- Finally, you just need to add the name of your dataset (the sub-directory you just created under `data`) in all the scripts. For example, in `scripts/vary_train_size.py`, you need to add your dataset's name to the `datas` list as follows:

  ```python
  # in lines 42-48
  datas = [
      "neuromod",
      "forrest",
      "rsvp",
      "bold",
      "aomic_anticipation",
      "your_dataset_name",
  ]
  ```
You can even remove the other datasets if you are only interested in your dataset.
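Before running the scripts, you can sanity-check that each subject's 4D map and labels file line up. The snippet below is only a suggestion; it assumes `nibabel` and `pandas` are available, and `your_dataset_name` is a hypothetical dataset directory.

```python
# Optional sanity check: every subject's 4D NIfTI should have exactly as many
# volumes as there are rows in its labels CSV. "your_dataset_name" is a
# hypothetical dataset directory; adjust the paths to your own layout.
from pathlib import Path

import nibabel as nib
import pandas as pd

data_dir = Path("data") / "your_dataset_name" / "3mm"

for nifti_path in sorted(data_dir.glob("*.nii.gz")):
    subject_id = nifti_path.name.replace(".nii.gz", "")
    labels_path = data_dir / f"{subject_id}_labels.csv"

    n_volumes = nib.load(nifti_path).shape[-1]
    n_labels = len(pd.read_csv(labels_path, header=None))

    status = "OK" if n_volumes == n_labels else "MISMATCH"
    print(f"{subject_id}: {n_volumes} volumes, {n_labels} labels -> {status}")
```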
The section Reproduce the results in the paper explains what each script does and how to run it.
If you want to know how to get event-wise GLM effect size maps, you can follow this tutorial. Note that the method to compute event-wise GLM effect size maps is referred to as Least Squares Separation (LSS) in the tutorial.
We did the same for the AOMIC dataset using the `scripts/glm/glm_anticipation.py` script. So once you have downloaded the AOMIC dataset from here, you can update the paths in the script and run it to get the event-wise GLM effect-size maps. Note that you only need the files matching the `*task-anticipation*` wildcard from the AOMIC dataset.
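To make the LSS idea concrete, here is a rough nilearn sketch: each trial is fit in its own GLM, with that trial as one condition and all remaining trials lumped into another, and the trial's effect-size map is kept. Names such as `fmri_img`, `events`, and `t_r` are placeholders; the actual AOMIC pipeline is in `scripts/glm/glm_anticipation.py`.

```python
# Rough Least Squares Separation (LSS) sketch with nilearn: one GLM per trial,
# where the trial of interest gets its own regressor and all other trials share
# one. `fmri_img`, `events` (columns: onset, duration, trial_type) and `t_r`
# are placeholders for your own data.
from nilearn.glm.first_level import FirstLevelModel


def lss_effect_maps(fmri_img, events, t_r):
    effect_maps = []
    for trial in range(len(events)):
        lss_events = events.reset_index(drop=True).copy()
        lss_events["trial_type"] = "other"
        lss_events.loc[trial, "trial_type"] = "trial_of_interest"

        model = FirstLevelModel(t_r=t_r)
        model.fit(fmri_img, events=lss_events)

        # Effect-size (beta) map for this single trial.
        effect_maps.append(
            model.compute_contrast("trial_of_interest", output_type="effect_size")
        )
    return effect_maps
```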
To generate the numbers plotted in Fig 2 and Fig 3 (over varying training sizes) in the paper:

- using the data in the `data` directory, saving the results in the `results` directory, with 20 parallel jobs and DiFuMo features (see the sketch after this list), run:

  ```bash
  python scripts/vary_train_size.py data results 20 difumo
  ```

- or with full voxel-space features, run:

  ```bash
  python scripts/vary_train_size.py data results 20 voxels
  ```
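For context, the `difumo` option reduces each effect-size map to coefficients on the DiFuMo probabilistic atlas. The snippet below is a generic nilearn sketch of that reduction; the atlas dimension shown (1024) and the masker settings are assumptions, so check the scripts for the configuration actually used.

```python
# Generic sketch of reducing 4D effect-size maps to DiFuMo features with nilearn.
# The atlas dimension (1024) and masker settings are assumptions; check the
# scripts for the configuration actually used in the paper.
from nilearn.datasets import fetch_atlas_difumo
from nilearn.maskers import NiftiMapsMasker

difumo = fetch_atlas_difumo(dimension=1024, resolution_mm=3)
masker = NiftiMapsMasker(maps_img=difumo.maps).fit()

# X has shape (n_events, 1024): one row per event-wise effect-size map.
X = masker.transform("data/neuromod/3mm/sub-01.nii.gz")
```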
To generate the numbers plotted in Fig 4 (over varying numbers of subjects in the ensemble) in the paper:

- using the data in the `data` directory, saving the results in the `results` directory, with 20 parallel jobs and DiFuMo features, run:

  ```bash
  python scripts/vary_n_subs.py data results 20 difumo
  ```

- or with full voxel-space features, run:

  ```bash
  python scripts/vary_n_subs.py data results 20 voxels
  ```
For computing the importance scores in the ensemble setting, using DiFuMo features and the RandomForest classifier (plotted in Supplementary Fig. 2 in the paper), run:

```bash
python scripts/feat_imp.py data results 20
```

To use the pretraining strategy with an L1-penalized Linear SVC (the results plotted in Supplementary Figure 2), you can change line 175 under the `pretrain` function in `utils.py` from:

```python
clf = LinearSVC(dual="auto")
```

to:

```python
clf = LinearSVC(dual="auto", penalty="l1")
```

For comparing the ensemble approach with the GCN approach, run:
```bash
python scripts/compare_with_gcn.py data results 20 gcn/param_grid.json
```

To plot Figures 2 and 3 in the paper, run:

```bash
python scripts/plotting/plot_fig2_fig3.py
```

To plot Figure 4 in the paper, run:

```bash
python scripts/plotting/plot_fig4.py
```

To plot the feature importance scores, run the following command. This will plot the feature importance scores on surfaces and glass brains for each subject in each dataset. The plots will be saved in the `plots/feat_imp` directory:

```bash
python scripts/plotting/plot_suppfig1.py
```

The plots will look like this:

To plot the comparison with GCN, run the following command. This will create a plot similar to Fig 2, but with the GCN results:

```bash
python scripts/plotting/plot_gcn.py
```

We ran the experiments on a CPU-based cluster with 72 nodes and 376 GB of RAM, but only used 20 parallel jobs. The OS was Ubuntu 18.04.6 LTS.
The time taken for each experiment is as follows:

```bash
# command
time python scripts/vary_train_size.py data results 20 difumo
# output
264026.23s user 4271.26s system 1606% cpu 4:38:23.80 total

# command
time python scripts/vary_train_size.py data results 20 voxels
# output
541518.33s user 20061.83s system 1715% cpu 9:05:44.40 total

# command
time python scripts/vary_n_subs.py data results 20 difumo
# output
264768.95s user 644.23s system 1804% cpu 4:05:10.79 total

# command
time python scripts/vary_n_subs.py data results 20 voxels
# output
361548.14s user 5086.45s system 1828% cpu 5:34:16.29 total

# command
time python scripts/feat_imp.py data results 20
# output
2596918.56s user 65902.53s system 2082% cpu 35:31:34.59 total
```

If you don't have conda installed, you can install it from here.
In case you face any issues, please feel free to open an issue on this repository.



