BOLD-Cast: Modeling individual-level long-range brain dynamics from short fMRI scans

BOLD-Cast is a two-stage autoregressive deep learning framework for reconstructing long-range brain dynamics from short fMRI scans.

This repository contains the official PyTorch implementation of the paper:

Modeling individual-level long-range brain dynamics from short fMRI scans


🚀Overview

BOLD-Cast addresses a practical challenge in neuroimaging: how to obtain reliable long-range functional brain dynamics from short resting-state fMRI scans.

The framework consists of two stages:

  • Stage I: a graph-based disentanglement module that separates each fMRI sample into:

    • cohort-invariant embedding
    • subject-specific embedding
  • Stage II: a prompt-based autoregressive forecasting module that combines:

    • graph-derived embeddings
    • raw sequence features
    • timestamp-based prompt embeddings

    and predicts future fMRI signals using a frozen language-model backbone.

The model is trained on parcel-level resting-state fMRI signals parcellated using the Craddock CC200 atlas.
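
The autoregressive part of Stage II can be pictured as a one-step predictor that is rolled forward, with each prediction appended to the context for the next step. The following is a minimal sketch of that rollout only, not the BOLD-Cast model: the names `autoregressive_forecast` and `mean_predictor` are hypothetical, and the placeholder predictor simply averages the context window where the trained network would go.

```python
import numpy as np


def autoregressive_forecast(history, predict_next, horizon, context):
    """Roll a one-step predictor forward, feeding each prediction back in.

    history: (T, N) array of observed parcel signals.
    predict_next: callable mapping a (context, N) window to an (N,) vector.
    """
    series = [row for row in history[-context:]]
    preds = []
    for _ in range(horizon):
        window = np.stack(series[-context:])  # most recent `context` steps
        nxt = predict_next(window)
        preds.append(nxt)
        series.append(nxt)  # the prediction becomes part of the next input
    return np.stack(preds)


def mean_predictor(window):
    # Placeholder for a trained model (assumption): average of the context.
    return window.mean(axis=0)


history = np.arange(12, dtype=float).reshape(6, 2)  # toy data: 6 steps, 2 parcels
forecast = autoregressive_forecast(history, mean_predictor, horizon=3, context=2)
print(forecast.shape)  # (3, 2)
```

Because each step consumes earlier predictions, errors can accumulate over long horizons, which is why long-range forecasts are usually evaluated separately from one-step accuracy.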


Repository Structure

BOLD-Cast/
├── Stage I/                # Graph-based disentanglement learning
├── Stage II/               # Autoregressive forecasting module
├── dataset/                # User-prepared input data
├── requirements.txt        # Python dependencies
├── README.md               # Project documentation
└── [other files]

System Requirements

Hardware requirements

For small-scale testing or toy examples, BOLD-Cast can be run on a standard workstation.

For full training and reproduction of the main experiments in the manuscript, we recommend:

  • NVIDIA GPU with at least 16–32 GB memory
  • Sufficient CPU RAM for loading preprocessed parcel-level fMRI data
  • Linux/Windows-based environment for large-scale training

The main experiments in the manuscript were run on:

  • NVIDIA V100 GPUs (32 GB memory)

Software requirements

The main experiments in the manuscript were run on:

  • Windows 11
  • Python 3.10
  • CUDA 11.8
  • PyTorch 2.0.1

Python dependencies

Main dependencies are listed in requirements.txt. Typical packages include:

torch
numpy
scipy
scikit-learn
h5py
pandas
matplotlib
transformers
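
Before starting a long training run, it can help to confirm that these packages are importable. The following is a minimal sketch that uses only the standard library; the helper name `missing_packages` is hypothetical and not part of the repository. Note that scikit-learn is imported under the module name `sklearn`.

```python
from importlib.util import find_spec

# Import names for the packages listed above. scikit-learn installs the
# module `sklearn`; the other import names match the package names.
REQUIRED_MODULES = [
    "torch", "numpy", "scipy", "sklearn",
    "h5py", "pandas", "matplotlib", "transformers",
]


def missing_packages(modules=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages are importable.")
```

This only checks that the modules can be located; it does not verify versions or CUDA availability.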

Installation Guide

Install from GitHub

git clone https://github.com/CUHK-AIM-Group/BOLD-Cast.git
cd BOLD-Cast
pip install -r requirements.txt

Typical installation time

On a normal desktop or workstation with a stable internet connection, installation typically takes 10–20 minutes.

Downloading pretrained language model checkpoints may require additional time depending on network conditions.

📊Instructions for Use

Input data

BOLD-Cast does not operate directly on raw DICOM or raw NIfTI files.

The expected input to the model is preprocessed parcel-level fMRI time series, obtained after:

  1. standard fMRI preprocessing
  2. parcellation using the Craddock CC200 atlas
  3. construction of sliding-window sequences

For each sliding window:

  • parcel-wise time series are used as node features
  • functional connectivity (FC) is computed from Pearson correlation
  • graph representations are then constructed for Stage I
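
The windowing and FC steps above can be sketched with NumPy as follows. This is an illustration under stated assumptions, not the repository's preprocessing code: the function names, the window length of 30, the stride of 10, and the synthetic 120 x 200 input are all hypothetical values chosen for the example.

```python
import numpy as np


def sliding_windows(ts, window, stride):
    """Split a (T, N) parcel time series into segments of shape (window, N)."""
    if ts.shape[0] < window:
        raise ValueError("time series is shorter than the window length")
    starts = range(0, ts.shape[0] - window + 1, stride)
    return np.stack([ts[s:s + window] for s in starts])


def pearson_fc(window_ts):
    """Pearson correlation between parcels for one (window, N) segment."""
    # np.corrcoef treats rows as variables, so transpose to (N, window).
    return np.corrcoef(window_ts.T)


rng = np.random.default_rng(0)
ts = rng.standard_normal((120, 200))  # synthetic: 120 time points, 200 parcels
windows = sliding_windows(ts, window=30, stride=10)  # node features per window
fc = np.stack([pearson_fc(w) for w in windows])      # one FC matrix per window
print(windows.shape, fc.shape)  # (10, 30, 200) (10, 200, 200)
```

Each FC matrix is symmetric with a unit diagonal, so it can serve as a weighted adjacency matrix once a thresholding or sparsification rule has been chosen.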

Running the software on your own data

To use this repository on your own data, you should:

  1. Obtain approved access to the original neuroimaging dataset.
  2. Preprocess the fMRI data into parcel-level time series.
  3. Organize the files in a format compatible with the repository.
  4. Run Stage I to extract disentangled graph embeddings.
  5. Run Stage II to forecast future fMRI dynamics.

Reproduction instructions

For Stage I

  1. Put the datasets under the folder ./dataset/.

  2. Important args:

  • --use_pretrain: test the checkpoints saved in checkpoints
  • --dataset: one of ukb, hcp-d, hcp-ya, hcp-a, abide
  • --custom_key Node: node classification
  3. Training
  • python prepare_data.py
  • python main.py
  4. Testing
  • use_pretrain == 'True'

For Stage II

  1. Put the datasets under the folder ./dataset/.

  2. Download the large language model from Hugging Face. The default LLM is GPT-2 (https://huggingface.co/openai-community/gpt2).

    For example, once the gpt2 directory has been downloaded and placed alongside run.py, the directory structure is as follows:

    • data_provider
    • dataset
    • gpt2
      • config.json
      • flax_model.msgpack
      • generation_config.json
      • ...
    • ...
    • run.py
  3. Generate the timestamps with GPT-2; the output files are suffixed with {subjectid}.pt:
    python generate_timestamp_ukb.py

  4. Generate the timestamp embeddings and save them, together with the historical time series, common features, and characteristic features, as H5 data in dataset/ukb_input. The training, validation, and test sets need to be generated separately, one split at a time:
    python preprocess_ts.py

  5. Training:
    python run.py

  6. Testing

  • use_pretrain == 'True'

Evaluation

Typical forecasting metrics include:

  • MAE
  • RMSE
  • MAPE
  • mPCC
  • FCPCC
  • FN
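
For orientation, the first five metrics can be sketched with NumPy as below. MAE, RMSE, and MAPE follow their standard definitions. The definitions used for mPCC (Pearson correlation per parcel over time, averaged across parcels) and FCPCC (Pearson correlation between the upper triangles of the true and predicted FC matrices) are assumptions made for this example and may differ from the manuscript; FN is not covered. All function names are hypothetical, and inputs are arrays of shape (T, N).

```python
import numpy as np


def mae(y_true, y_pred):
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mape(y_true, y_pred, eps=1e-8):
    # eps guards against division by zero when signal values are near 0.
    return float(np.mean(np.abs(y_true - y_pred) / (np.abs(y_true) + eps)) * 100)


def mean_pcc(y_true, y_pred):
    """Assumed mPCC: per-parcel Pearson correlation over time, then averaged."""
    t = y_true - y_true.mean(axis=0)
    p = y_pred - y_pred.mean(axis=0)
    r = (t * p).sum(axis=0) / np.sqrt((t ** 2).sum(axis=0) * (p ** 2).sum(axis=0))
    return float(r.mean())


def fc_pcc(y_true, y_pred):
    """Assumed FCPCC: correlation between the two upper-triangular FC vectors."""
    iu = np.triu_indices(y_true.shape[1], k=1)
    fc_true = np.corrcoef(y_true.T)[iu]
    fc_pred = np.corrcoef(y_pred.T)[iu]
    return float(np.corrcoef(fc_true, fc_pred)[0, 1])


rng = np.random.default_rng(0)
y = rng.standard_normal((50, 5))                  # synthetic ground truth
y_hat = y + 0.1 * rng.standard_normal((50, 5))    # synthetic forecast
print(mae(y, y_hat), rmse(y, y_hat), mean_pcc(y, y_hat), fc_pcc(y, y_hat))
```

MAPE can become unstable for standardized BOLD signals, since many values lie close to zero; the `eps` term only prevents division by zero and does not remove that sensitivity.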

Downstream analyses in the manuscript include:

  • ASD classification
  • sex identification
  • cognitive score prediction

✨Comparison

Most of the comparison algorithms have been integrated into the models. Some GNN-based models cannot be integrated and are therefore not included.

Data Availability

The data analyzed in this study are available only for bona fide research purposes and require approval from the corresponding data providers. The datasets used in the manuscript include:

  • UK Biobank
  • ABIDE
  • HCP-Young Adult
  • HCP-Development
  • HCP-Aging

Due to privacy, ethical, and data-use restrictions, these datasets are not redistributed in this repository.

Users must obtain access directly from the corresponding data providers and prepare the data locally before running the code.

Acknowledgement

We appreciate the following GitHub repos a lot for their valuable code and efforts.

Contact

For questions regarding the code or manuscript, please contact Yu Jiang (yuajiang@cuhk.edu.hk).

If you use this code in your research, please cite the corresponding paper.