neurodatascience/nipoppy-ppmi

Nipoppy: Parkinson's Progression Markers Initiative dataset

This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset using the Nipoppy framework.

Unless otherwise specified, instructions assume the current working directory is the Nipoppy root directory.

PPMI CSV files to download from LONI

Note

  • The "CUSTOM" field in global_config.json points to these files. They should be downloaded into sourcedata/tabular.
  • Files downloaded from LONI have a date timestamp in their names. Instead of renaming them, we can create symlinks pointing to the latest version of each file:
# assume we have old file:   idaSearch.csv -> idaSearch_8_15_2024.csv
# and we want to replace by: idaSearch.csv -> idaSearch_10_15_2025.csv

# remove the symlink
rm -i idaSearch.csv

# create the new symlink
ln -s idaSearch_10_15_2025.csv idaSearch.csv
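This replacement can also be scripted, e.g. to always point the symlink at the newest download. A minimal Python sketch (the helper names here are ours, not part of this repository):

```python
from pathlib import Path


def update_symlink(link: Path, target: Path) -> None:
    """Point `link` at `target`, replacing any existing symlink."""
    if link.is_symlink() or link.exists():
        link.unlink()
    # create a relative symlink (both files live in sourcedata/tabular)
    link.symlink_to(target.name)


def latest_download(directory: Path, pattern: str = "idaSearch_*.csv") -> Path:
    """Pick the most recently modified file matching the LONI naming pattern."""
    return max(directory.glob(pattern), key=lambda p: p.stat().st_mtime)
```

For example, `update_symlink(Path("idaSearch.csv"), latest_download(Path(".")))` replicates the shell commands above in one step.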

Image collections

  • idaSearch.csv
    • Advanced Search
    • Check every box in "Display in result" column
    • Check "DTI" + "MRI" + "fMRI" in "Modality"

Study data

  • Study Docs: Data & Databases
    • Code_List_-__Annotated_.csv
    • Data_Dictionary_-__Annotated_.csv
  • Subject Characteristics: Patient Status
    • Participant_Status.csv
  • Subject Characteristics: Subject Demographics
    • Age_at_visit.csv
    • Demographics.csv
    • Socio-Economics.csv
  • Medical History: Medical
    • Clinical_Diagnosis.csv
    • Primary_Clinical_Diagnosis.csv
  • Motor Assessments: Motor / MDS-UPDRS
    • MDS-UPDRS_Part_I.csv
    • MDS-UPDRS_Part_III.csv
    • MDS_UPDRS_Part_II__Patient_Questionnaire.csv
    • MDS-UPDRS_Part_I_Patient_Questionnaire.csv
    • MDS-UPDRS_Part_IV__Motor_Complications.csv
  • Non-motor Assessments: ALL
    • All downloaded, though not all are used or up to date
    • Benton_Judgement_of_Line_Orientation.csv
    • Clock_Drawing.csv
    • Cognitive_Categorization.csv
    • Cognitive_Change.csv
    • Epworth_Sleepiness_Scale.csv
    • Geriatric_Depression_Scale__Short_Version_.csv
    • Hopkins_Verbal_Learning_Test_-_Revised.csv
    • Letter_-_Number_Sequencing.csv
    • Lexical_Fluency.csv
    • Modified_Boston_Naming_Test.csv
    • Modified_Semantic_Fluency.csv
    • Montreal_Cognitive_Assessment__MoCA_.csv
    • Neuro_QoL__Cognition_Function_-_Short_Form.csv
    • Neuro_QoL__Communication_-_Short_Form.csv
    • QUIP-Current-Short.csv
    • REM_Sleep_Behavior_Disorder_Questionnaire.csv
    • SCOPA-AUT.csv
    • State-Trait_Anxiety_Inventory.csv
    • Symbol_Digit_Modalities_Test.csv
    • Trail_Making_A_and_B.csv
    • University_of_Pennsylvania_Smell_Identification_Test_UPSIT.csv

Manifest generation

Requires idaSearch.csv as well as the demographics/assessments files listed above.

Run:

./code/scripts/curation/generate_manifest.py --dataset . --regenerate

This creates the manifest file and the DICOM directory mapping.
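The resulting manifest can be sanity-checked with a few lines of Python. This sketch assumes the standard Nipoppy manifest columns (participant_id, visit_id, session_id, datatype) and treats rows with an empty session_id as visits without imaging:

```python
import csv
from collections import Counter


def sessions_summary(manifest_path):
    """Count participants per imaging session in a Nipoppy manifest TSV.

    Assumes the standard Nipoppy columns; rows with an empty session_id
    (visits with tabular data only) are skipped.
    """
    with open(manifest_path, newline="") as f:
        rows = list(csv.DictReader(f, delimiter="\t"))
    return Counter(row["session_id"] for row in rows if row["session_id"])
```

Comparing these counts before and after regeneration is a quick way to catch an accidentally truncated idaSearch.csv.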

DICOM download

Get mapping of BIDS datatype to PPMI MRI protocol names

Requires idaSearch.csv.

Run:

./code/scripts/curation/filter_image_descriptions.py --overwrite

This will update two files (which are both tracked by Git):

  • code/imaging_descriptions/ppmi_imaging_descriptions.json: mapping of BIDS datatype (and suffix) to protocol names
  • code/imaging_descriptions/ppmi_imaging_ignored.csv: protocol names that will not be included in the dataset

Check these two files to ensure that any modifications make sense. If not, see the log messages from filter_image_descriptions.py for instructions on updating the filters in code/nipoppy_ppmi/imaging_filters.py.
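For spot-checking, the generated mapping can be inverted so that a protocol name is looked up directly. The nested {datatype: {suffix: [descriptions]}} layout below is an assumption based on the description above, not the exact schema of ppmi_imaging_descriptions.json:

```python
import json


def load_description_index(fpath):
    """Invert a {datatype: {suffix: [descriptions]}} mapping into
    {description: (datatype, suffix)} for quick lookups.

    The JSON layout here is an assumption based on the README,
    not the exact schema used by the repository.
    """
    with open(fpath) as f:
        mapping = json.load(f)
    index = {}
    for datatype, suffixes in mapping.items():
        for suffix, descriptions in suffixes.items():
            for description in descriptions:
                index[description] = (datatype, suffix)
    return index
```

A description missing from the index would then fall into the ignored list, which is the case worth double-checking.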

Get lists of LONI image IDs to download

Requires:

  • idaSearch.csv
  • code/imaging_descriptions/ppmi_imaging_descriptions.json generated by previous step
  • Updated <NIPOPPY_ROOT>/sourcedata/imaging/doughnut.tsv

Run:

# regenerate the doughnut file if needed
nipoppy doughnut --dataset . --regenerate

# change --session-id and --chunk-size as needed
./code/scripts/dicom_reorg/fetch_dicom_downloads.py --dataset . --session-id BL --chunk-size 1000

Download images from LONI

Follow the instructions in the log of the previous step for creating a collection.

If needed, use the utility script code/scripts/dicom_reorg/download_from_loni.sh to download directly to Compute Canada from the computer that initiated the LONI download (e.g., a laptop).

  • Requires access to the Compute Canada robot node for non-2FA SSH connections, see here.
  • After initiating a download from LONI, download the CSV file with the download URLs and pass it to download_from_loni.sh

Example command:

# this script should be copied to and run from local computer
# make sure to update the session ID as appropriate
./download_from_loni.sh <PATH_TO_CSV_FILE> <USERNAME>@robot.rorqual.alliancecan.ca <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-BL

After the downloads are complete, rename the files to something like 240924_ses1_list12_dataset-{INDEX}.zip if needed (this makes unzipping easier).

Unzip downloaded images

Use the utility job script code/scripts/dicom_reorg/unzip_loni_downloads.sh. Example job submission command:

sbatch --array=1-10 --account=rrg-jbpoline --output=<NIPOPPY_ROOT>/logs/hpc/unzip_loni_downloads_%A_%a.out <NIPOPPY_ROOT>/code/scripts/dicom_reorg/unzip_loni_downloads.sh <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-V06/250401-sesV06-list4_dataset.zip <NIPOPPY_ROOT>/sourcedata/imaging/pre_reorg/ses-V06

Move the subdirectories out of the parent PPMI directory, then delete the PPMI directory.

Then regenerate the doughnut file:

nipoppy doughnut --dataset . --regenerate

DICOM reorg

This step uses a custom script and is quite slow, so run it in a Slurm job. Sample submission command:

sbatch --account=rrg-jbpoline --time=10:00:00 --mem=1G --job-name=ppmi_reorg --output="<NIPOPPY_ROOT>/logs/hpc/%x_%j.out" --wrap="<ACTIVATE_NIPOPPY_PPMI_ENVIRONMENT> && <NIPOPPY_ROOT>/scripts/dicom_reorg/dicom_reorg.py --dataset <NIPOPPY_ROOT>"

Create SquashFS archive

Sample commands:

DATASET_ROOT=`realpath <NIPOPPY_ROOT>`
DPATH_SQUASH=$DATASET_ROOT/sourcedata/imaging/squash
FPATH_SQUASH=$DPATH_SQUASH/ses-BL/250401-sesBL-list1.squashfs
DPATH_LOGS=$DATASET_ROOT/logs/hpc
DPATH_CODE=$DATASET_ROOT/code/scripts/dicom_reorg
sbatch --account=rrg-jbpoline --output=$DPATH_LOGS/slurm-%j.out $DPATH_CODE/make_squash.sh --exclude $DPATH_CODE/exclude.txt --move /ppmi $FPATH_SQUASH $DATASET_ROOT/dicom $DATASET_ROOT/scratch

BIDS conversion

Get participants/sessions to run

Sample command:

nipoppy bidsify --dataset <NIPOPPY_ROOT> --pipeline heudiconv --pipeline-step convert --session-id BL --write-list <NIPOPPY_ROOT>/code/slurm/to_run/tmp.tsv

Run Heudiconv stage 1

Modify bids_conversion.sh appropriately (IMPORTANT: set --pipeline-step to "prepare")

sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh

Heudiconv testing

cd <NIPOPPY_ROOT>/code/nipoppy_ppmi/
rm -rf fake_bids/
./heuristic.py --dataset <NIPOPPY_ROOT> --session-id BL

Make sure the above script does not produce errors.

Rename the .heudiconv directory in <NIPOPPY_ROOT>/bids.

Run Heudiconv stage 2

IMPORTANT: set --pipeline-step to "convert"

sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh

Clean up .heudiconv directories

  • rename
  • tar (inside bids/ directory: tar -czvf .heudiconv-<EXTRA>.tar.gz .heudiconv-<EXTRA>/)
  • delete (IMPORTANT: only do this if SquashFS creation is done!)

Update doughnut with BIDS data

nipoppy doughnut --dataset <NIPOPPY_ROOT> --regenerate

Fix DWI data

./code/scripts/curation/add_bval_bvec_to_B0_dwi.py --dataset <NIPOPPY_ROOT> --session-id BL

Other (older) notes

PPMI data portal (LONI IDA)

  • Some search fields in the LONI search tool cannot be trusted
    • Examples:
      • Modality
        • Modality=DTI can have anatomical images, and there are diffusion images with MRI modality
      • Weighting (under Imaging Protocol)
        • Some T1s have Weighting=PD
    • We classify image modalities/contrast only based on the Image Description column
      • This can also lead to issues, for example when a subject has the same description string for all of their scans. In those cases, we manually determine the image modality/contrast and hard-code the mapping in heuristic.py for HeuDiConv
  • LONI viewer sometimes shows seemingly bad/corrupted files but they are actually fine once we convert them
    • Observed for some diffusion images (tend to have ~2700 slices according to the LONI image viewer)

Compute Canada

  • Some subjects have a huge number of small DICOM files, which causes us to exceed the inode quota on /scratch

BIDS

BIDS data file naming

The tabular/ppmi_imaging_descriptions.json file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.

Here is a description of the available BIDS data and the tags that can appear in their filenames:

  • anat
    • The available suffixes are: T1w, T2w, T2starw, and FLAIR
    • Most images have an acq tag:
      • Non-neuromelanin images: acq-<plane><type>, where
        • <plane> is one of: sag, ax, or cor (for sagittal, axial, or coronal scans respectively)
        • <type> is one of: 2D or 3D
      • Neuromelanin images: acq-NM
    • For some images, the acquisition plane (sag/ax/cor) or type (2D/3D) cannot be easily obtained. In those cases, the filename will not contain an acq tag.
  • dwi
    • All imaging files have the dwi suffix.
    • Most images have a dir tag corresponding to the phase-encoding direction. This is one of: LR, RL, AP, or PA
    • Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a dir tag.
    • Some participants have multi-shell sequences for their diffusion data. These files will have an additional acq-B<value> tag, where <value> is the b-value for that sequence.

Currently, only structural (anat) and diffusion (dwi) MRI data are supported. Functional (func) data have not been converted to the BIDS format yet.
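As an illustration of the naming scheme above, a filename can be assembled from its entities. This is a sketch following the standard BIDS entity order (acq before dir); entities that could not be inferred are simply omitted:

```python
def build_bids_name(subject, session, suffix, acq=None, direction=None,
                    ext=".nii.gz"):
    """Assemble a BIDS filename from the entities described above.

    Entities that could not be inferred from the series description
    (acq, dir) are omitted, matching the conventions in this README.
    """
    parts = [f"sub-{subject}", f"ses-{session}"]
    if acq:
        parts.append(f"acq-{acq}")
    if direction:
        parts.append(f"dir-{direction}")
    return "_".join(parts) + f"_{suffix}{ext}"
```

For example, a sagittal 3D T1w scan becomes `sub-<ID>_ses-BL_acq-sag3D_T1w.nii.gz`, and a PA-encoded diffusion scan becomes `sub-<ID>_ses-BL_dir-PA_dwi.nii.gz`.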

HeuDiConv errors

Not yet solved:

  • AttributeError: 'Dataset' object has no attribute 'StackID'
  • AssertionError: Conflicting study identifiers found
    • Could be because all of a subject's DICOMs are pooled together in the dicom_org step; in that case, this can be fixed by manually running HeuDiConv for each image
  • numpy.AxisError: axis 1 is out of bounds for array of dimension 1
  • AssertionError (assert HEUDICONV_VERSION_JSON_KEY not in json_)
    • Thrown by HeuDiConv
  • AssertionError: we do expect some files since it was called (assert bids_files, "we do expect some files since it was called")
    • Thrown by HeuDiConv

Notes on dwi data

  • Some subjects only have a single diffusion image (e.g., Ax DTI), might not be usable
  • Some subjects have 2 diffusion images, but they have the same description string (e.g., DTI_gated)
    • Checked some cases after BIDS conversion, and the JSON sidecars seem to have the same PhaseEncodingDirection (j-)
  • Some subjects have multi-shell sequences. Their files seem to follow the following pattern:
    • dir-PA: 1 B0, 1 B700, 1 B1000, and 1 B2000 image
    • dir-AP: 4 B0 images
  • Some (~2 for ses-BL) subjects have dir-AP for all their diffusion images
    • Seem to have 4 dir-AP B0 images and 4 other dir-AP images (according to their description string)
  • Some diffusion images do not contain raw data but rather tensor model results (FA, ADC, TRACEW). Some of these were excluded before BIDS conversion, but not all of them
