This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset using the Nipoppy framework.
Unless otherwise specified, instructions assume the current working directory is the Nipoppy root directory.
Note:
- The `global_config.json`'s `"CUSTOM"` field points to these files. They should be downloaded into `sourcedata/tabular`.
- Files downloaded from LONI have a timestamp for the date. Instead of changing their names, we can create symlinks pointing to the latest version of the file:
```shell
# assume we have old file: idaSearch.csv -> idaSearch_8_15_2024.csv
# and we want to replace by: idaSearch.csv -> idaSearch_10_15_2025.csv

# remove the symlink
rm -i idaSearch.csv

# create the new symlink
ln -s idaSearch_10_15_2025.csv idaSearch.csv
```

### Image collections
- `idaSearch.csv`: from the Advanced Search page
  - Check every box in the "Display in result" column
  - Check "DTI" + "MRI" + "fMRI" in "Modality"
### Study data

- Study Docs: Data & Databases
  - `Code_List_-__Annotated_.csv`
  - `Data_Dictionary_-__Annotated_.csv`
- Subject Characteristics: Patient Status
  - `Participant_Status.csv`
- Subject Characteristics: Subject Demographics
  - `Age_at_visit.csv`
  - `Demographics.csv`
  - `Socio-Economics.csv`
- Medical History: Medical
  - `Clinical_Diagnosis.csv`
  - `Primary_Clinical_Diagnosis.csv`
- Motor Assessments: Motor / MDS-UPDRS
  - `MDS-UPDRS_Part_I.csv`
  - `MDS-UPDRS_Part_I_Patient_Questionnaire.csv`
  - `MDS_UPDRS_Part_II__Patient_Questionnaire.csv`
  - `MDS-UPDRS_Part_III.csv`
  - `MDS-UPDRS_Part_IV__Motor_Complications.csv`
- Non-motor Assessments: ALL
  - All downloaded, though not all used or up-to-date
  - `Benton_Judgement_of_Line_Orientation.csv`
  - `Clock_Drawing.csv`
  - `Cognitive_Categorization.csv`
  - `Cognitive_Change.csv`
  - `Epworth_Sleepiness_Scale.csv`
  - `Geriatric_Depression_Scale__Short_Version_.csv`
  - `Hopkins_Verbal_Learning_Test_-_Revised.csv`
  - `Letter_-_Number_Sequencing.csv`
  - `Lexical_Fluency.csv`
  - `Modified_Boston_Naming_Test.csv`
  - `Modified_Semantic_Fluency.csv`
  - `Montreal_Cognitive_Assessment__MoCA_.csv`
  - `Neuro_QoL__Cognition_Function_-_Short_Form.csv`
  - `Neuro_QoL__Communication_-_Short_Form.csv`
  - `QUIP-Current-Short.csv`
  - `REM_Sleep_Behavior_Disorder_Questionnaire.csv`
  - `SCOPA-AUT.csv`
  - `State-Trait_Anxiety_Inventory.csv`
  - `Symbol_Digit_Modalities_Test.csv`
  - `Trail_Making_A_and_B.csv`
  - `University_of_Pennsylvania_Smell_Identification_Test_UPSIT.csv`
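Building on the symlink note above, here is a sketch that refreshes the un-timestamped symlinks for all tabular downloads at once. The file names and the `_M_D_YYYY.csv` date pattern are assumptions; the demo runs in a throwaway directory standing in for `sourcedata/tabular`.

```shell
# Demo setup: two timestamped downloads of the same file, with distinct mtimes
tmp=$(mktemp -d) && cd "$tmp"
touch -t 202408150000 Demographics_8_15_2024.csv
touch -t 202510150000 Demographics_10_15_2025.csv

# For each timestamped CSV, (re)point the stable-name symlink at the
# most recently modified version
for f in *_*_*_[0-9][0-9][0-9][0-9].csv; do
    base=${f%_*_*_*.csv}.csv                          # e.g. Demographics.csv
    latest=$(ls -t "${base%.csv}"_*.csv | head -n 1)  # newest by mtime
    ln -sfn "$latest" "$base"
done
readlink Demographics.csv   # -> Demographics_10_15_2025.csv
```

Selecting by modification time (`ls -t`) avoids relying on lexicographic order, which would sort `10_15_2025` before `8_15_2024`.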
### Manifest generation

Requires `idaSearch.csv` as well as the demographics/assessments files listed above.

Run:

```shell
./code/scripts/curation/generate_manifest.py --dataset . --regenerate
```

This creates the manifest file and the DICOM directory mapping.
### Image description filtering

Requires `idaSearch.csv`.

Run:

```shell
./code/scripts/curation/filter_image_descriptions.py --overwrite
```

This will update two files (both tracked by Git):

- `code/imaging_descriptions/ppmi_imaging_descriptions.json`: mapping of BIDS datatype (and suffix) to protocol names
- `code/imaging_descriptions/ppmi_imaging_ignored.csv`: protocol names that will not be included in the dataset

These two files should be checked to ensure that any modifications make sense. If not, see the log messages from `filter_image_descriptions.py` for instructions on how to update the filters in `code/nipoppy_ppmi/imaging_filters.py`.
### DICOM downloads

Requires:
- `idaSearch.csv`
- `code/imaging_descriptions/ppmi_imaging_descriptions.json`, generated by the previous step
- Updated `<NIPOPPY_ROOT>/sourcedata/imaging/doughnut.tsv`

Run:

```shell
# regenerate the doughnut file if needed
nipoppy doughnut --dataset . --regenerate

# change --session-id and --chunk-size as needed
./code/scripts/dicom_reorg/fetch_dicom_downloads.py --dataset . --session-id BL --chunk-size 1000
```

Follow the instructions in the log of the previous step for creating a collection.
If needed, use the utility script `code/scripts/dicom_reorg/download_from_loni.sh` to download directly to Compute Canada from the computer that initiated the LONI download (i.e. a laptop):

- Requires access to a Compute Canada robot node for non-2FA SSH connections, see here.
- After initiating a download from LONI, download the CSV file with the download URLs and pass it to `download_from_loni.sh`

Example command:

```shell
# this script should be copied to and run from the local computer
# make sure to update the session ID as appropriate
./download_from_loni.sh <PATH_TO_CSV_FILE> <USERNAME>@robot.rorqual.alliancecan.ca <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-BL
```

After the downloads are complete, rename the files to something like `240924_ses1_list12_dataset-{INDEX}.zip` if needed (this will make unzipping easier).
### Unzipping downloads

Can use the utility job script `code/scripts/dicom_reorg/unzip_loni_downloads.sh`. Example job submission command:

```shell
sbatch --array=1-10 --account=rrg-jbpoline --output=<NIPOPPY_ROOT>/logs/hpc/unzip_loni_downloads_%A_%a.out <NIPOPPY_ROOT>/code/scripts/dicom_reorg/unzip_loni_downloads.sh <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-V06/250401-sesV06-list4_dataset.zip <NIPOPPY_ROOT>/sourcedata/imaging/pre_reorg/ses-V06
```

Move the subdirectories out of the parent PPMI directory and delete the PPMI directory.
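The move-and-delete step above can be sketched as follows; the demo uses a throwaway tree standing in for `pre_reorg/ses-V06` after unzipping, and the subject directory names are made up.

```shell
# Demo setup: unzipped tree with a parent PPMI directory
tmp=$(mktemp -d)
mkdir -p "$tmp/PPMI/3000" "$tmp/PPMI/3001"

mv "$tmp/PPMI"/* "$tmp"/   # move subdirectories out of the parent PPMI directory
rmdir "$tmp/PPMI"          # delete the now-empty PPMI directory
```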
Then regenerate the doughnut file:

```shell
nipoppy doughnut --dataset . --regenerate
```

### DICOM reorganization

Uses a custom script. This step is quite slow, so it needs to run in a Slurm job. Sample submission command:

```shell
sbatch --account=rrg-jbpoline --time=10:00:00 --mem=1G --job-name=ppmi_reorg --output="<NIPOPPY_ROOT>/logs/hpc/%x_%j.out" --wrap="<ACTIVATE_NIPOPPY_PPMI_ENVIRONMENT> && <NIPOPPY_ROOT>/scripts/dicom_reorg/dicom_reorg.py --dataset <NIPOPPY_ROOT>"
```

### SquashFS creation

Sample commands:

```shell
DATASET_ROOT=`realpath <NIPOPPY_ROOT>`
DPATH_SQUASH=$DATASET_ROOT/sourcedata/imaging/squash
FPATH_SQUASH=$DPATH_SQUASH/ses-BL/250401-sesBL-list1.squashfs
DPATH_LOGS=$DATASET_ROOT/logs/hpc
DPATH_CODE=$DATASET_ROOT/code/scripts/dicom_reorg
sbatch --account=rrg-jbpoline --output=$DPATH_LOGS/slurm-%j.out $DPATH_CODE/make_squash.sh --exclude $DPATH_CODE/exclude.txt --move /ppmi $FPATH_SQUASH $DATASET_ROOT/dicom $DATASET_ROOT/scratch
```

### BIDS conversion

Sample command:

```shell
nipoppy bidsify --dataset <NIPOPPY_ROOT> --pipeline heudiconv --pipeline-step convert --session-id BL --write-list <NIPOPPY_ROOT>/code/slurm/to_run/tmp.tsv
```

Modify `bids_conversion.sh` appropriately (IMPORTANT: set `--pipeline-step` to `"prepare"`):
```shell
sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh
cd <NIPOPPY_ROOT>/code/nipoppy_ppmi/
rm -rf fake_bids/
<NIPOPPY_ROOT>/code/nipoppy_ppmi/heuristic.py --dataset <NIPOPPY_ROOT> --session-id BL
```

Make sure the above script does not produce errors.
Rename the `.heudiconv` directory in `<NIPOPPY_ROOT>/bids`.

IMPORTANT: set `--pipeline-step` to `"convert"`, then run:

```shell
sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh
```

Afterwards, for the `.heudiconv` directory:
- rename
- tar (inside the `bids/` directory: `tar -czvf .heudiconv-<EXTRA>.tar.gz .heudiconv-<EXTRA>/`)
- delete (IMPORTANT: only do this if SquashFS creation is done!)
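The rename/tar/delete steps can be combined as in this sketch; the demo runs in a throwaway directory, with `demo` standing in for the `<EXTRA>` label you would choose.

```shell
# Demo setup: stand-in for the real .heudiconv directory inside bids/
tmp=$(mktemp -d) && cd "$tmp"
mkdir .heudiconv && touch .heudiconv/info.txt

mv .heudiconv .heudiconv-demo                     # 1. rename
tar -czf .heudiconv-demo.tar.gz .heudiconv-demo/  # 2. tar
rm -rf .heudiconv-demo/                           # 3. delete (only once SquashFS creation is done!)
```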
```shell
nipoppy doughnut <NIPOPPY_ROOT> --regenerate
./code/scripts/curation/add_bval_bvec_to_B0_dwi.py --dataset <NIPOPPY_ROOT> --session-id BL
```

### Notes

- Some search fields in the LONI search tool cannot be trusted
  - Examples:
    - `Modality`: `Modality=DTI` can have anatomical images, and there are diffusion images with `MRI` modality
    - `Weighting` (under `Imaging Protocol`): some T1s have `Weighting=PD`
- We classify image modality/contrast based only on the `Image Description` column
  - This can also lead to issues, for example when a subject has the same description string for all of their scans. In that case, we manually determine the image modality/contrast and hard-code the mapping in `heuristic.py` for HeuDiConv
- The LONI viewer sometimes shows seemingly bad/corrupted files, but they are actually fine once we convert them
  - Observed for some diffusion images (these tend to have ~2700 slices according to the LONI image viewer)
- Some subjects have a huge number of small DICOM files, which causes us to exceed the inode quota on `/scratch`
  - We opted to create SquashFS archives/filesystems, which count as 1 inode and can be mounted as a filesystem in a Singularity container (using the `--overlay` argument). This is similar to how McGill/NeuroHub stores UK Biobank data on Compute Canada
### Available BIDS data

The `code/imaging_descriptions/ppmi_imaging_descriptions.json` file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.

Here is a description of the available BIDS data and the tags that can appear in their filenames:
- `anat`
  - The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
  - Most images have an `acq` tag:
    - Non-neuromelanin images: `acq-<plane><type>`, where:
      - `<plane>` is one of `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
      - `<type>` is one of `2D` or `3D`
    - Neuromelanin images: `acq-NM`
  - For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
  - All imaging files have the `dwi` suffix.
  - Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
  - Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
  - Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `<value>` is the b-value for that sequence.
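For illustration, here are hypothetical filenames following the conventions above; the subject/session labels and specific tag combinations are made up.

```
sub-1234/ses-BL/anat/sub-1234_ses-BL_acq-sag3D_T1w.nii.gz
sub-1234/ses-BL/anat/sub-1234_ses-BL_acq-ax2D_FLAIR.nii.gz
sub-1234/ses-BL/dwi/sub-1234_ses-BL_dir-LR_dwi.nii.gz
sub-1234/ses-BL/dwi/sub-1234_ses-BL_acq-B2000_dir-PA_dwi.nii.gz
```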
Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data have not been converted to the BIDS format yet.
### Troubleshooting

- `AttributeError: 'Dataset' object has no attribute 'StackID'`
  - Vincent previously had the same issue; unclear if/how it was fixed. The error could be because the images are in a single big DICOM file instead of many small DICOM files
- `AssertionError: Conflicting study identifiers found`
  - Could be because all of a subject's DICOMs are pooled together in the `dicom_org` step, in which case this can be fixed by manually running HeuDiConv for each image
- `numpy.AxisError: axis 1 is out of bounds for array of dimension 1`
  - Has only happened for one image so far
  - See nipy/heudiconv#670 and nipy/nibabel#1245
- `AssertionError` (`assert HEUDICONV_VERSION_JSON_KEY not in json_`)
  - Thrown by HeuDiConv
- `AssertionError: we do expect some files since it was called` (`assert bids_files, "we do expect some files since it was called"`)
  - Thrown by HeuDiConv
### Diffusion data notes

- Some subjects only have a single diffusion image (e.g., `Ax DTI`), which might not be usable
- Some subjects have 2 diffusion images, but with the same description string (e.g., `DTI_gated`)
  - Checked some cases after BIDS conversion, and the JSON sidecars seem to have the same `PhaseEncodingDirection` (`j-`)
- Some subjects have multi-shell sequences. Their files seem to follow this pattern:
  - `dir-PA`: 1 `B0`, 1 `B700`, 1 `B1000`, and 1 `B2000` image
  - `dir-AP`: 4 `B0` images
- Some (~2 for `ses-BL`) subjects have `dir-AP` for all their diffusion images
  - They seem to have 4 `dir-AP` `B0` images and 4 other `dir-AP` images (according to their description strings)
- Some diffusion images do not contain raw data, but rather tensor model results (`FA`, `ADC`, `TRACEW`). Some of these have been excluded before BIDS conversion, but not all of them