This repository contains code to process tabular and imaging data from the Parkinson's Progression Markers Initiative (PPMI) dataset using the Nipoppy framework.
Unless otherwise specified, instructions assume the current working directory is the Nipoppy root directory.
Note:
- The `global_config.json`'s `"CUSTOM"` field points to these files. They should be downloaded into `sourcedata/tabular`.
- Files downloaded from LONI have a timestamp for the date. Instead of changing their names, we can create symlinks pointing to the latest version of the file:
```shell
# assume we have old file: idaSearch.csv -> idaSearch_8_15_2024.csv
# and we want to replace by: idaSearch.csv -> idaSearch_10_15_2025.csv

# remove the symlink
rm -i idaSearch.csv

# create the new symlink
ln -s idaSearch_10_15_2025.csv idaSearch.csv
```

### Image collections
- `idaSearch.csv`: from the Advanced Search page
  - Check every box in the "Display in result" column
  - Check "DTI" + "MRI" + "fMRI" in "Modality"
### Study data

- Study Docs: Data & Databases
  - `Code_List_-__Annotated_.csv`
  - `Data_Dictionary_-__Annotated_.csv`
- Subject Characteristics: Patient Status
  - `Participant_Status.csv`
- Subject Characteristics: Subject Demographics
  - `Age_at_visit.csv`
  - `Demographics.csv`
  - `Socio-Economics.csv`
- Medical History: Medical
  - `Clinical_Diagnosis.csv`
  - `Primary_Clinical_Diagnosis.csv`
- Motor Assessments: Motor / MDS-UPDRS
  - `MDS-UPDRS_Part_I.csv`
  - `MDS-UPDRS_Part_I_Patient_Questionnaire.csv`
  - `MDS_UPDRS_Part_II__Patient_Questionnaire.csv`
  - `MDS-UPDRS_Part_III.csv`
  - `MDS-UPDRS_Part_IV__Motor_Complications.csv`
- Non-motor Assessments: ALL
  - All downloaded, though not all used or up-to-date
  - `Benton_Judgement_of_Line_Orientation.csv`
  - `Clock_Drawing.csv`
  - `Cognitive_Categorization.csv`
  - `Cognitive_Change.csv`
  - `Epworth_Sleepiness_Scale.csv`
  - `Geriatric_Depression_Scale__Short_Version_.csv`
  - `Hopkins_Verbal_Learning_Test_-_Revised.csv`
  - `Letter_-_Number_Sequencing.csv`
  - `Lexical_Fluency.csv`
  - `Modified_Boston_Naming_Test.csv`
  - `Modified_Semantic_Fluency.csv`
  - `Montreal_Cognitive_Assessment__MoCA_.csv`
  - `Neuro_QoL__Cognition_Function_-_Short_Form.csv`
  - `Neuro_QoL__Communication_-_Short_Form.csv`
  - `QUIP-Current-Short.csv`
  - `REM_Sleep_Behavior_Disorder_Questionnaire.csv`
  - `SCOPA-AUT.csv`
  - `State-Trait_Anxiety_Inventory.csv`
  - `Symbol_Digit_Modalities_Test.csv`
  - `Trail_Making_A_and_B.csv`
  - `University_of_Pennsylvania_Smell_Identification_Test_UPSIT.csv`
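Building on the symlink note above, here is a sketch that refreshes the un-timestamped symlinks for all tabular downloads at once. The file names and the `_M_D_YYYY.csv` date pattern are assumptions; the demo runs in a throwaway directory standing in for `sourcedata/tabular`.

```shell
# Demo setup: two timestamped downloads of the same file, with distinct mtimes
tmp=$(mktemp -d) && cd "$tmp"
touch -t 202408150000 Demographics_8_15_2024.csv
touch -t 202510150000 Demographics_10_15_2025.csv

# For each timestamped CSV, (re)point the stable-name symlink at the
# most recently modified version
for f in *_*_*_[0-9][0-9][0-9][0-9].csv; do
    base=${f%_*_*_*.csv}.csv                          # e.g. Demographics.csv
    latest=$(ls -t "${base%.csv}"_*.csv | head -n 1)  # newest by mtime
    ln -sfn "$latest" "$base"
done
readlink Demographics.csv   # -> Demographics_10_15_2025.csv
```

Selecting by modification time (`ls -t`) avoids relying on lexicographic order, which would sort `10_15_2025` before `8_15_2024`.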
### Manifest generation

Requires `idaSearch.csv` as well as the demographics/assessments files listed above.

Run:

```shell
./code/scripts/curation/generate_manifest.py --dataset . --regenerate
```

This creates the manifest file and the DICOM directory mapping.
### Image description filtering

Requires `idaSearch.csv`.

Run:

```shell
./code/scripts/curation/filter_image_descriptions.py --overwrite
```

This will update two files (both tracked by Git):

- `code/imaging_descriptions/ppmi_imaging_descriptions.json`: mapping of BIDS datatype (and suffix) to protocol names
- `code/imaging_descriptions/ppmi_imaging_ignored.csv`: protocol names that will not be included in the dataset

These two files should be checked to ensure that any modifications make sense. If not, see the log messages from `filter_image_descriptions.py` for instructions on how to update the filters in `code/nipoppy_ppmi/imaging_filters.py`.
### DICOM downloads

Requires:
- `idaSearch.csv`
- `code/imaging_descriptions/ppmi_imaging_descriptions.json`, generated by the previous step
- Updated `<NIPOPPY_ROOT>/sourcedata/imaging/doughnut.tsv`

Run:

```shell
# regenerate the doughnut file if needed
nipoppy doughnut --dataset . --regenerate

# change --session-id and --chunk-size as needed
./code/scripts/dicom_reorg/fetch_dicom_downloads.py --dataset . --session-id BL --chunk-size 1000
```

Follow the instructions in the log of the previous step for creating a collection.
If needed, use the utility script `code/scripts/dicom_reorg/download_from_loni.sh` to download directly to Compute Canada from the computer that initiated the LONI download (i.e. a laptop):

- Requires access to a Compute Canada robot node for non-2FA SSH connections, see here.
- After initiating a download from LONI, download the CSV file with the download URLs and pass it to `download_from_loni.sh`

Example command:

```shell
# this script should be copied to and run from the local computer
# make sure to update the session ID as appropriate
./download_from_loni.sh <PATH_TO_CSV_FILE> <USERNAME>@robot.rorqual.alliancecan.ca <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-BL
```

After the downloads are complete, rename the files to something like `240924_ses1_list12_dataset-{INDEX}.zip` if needed (this will make unzipping easier).
### Unzipping downloads

Can use the utility job script `code/scripts/dicom_reorg/unzip_loni_downloads.sh`. Example job submission command:

```shell
sbatch --array=1-10 --account=rrg-jbpoline --output=<NIPOPPY_ROOT>/logs/hpc/unzip_loni_downloads_%A_%a.out <NIPOPPY_ROOT>/code/scripts/dicom_reorg/unzip_loni_downloads.sh <NIPOPPY_ROOT>/sourcedata/imaging/downloads/ses-V06/250401-sesV06-list4_dataset.zip <NIPOPPY_ROOT>/sourcedata/imaging/pre_reorg/ses-V06
```

Move the subdirectories out of the parent PPMI directory and delete the PPMI directory.
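The move-and-delete step above can be sketched as follows; the demo uses a throwaway tree standing in for `pre_reorg/ses-V06` after unzipping, and the subject directory names are made up.

```shell
# Demo setup: unzipped tree with a parent PPMI directory
tmp=$(mktemp -d)
mkdir -p "$tmp/PPMI/3000" "$tmp/PPMI/3001"

mv "$tmp/PPMI"/* "$tmp"/   # move subdirectories out of the parent PPMI directory
rmdir "$tmp/PPMI"          # delete the now-empty PPMI directory
```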
Then regenerate the doughnut file:

```shell
nipoppy doughnut --dataset . --regenerate
```

### DICOM reorganization

Uses a custom script. This step is quite slow, so it needs to run in a Slurm job. Sample submission command:

```shell
sbatch --account=rrg-jbpoline --time=10:00:00 --mem=1G --job-name=ppmi_reorg --output="<NIPOPPY_ROOT>/logs/hpc/%x_%j.out" --wrap="<ACTIVATE_NIPOPPY_PPMI_ENVIRONMENT> && <NIPOPPY_ROOT>/scripts/dicom_reorg/dicom_reorg.py --dataset <NIPOPPY_ROOT>"
```

### SquashFS creation

Sample commands:

```shell
DATASET_ROOT=`realpath <NIPOPPY_ROOT>`
DPATH_SQUASH=$DATASET_ROOT/sourcedata/imaging/squash
FPATH_SQUASH=$DPATH_SQUASH/ses-BL/250401-sesBL-list1.squashfs
DPATH_LOGS=$DATASET_ROOT/logs/hpc
DPATH_CODE=$DATASET_ROOT/code/scripts/dicom_reorg
sbatch --account=rrg-jbpoline --output=$DPATH_LOGS/slurm-%j.out $DPATH_CODE/make_squash.sh --exclude $DPATH_CODE/exclude.txt --move /ppmi $FPATH_SQUASH $DATASET_ROOT/dicom $DATASET_ROOT/scratch
```

### BIDS conversion

Sample command:

```shell
nipoppy bidsify --dataset <NIPOPPY_ROOT> --pipeline heudiconv --pipeline-step convert --session-id BL --write-list <NIPOPPY_ROOT>/code/slurm/to_run/tmp.tsv
```

Modify `bids_conversion.sh` appropriately (IMPORTANT: set `--pipeline-step` to `"prepare"`):
```shell
sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh
cd <NIPOPPY_ROOT>/code/nipoppy_ppmi/
rm -rf fake_bids/
<NIPOPPY_ROOT>/code/nipoppy_ppmi/heuristic.py --dataset <NIPOPPY_ROOT> --session-id BL
```

Make sure the above script does not produce errors.
Rename the `.heudiconv` directory in `<NIPOPPY_ROOT>/bids`.

IMPORTANT: set `--pipeline-step` to `"convert"`, then run:

```shell
sbatch <NIPOPPY_ROOT>/code/slurm/bids_conversion.sh
```

Afterwards, for the `.heudiconv` directory:
- rename
- tar (inside the `bids/` directory: `tar -czvf .heudiconv-<EXTRA>.tar.gz .heudiconv-<EXTRA>/`)
- delete (IMPORTANT: only do this if SquashFS creation is done!)
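The rename/tar/delete steps can be combined as in this sketch; the demo runs in a throwaway directory, with `demo` standing in for the `<EXTRA>` label you would choose.

```shell
# Demo setup: stand-in for the real .heudiconv directory inside bids/
tmp=$(mktemp -d) && cd "$tmp"
mkdir .heudiconv && touch .heudiconv/info.txt

mv .heudiconv .heudiconv-demo                     # 1. rename
tar -czf .heudiconv-demo.tar.gz .heudiconv-demo/  # 2. tar
rm -rf .heudiconv-demo/                           # 3. delete (only once SquashFS creation is done!)
```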
```shell
nipoppy doughnut <NIPOPPY_ROOT> --regenerate
./code/scripts/curation/add_bval_bvec_to_B0_dwi.py --dataset <NIPOPPY_ROOT> --session-id BL
```

### Notes

- Some search fields in the LONI search tool cannot be trusted
  - Examples:
    - `Modality`: `Modality=DTI` can have anatomical images, and there are diffusion images with `MRI` modality
    - `Weighting` (under `Imaging Protocol`): some T1s have `Weighting=PD`
- We classify image modality/contrast based only on the `Image Description` column
  - This can also lead to issues, for example when a subject has the same description string for all of their scans. In that case, we manually determine the image modality/contrast and hard-code the mapping in `heuristic.py` for HeuDiConv
- The LONI viewer sometimes shows seemingly bad/corrupted files, but they are actually fine once we convert them
  - Observed for some diffusion images (these tend to have ~2700 slices according to the LONI image viewer)
- Some subjects have a huge number of small DICOM files, which causes us to exceed the inode quota on `/scratch`
  - We opted to create SquashFS archives/filesystems, which count as 1 inode and can be mounted as a filesystem in a Singularity container (using the `--overlay` argument). This is similar to how McGill/NeuroHub stores UK Biobank data on Compute Canada
### Available BIDS data

The `code/imaging_descriptions/ppmi_imaging_descriptions.json` file is used to determine the BIDS datatype and suffix (contrast) associated with an image's MRI series description. It will be updated as new data is processed.

Here is a description of the available BIDS data and the tags that can appear in their filenames:
- `anat`
  - The available suffixes are: `T1w`, `T2w`, `T2starw`, and `FLAIR`
  - Most images have an `acq` tag:
    - Non-neuromelanin images: `acq-<plane><type>`, where:
      - `<plane>` is one of `sag`, `ax`, or `cor` (for sagittal, axial, or coronal scans respectively)
      - `<type>` is one of `2D` or `3D`
    - Neuromelanin images: `acq-NM`
  - For some images, the acquisition plane (`sag`/`ax`/`cor`) or type (`2D`/`3D`) cannot be easily obtained. In those cases, the filename will not contain an `acq` tag.
- `dwi`
  - All imaging files have the `dwi` suffix.
  - Most images have a `dir` tag corresponding to the phase-encoding direction. This is one of: `LR`, `RL`, `AP`, or `PA`
  - Images where the phase-encoding direction cannot be easily inferred from the series description string do not have a `dir` tag.
  - Some participants have multi-shell sequences for their diffusion data. These files will have an additional `acq-B<value>` tag, where `<value>` is the b-value for that sequence.
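For illustration, here are hypothetical filenames following the conventions above; the subject/session labels and specific tag combinations are made up.

```
sub-1234/ses-BL/anat/sub-1234_ses-BL_acq-sag3D_T1w.nii.gz
sub-1234/ses-BL/anat/sub-1234_ses-BL_acq-ax2D_FLAIR.nii.gz
sub-1234/ses-BL/dwi/sub-1234_ses-BL_dir-LR_dwi.nii.gz
sub-1234/ses-BL/dwi/sub-1234_ses-BL_acq-B2000_dir-PA_dwi.nii.gz
```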
Currently, only structural (`anat`) and diffusion (`dwi`) MRI data are supported. Functional (`func`) data have not been converted to the BIDS format yet.
### Troubleshooting

- `AttributeError: 'Dataset' object has no attribute 'StackID'`
  - Vincent previously had the same issue; unclear if/how it was fixed. The error could be because the images are in a single big DICOM file instead of many small DICOM files
- `AssertionError: Conflicting study identifiers found`
  - Could be because all of a subject's DICOMs are pooled together in the `dicom_org` step, in which case this can be fixed by manually running HeuDiConv for each image
- `numpy.AxisError: axis 1 is out of bounds for array of dimension 1`
  - Has only happened for one image so far
  - See nipy/heudiconv#670 and nipy/nibabel#1245
- `AssertionError` (`assert HEUDICONV_VERSION_JSON_KEY not in json_`)
  - Thrown by HeuDiConv
- `AssertionError: we do expect some files since it was called` (`assert bids_files, "we do expect some files since it was called"`)
  - Thrown by HeuDiConv
### Diffusion data notes

- Some subjects only have a single diffusion image (e.g., `Ax DTI`), which might not be usable
- Some subjects have 2 diffusion images, but with the same description string (e.g., `DTI_gated`)
  - Checked some cases after BIDS conversion, and the JSON sidecars seem to have the same `PhaseEncodingDirection` (`j-`)
- Some subjects have multi-shell sequences. Their files seem to follow this pattern:
  - `dir-PA`: 1 `B0`, 1 `B700`, 1 `B1000`, and 1 `B2000` image
  - `dir-AP`: 4 `B0` images
- Some (~2 for `ses-BL`) subjects have `dir-AP` for all their diffusion images
  - They seem to have 4 `dir-AP` `B0` images and 4 other `dir-AP` images (according to their description strings)
- Some diffusion images do not contain raw data, but rather tensor model results (`FA`, `ADC`, `TRACEW`). Some of these have been excluded before BIDS conversion, but not all of them