A Snakemake workflow for converting CFMM DICOM data to BIDS format using heudiconv.
- Query DICOM studies from CFMM with flexible search specifications
- Filter studies with include/exclude rules
- Download DICOM studies from CFMM
- Convert DICOM to BIDS format using heudiconv
- Apply post-conversion fixes (remove files, update JSON metadata, fix NIfTI orientation)
- Validate BIDS datasets
- Generate quality control (QC) reports for each subject/session
The workflow is organized into 5 main processing stages plus a final copy stage, each producing intermediate outputs:
Note on BIDS staging: The convert and fix stages use a two-step assembly process:
- Individual subject/session data is first written to `bids-staging/sub-*/ses-*/` directories
- All requested subjects are then assembled into a single `bids/` directory

This ensures the BIDS dataset is always clean and matches the requested subjects, making it easier to add or remove subjects without leftover files.
Queries DICOM studies from CFMM using search specifications defined in config/config.yaml. Features include:
- Multiple search specifications with different query parameters
- Flexible metadata mapping (e.g., extract subject/session from PatientID, StudyDate)
- Pattern matching with regex extraction
- Automatic sanitization of subject/session IDs
- Validation of subject/session ID format (alphanumeric only)
- Query caching: Queries are cached based on a hash of the query parameters. If the `studies.tsv` file already exists and the query parameters haven't changed, the query is skipped. This is especially useful with remote executors like SLURM, where multiple jobs querying simultaneously can cause issues.
- Use `--config force_requery=true` to force a fresh query when new scans may have been acquired
Output: studies.tsv - Complete list of matched studies
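The pattern matching and sanitization described above can be sketched in Python. This is an illustrative helper (not the workflow's actual API), using the `'_([^_]+)$'` regex from the config example to pull the last underscore-delimited token out of a PatientID:

```python
import re

def extract_id(value: str, pattern: str, sanitize: bool = False) -> str:
    """Extract an ID from a DICOM field using a regex capture group,
    optionally stripping non-alphanumeric characters."""
    match = re.search(pattern, value)
    extracted = match.group(1) if match else value
    if sanitize:
        # Keep only alphanumeric characters, as required for BIDS labels
        extracted = re.sub(r'[^a-zA-Z0-9]', '', extracted)
    return extracted

# Hypothetical PatientID in the form used by the config example
subject = extract_id('2023_01_15_subj-01', r'_([^_]+)$', sanitize=True)
print(subject)
```

With `sanitize: true`, an extracted token like `subj-01` becomes `subj01`, satisfying the alphanumeric-only validation mentioned above.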
Post-filters the queried studies based on include/exclude rules. Features include:
- Include/exclude filters using pandas query syntax
- Optional `--config head=N` to process only the first N subjects for testing
Output: studies_filtered.tsv - Filtered list of studies to process
Downloads DICOM studies from CFMM using cfmm2tar. When merge_duplicate_studies: true is enabled, multiple studies for the same subject/session are downloaded as separate tar files in the same directory.
Output: dicoms/sub-*/ses-*/ - Downloaded DICOM files (tar archives)
Converts DICOMs to BIDS format using heudiconv and generates QC reports. Features include:
- BIDS conversion with heudiconv and custom heuristic
- Automatic handling of duplicate studies (when `merge_duplicate_studies: true`):
  - Each tar file (study) is processed separately with heudiconv
  - Outputs are automatically merged into a single session
  - A `study_uid` column is added to dicominfo.tsv to track series origin
- QC report generation (series list and unmapped summary)
- BIDS validation with `bids-validator-deno`
- Metadata preservation (auto.txt, dicominfo.tsv)
Outputs:
- `bids-staging/sub-*/ses-*/` - Intermediate BIDS-formatted data per subject/session
- `bids/` - Assembled BIDS dataset (all subjects combined)
- `qc/sub-*/ses-*/` - Heudiconv metadata and QC reports (auto.txt, dicominfo.tsv, series.svg, unmapped.svg)
- `qc/bids_validator.json` - BIDS validation results
- `qc/aggregate_report.html` - Aggregate QC report for all sessions
Applies post-conversion fixes to the BIDS dataset. Available fix actions:
- remove: Remove files matching a pattern (e.g., unwanted fieldmaps)
- update_json: Update JSON sidecar metadata (e.g., add PhaseEncodingDirection)
- fix_orientation: Reorient NIfTI files to canonical RAS+ orientation
Outputs:
- `bids-staging/sub-*/ses-*/` - Intermediate fixed BIDS data per subject/session
- `bids/` - Assembled fixed BIDS dataset (all subjects combined)
- `qc/sub-*/ses-*/sub-*_ses-*_provenance.json` - Fix provenance tracking
- `qc/bids_validator.json` - Post-fix BIDS validation results
- `qc/aggregate_report.html` - Aggregate QC report including fix provenance
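As an illustration of the `update_json` fix action, a minimal stdlib-only sketch is shown below. The helper name and demo paths are hypothetical, not the workflow's actual `post_convert_fix.py` implementation; it merges key/value updates into every JSON sidecar matching a glob pattern:

```python
import json
from pathlib import Path

def update_json_sidecars(bids_dir: Path, pattern: str, updates: dict) -> list:
    """Merge `updates` into every JSON sidecar under `bids_dir` matching `pattern`."""
    changed = []
    for sidecar in bids_dir.glob(pattern):
        metadata = json.loads(sidecar.read_text())
        metadata.update(updates)  # existing keys are overwritten, others preserved
        sidecar.write_text(json.dumps(metadata, indent=2))
        changed.append(sidecar)
    return changed

# Demo on a throwaway directory mimicking a BIDS session
import tempfile
tmp = Path(tempfile.mkdtemp())
(tmp / 'func').mkdir()
(tmp / 'func' / 'sub-01_task-rest_bold.json').write_text('{"RepetitionTime": 2.0}')
changed = update_json_sidecars(tmp, 'func/*bold.json', {'PhaseEncodingDirection': 'j-'})
print(len(changed))
```

The same pattern-plus-action shape applies to `remove` (delete matching files) and `fix_orientation` (rewrite matching NIfTIs in canonical RAS+ orientation).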
Copies the validated and fixed BIDS dataset to the final output directory.
The workflow automatically generates QC reports for each subject/session after heudiconv conversion. The reports include:
- Series List (`*_series.svg`): A detailed table showing each series with:
  - Series ID and description
  - Protocol name
  - Image dimensions
  - TR and TE values
  - Corresponding BIDS filename (or "NOT MAPPED" if unmapped)
  - For merged studies: includes `study_uid` to identify which study each series came from
- Unmapped Summary (`*_unmapped.svg`): A summary of series that were not mapped to BIDS, helping identify potential missing data or heuristic issues
QC reports are saved in: `results/3_convert/qc/sub-{subject}/ses-{session}/`
Note: The QC report generation is integrated into the Snakemake workflow as a script directive and cannot be run manually as a standalone CLI tool.
After the fix stage, an aggregate HTML report is automatically generated that consolidates QC information from all subjects and sessions. The report includes:
- Overview Statistics: Total subjects, sessions, series, and unmapped series count
- BIDS Validation Results: Validation results from both convert and fix stages
- Aggregated Series Table: All series data sorted by subject and session
- Post-Conversion Fix Provenance: Details of fixes applied to each session
- Heudiconv Filegroup Metadata: Detailed metadata from heudiconv conversion (collapsible)
The aggregate report is located at: results/4_fix/qc/aggregate_report.html
This report provides a comprehensive overview of the entire dataset conversion process and is useful for quality control and troubleshooting.
The workflow is configured via config/config.yaml. Key configuration sections include:
For working examples, see:
- `config/config_trident15T.yml` - Configuration for Trident 15T scanner
- `config/config_cogms.yml` - Configuration for CogMS study
Define one or more DICOM queries with metadata mappings:
```yaml
search_specs:
  - dicom_query:
      study_description: YourStudy^*
      study_date: 20230101-
    metadata_mappings:
      subject:
        source: PatientID    # Extract from PatientID field
        pattern: '_([^_]+)$' # Regex to extract subject ID
        sanitize: true       # Remove non-alphanumeric characters
      session:
        source: StudyDate    # Use StudyDate as session ID
```

Post-filter studies with include/exclude rules:
```yaml
study_filter_specs:
  include:
    - "subject.str.startswith('sub')"   # Include only subjects starting with 'sub'
  exclude:
    - "StudyInstanceUID == '1.2.3.4.5'" # Exclude specific study
```

- `cfmm2tar_download_options`: Options passed to cfmm2tar (e.g., `--skip-derived`)
- `credentials_file`: Path to CFMM credentials file
- `merge_duplicate_studies`: If `true`, automatically merge multiple studies for the same subject/session (default: `false`)
When merge_duplicate_studies: true is enabled and multiple studies match the same subject/session:
- All study tar files are downloaded to the same directory
- Each study's DICOM tar file is processed separately with heudiconv
- Outputs from each study are automatically merged into a single session:
  - BIDS NIfTI and JSON files from all studies are combined
  - The `auto.txt` files are merged (all series info concatenated)
  - The `dicominfo.tsv` files are merged with a `study_uid` column added to track which series came from which study
- This is useful when subjects have multiple scan sessions on the same day (e.g., due to console reboot or scanner issues)
- If disabled and duplicates are found, the workflow will fail with an error message
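The dicominfo merge described above can be sketched with pandas. This is illustrative only (toy data and a hypothetical helper): each per-study table is tagged with its study's UID before concatenation, so every series remains traceable to its origin:

```python
import pandas as pd

def merge_dicominfo(tables: dict) -> pd.DataFrame:
    """Concatenate per-study dicominfo tables keyed by StudyInstanceUID,
    adding a study_uid column to track which study each series came from."""
    tagged = [df.assign(study_uid=uid) for uid, df in tables.items()]
    return pd.concat(tagged, ignore_index=True)

# Two toy per-study tables for the same subject/session
merged = merge_dicominfo({
    '1.2.3': pd.DataFrame({'series_id': ['1-anat'], 'protocol_name': ['T1w']}),
    '4.5.6': pd.DataFrame({'series_id': ['1-func'], 'protocol_name': ['bold']}),
})
print(merged[['series_id', 'study_uid']].to_string(index=False))
```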
- `heuristic`: Path to heudiconv heuristic file
- `dcmconfig_json`: Path to dcm2niix configuration
- `heudiconv_options`: Additional heudiconv options
The workflow includes several heuristic files for different scanner configurations:
- `heuristics/cfmm_base.py`: Base CFMM heuristic supporting standard sequences including:
  - MP2RAGE, MEMP2RAGE, Sa2RAGE
  - T2 TSE (Turbo Spin Echo)
  - T2 SPACE, T2 FLAIR
  - Multi-echo GRE
  - TOF Angiography
  - Diffusion-weighted imaging
  - BOLD fMRI (multiband, psf-dico)
  - Field mapping (EPI-PA, GRE)
  - DIS2D/DIS3D distortion-corrected reconstructions, with improved detection that robustly identifies distortion-corrected images regardless of their position in the DICOM `image_type` metadata
- `heuristics/trident_15T.py`: Trident 15T scanner-specific heuristic
- `heuristics/Menon_CogMSv2.py`: CogMS study-specific heuristic
The heuristics automatically detect and label distortion-corrected (DIS2D/DIS3D) reconstructions using the `rec-DIS2D` or `rec-DIS3D` BIDS entity.
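The position-independent detection mentioned above can be sketched as a simple membership test over the DICOM ImageType values. The helper name is hypothetical (not the heuristic's actual function):

```python
from typing import Optional

def get_dis_label(image_type) -> Optional[str]:
    """Return 'DIS2D' or 'DIS3D' if the series is a distortion-corrected
    reconstruction, regardless of where the flag sits in ImageType."""
    values = {str(v).upper() for v in image_type}
    for label in ('DIS2D', 'DIS3D'):
        if label in values:
            return label
    return None

# The flag may appear at any position in the ImageType tuple
print(get_dis_label(('ORIGINAL', 'PRIMARY', 'M', 'DIS2D')))
```

Using set membership rather than indexing a fixed position is what makes the detection robust to scanner-to-scanner variation in ImageType ordering.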
Define fixes to apply after conversion:
```yaml
post_convert_fixes:
  - name: remove_fieldmaps
    pattern: "fmap/*dir-AP*"
    action: remove
  - name: add_phase_encoding
    pattern: "func/*bold.json"
    action: update_json
    updates:
      PhaseEncodingDirection: "j-"
  - name: reorient_nifti
    pattern: "anat/*T1w.nii.gz"
    action: fix_orientation
```

- `final_bids_dir`: Final output directory (default: `bids`)
- `stages`: Customize intermediate stage directories
- Install pixi:

  ```bash
  curl -fsSL https://pixi.sh/install.sh | sh
  ```

- Clone the cfmm2bids repository:

  ```bash
  git clone https://github.com/akhanf/cfmm2bids
  cd cfmm2bids
  ```

- Install dependencies into the pixi virtual environment:

  ```bash
  pixi install
  ```

- Configure your search specifications by editing `config/config.yaml`.

  Note: Example configurations are available:
  - `config/config_trident15T.yml` - Trident 15T scanner setup
  - `config/config_cogms.yml` - CogMS study setup

  You can use these as starting points or use the template in `config/config.yaml`. To use one of the example configs directly:

  ```bash
  pixi run snakemake --configfile config/config_trident15T.yml --dry-run
  ```

- Run the workflow as a dry-run to see what will be executed:

  ```bash
  pixi run snakemake --dry-run
  ```

- Run specific workflow stages or the full workflow:

  Run all stages (query → filter → download → convert → fix → final):

  ```bash
  pixi run snakemake --cores all
  ```

  Run only the download stage:

  ```bash
  pixi run snakemake download --cores all
  ```

  Run only the convert stage (includes QC reports):

  ```bash
  pixi run snakemake convert --cores all
  ```

  Run only the fix stage:

  ```bash
  pixi run snakemake fix --cores all
  ```

  Process only the first subject (useful for testing):

  ```bash
  pixi run snakemake head --cores all
  ```

  Process only the first N subjects (e.g., first 3):

  ```bash
  pixi run snakemake --config head=3 --cores all
  ```

- Run the workflow on a SLURM cluster:

  ```bash
  pixi run snakemake --executor slurm --jobs 10
  ```
```
bids/                                     # Final BIDS-formatted output
├── dataset_description.json              # BIDS dataset metadata
└── sub-*/
    └── ses-*/                            # Subject/session data (anat/, func/, fmap/, etc.)

results/
├── 0_query/
│   └── studies.tsv                       # All queried studies
├── 1_filter/
│   └── studies_filtered.tsv              # Filtered studies to process
├── 2_download/
│   └── dicoms/
│       └── sub-*/ses-*/                  # Downloaded DICOM tar files
├── 3_convert/
│   ├── bids-staging/
│   │   ├── dataset_description.json      # BIDS dataset metadata
│   │   ├── .bidsignore                   # BIDS ignore file
│   │   └── sub-*/ses-*/                  # Per-subject/session BIDS data (intermediate)
│   ├── bids/                             # Assembled BIDS dataset (all subjects)
│   └── qc/
│       ├── sub-*/ses-*/                  # Per-subject/session QC and metadata
│       │   ├── sub-*_ses-*_auto.txt      # Heudiconv auto conversion info
│       │   ├── sub-*_ses-*_dicominfo.tsv # Heudiconv DICOM metadata table
│       │   ├── sub-*_ses-*_series.tsv    # Series info table
│       │   ├── sub-*_ses-*_series.svg    # Series QC visualization
│       │   ├── sub-*_ses-*_unmapped.svg  # Unmapped series visualization
│       │   └── sub-*_ses-*_report.html   # Individual subject/session report
│       ├── bids_validator.json           # BIDS validation results
│       └── aggregate_report.html         # Aggregate QC report
└── 4_fix/
    ├── bids-staging/
    │   ├── dataset_description.json      # BIDS dataset metadata
    │   ├── .bidsignore                   # BIDS ignore file
    │   └── sub-*/ses-*/                  # Per-subject/session fixed BIDS data (intermediate)
    ├── bids/                             # Assembled fixed BIDS dataset (all subjects)
    └── qc/
        ├── sub-*/ses-*/                  # Per-subject/session provenance
        │   ├── sub-*_ses-*_provenance.json # Fix provenance tracking
        │   └── sub-*_ses-*_report.html   # Individual subject/session report with fixes
        ├── bids_validator.json           # Post-fix validation results
        ├── final_bids_validator.txt      # Final validation (must pass)
        └── aggregate_report.html         # Aggregate QC report with fix provenance
```
```
├── workflow/                             # Workflow files
│   ├── Snakefile                         # Main Snakemake workflow
│   ├── lib/                              # Python modules
│   │   ├── query_filter.py               # DICOM query and filtering functions
│   │   ├── bids_fixes.py                 # Post-conversion fix implementations
│   │   ├── convert.py                    # Heudiconv conversion helpers (single/multi-study)
│   │   └── utils.py                      # Utility functions
│   └── scripts/                          # Workflow scripts
│       ├── run_heudiconv.py              # Run heudiconv (handles single/multi-study)
│       ├── generate_convert_qc_figs.py   # QC report generation
│       ├── generate_subject_report.py    # Individual subject/session reports
│       ├── generate_aggregate_all_report.py # Aggregate QC report
│       └── post_convert_fix.py           # Post-conversion fix application
├── heuristics/                           # Heudiconv heuristic files
│   ├── cfmm_base.py                      # Base CFMM heuristic (supports DIS2D/DIS3D reconstruction)
│   ├── trident_15T.py                    # Trident 15T scanner-specific heuristic
│   └── Menon_CogMSv2.py                  # CogMS study-specific heuristic
├── resources/                            # Resource files
│   ├── dcm2niix_config.json              # dcm2niix configuration
│   └── dataset_description.json          # BIDS dataset metadata template
├── config/                               # Configuration files
│   ├── config.yml                        # Configuration template (customize this)
│   ├── config_trident15T.yml             # Example: Trident 15T scanner configuration
│   └── config_cogms.yml                  # Example: CogMS study configuration
└── pixi.toml                             # Pixi project configuration and dependencies
```
The workflow provides several target rules for running specific stages:
- `all` (default): Run all stages from query to final BIDS output
- `head`: Process only the first subject (useful for testing)
- `download`: Run only query, filter, and download stages
- `convert`: Run through the convert stage (includes download, conversion, and QC)
- `fix`: Run through the fix stage (includes all above plus post-conversion fixes)
Example usage:
```bash
# Test with first subject only
pixi run snakemake head --cores all

# Download all DICOMs without conversion
pixi run snakemake download --cores all

# Convert and generate QC reports
pixi run snakemake convert --cores all
```