A Snakemake workflow for converting CFMM DICOM data to BIDS format using heudiconv.
- Query DICOM studies from CFMM with flexible search specifications
- Filter studies with include/exclude rules
- Download DICOM studies from CFMM
- Convert DICOM to BIDS format using heudiconv
- Apply post-conversion fixes (remove files, update JSON metadata, fix NIfTI orientation)
- Validate BIDS datasets
- Generate quality control (QC) reports for each subject/session
The workflow is organized into 5 main processing stages plus a final copy stage, each producing intermediate outputs:
Note on BIDS staging: The convert and fix stages use a two-step assembly process:
- Individual subject/session data is first written to `bids-staging/sub-*/ses-*/` directories
- All requested subjects are then assembled into a single `bids/` directory

This ensures the BIDS dataset is always clean and matches the requested subjects, making it easier to add or remove subjects without leftover files.
Queries DICOM studies from CFMM using search specifications defined in config/config.yaml. Features include:
- Multiple search specifications with different query parameters
- Flexible metadata mapping (e.g., extract subject/session from PatientID, StudyDate)
- Pattern matching with regex extraction
- Automatic sanitization of subject/session IDs
- Validation of subject/session ID format (alphanumeric only)
- Query caching: Queries are cached based on a hash of the query parameters. If the `studies.tsv` file already exists and the query parameters haven't changed, the query is skipped. This is especially useful with remote executors like SLURM, where multiple jobs querying simultaneously can cause issues.
- Use `--config force_requery=true` to force a fresh query when new scans may have been acquired
Output: studies.tsv - Complete list of matched studies
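The pattern matching and sanitization described above can be sketched in Python. This is an illustrative helper (not the workflow's actual API), using the `'_([^_]+)$'` regex from the config example to pull the last underscore-delimited token out of a PatientID:

```python
import re

def extract_id(value: str, pattern: str, sanitize: bool = False) -> str:
    """Extract an ID from a DICOM field using a regex capture group,
    optionally stripping non-alphanumeric characters."""
    match = re.search(pattern, value)
    extracted = match.group(1) if match else value
    if sanitize:
        # Keep only alphanumeric characters, as required for BIDS labels
        extracted = re.sub(r'[^a-zA-Z0-9]', '', extracted)
    return extracted

# Hypothetical PatientID in the form used by the config example
subject = extract_id('2023_01_15_subj-01', r'_([^_]+)$', sanitize=True)
print(subject)
```

With `sanitize: true`, an extracted token like `subj-01` becomes `subj01`, satisfying the alphanumeric-only validation mentioned above.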
Post-filters the queried studies based on include/exclude rules. Features include:
- Include/exclude filters using pandas query syntax
- Optional `--config head=N` to process only the first N subjects for testing
Output: studies_filtered.tsv - Filtered list of studies to process
Downloads DICOM studies from CFMM using cfmm2tar. When merge_duplicate_studies: true is enabled, multiple studies for the same subject/session are downloaded as separate tar files in the same directory.
Output: dicoms/sub-*/ses-*/ - Downloaded DICOM files (tar archives)
Converts DICOMs to BIDS format using heudiconv and generates QC reports. Features include:
- BIDS conversion with heudiconv and custom heuristic
- Automatic handling of duplicate studies (when `merge_duplicate_studies: true`):
  - Each tar file (study) is processed separately with heudiconv
  - Outputs are automatically merged into a single session
  - A `study_uid` column is added to dicominfo.tsv to track series origin
- QC report generation (series list and unmapped summary)
- BIDS validation with `bids-validator-deno`
- Metadata preservation (auto.txt, dicominfo.tsv)
Outputs:
- `bids-staging/sub-*/ses-*/` - Intermediate BIDS-formatted data per subject/session
- `bids/` - Assembled BIDS dataset (all subjects combined)
- `qc/sub-*/ses-*/` - Heudiconv metadata and QC reports (auto.txt, dicominfo.tsv, series.svg, unmapped.svg)
- `qc/bids_validator.json` - BIDS validation results
- `qc/aggregate_report.html` - Aggregate QC report for all sessions
Applies post-conversion fixes to the BIDS dataset. Available fix actions:
- remove: Remove files matching a pattern (e.g., unwanted fieldmaps)
- update_json: Update JSON sidecar metadata (e.g., add PhaseEncodingDirection)
- fix_orientation: Reorient NIfTI files to canonical RAS+ orientation
Outputs:
- `bids-staging/sub-*/ses-*/` - Intermediate fixed BIDS data per subject/session
- `bids/` - Assembled fixed BIDS dataset (all subjects combined)
- `qc/sub-*/ses-*/sub-*_ses-*_provenance.json` - Fix provenance tracking
- `qc/bids_validator.json` - Post-fix BIDS validation results
- `qc/aggregate_report.html` - Aggregate QC report including fix provenance
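As an illustration of the `update_json` fix action, a minimal stdlib-only sketch is shown below. The helper name and demo paths are hypothetical, not the workflow's actual `post_convert_fix.py` implementation; it merges key/value updates into every JSON sidecar matching a glob pattern:

```python
import json
from pathlib import Path

def update_json_sidecars(bids_dir: Path, pattern: str, updates: dict) -> list:
    """Merge `updates` into every JSON sidecar under `bids_dir` matching `pattern`."""
    changed = []
    for sidecar in bids_dir.glob(pattern):
        metadata = json.loads(sidecar.read_text())
        metadata.update(updates)  # existing keys are overwritten, others preserved
        sidecar.write_text(json.dumps(metadata, indent=2))
        changed.append(sidecar)
    return changed

# Demo on a throwaway directory mimicking a BIDS session
import tempfile
tmp = Path(tempfile.mkdtemp())
(tmp / 'func').mkdir()
(tmp / 'func' / 'sub-01_task-rest_bold.json').write_text('{"RepetitionTime": 2.0}')
changed = update_json_sidecars(tmp, 'func/*bold.json', {'PhaseEncodingDirection': 'j-'})
print(len(changed))
```

The same pattern-plus-action shape applies to `remove` (delete matching files) and `fix_orientation` (rewrite matching NIfTIs in canonical RAS+ orientation).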
Copies the validated and fixed BIDS dataset to the final output directory.
The workflow automatically generates QC reports for each subject/session after heudiconv conversion. The reports include:
- Series List (`*_series.svg`): A detailed table showing each series with:
  - Series ID and description
  - Protocol name
  - Image dimensions
  - TR and TE values
  - Corresponding BIDS filename (or "NOT MAPPED" if unmapped)
  - For merged studies: includes `study_uid` to identify which study each series came from
- Unmapped Summary (`*_unmapped.svg`): A summary of series that were not mapped to BIDS, helping identify potential missing data or heuristic issues
QC reports are saved in: `results/3_convert/qc/sub-{subject}/ses-{session}/`
Note: The QC report generation is integrated into the Snakemake workflow as a script directive and cannot be run manually as a standalone CLI tool.
After the fix stage, an aggregate HTML report is automatically generated that consolidates QC information from all subjects and sessions. The report includes:
- Overview Statistics: Total subjects, sessions, series, and unmapped series count
- BIDS Validation Results: Validation results from both convert and fix stages
- Aggregated Series Table: All series data sorted by subject and session
- Post-Conversion Fix Provenance: Details of fixes applied to each session
- Heudiconv Filegroup Metadata: Detailed metadata from heudiconv conversion (collapsible)
The aggregate report is located at: results/4_fix/qc/aggregate_report.html
This report provides a comprehensive overview of the entire dataset conversion process and is useful for quality control and troubleshooting.
The workflow is configured via config/config.yaml. Key configuration sections include:
For working examples, see:
- `config/config_trident15T.yml` - Configuration for Trident 15T scanner
- `config/config_cogms.yml` - Configuration for CogMS study
Define one or more DICOM queries with metadata mappings:
```yaml
search_specs:
  - dicom_query:
      study_description: YourStudy^*
      study_date: 20230101-
    metadata_mappings:
      subject:
        source: PatientID    # Extract from PatientID field
        pattern: '_([^_]+)$' # Regex to extract subject ID
        sanitize: true       # Remove non-alphanumeric characters
      session:
        source: StudyDate    # Use StudyDate as session ID
```

Post-filter studies with include/exclude rules:
```yaml
study_filter_specs:
  include:
    - "subject.str.startswith('sub')"   # Include only subjects starting with 'sub'
  exclude:
    - "StudyInstanceUID == '1.2.3.4.5'" # Exclude specific study
```

- `cfmm2tar_download_options`: Options passed to cfmm2tar (e.g., `--skip-derived`)
- `credentials_file`: Path to CFMM credentials file
- `merge_duplicate_studies`: If `true`, automatically merge multiple studies for the same subject/session (default: `false`)
When merge_duplicate_studies: true is enabled and multiple studies match the same subject/session:
- All study tar files are downloaded to the same directory
- Each study's DICOM tar file is processed separately with heudiconv
- Outputs from each study are automatically merged into a single session:
  - BIDS NIfTI and JSON files from all studies are combined
  - The `auto.txt` files are merged (all series info concatenated)
  - The `dicominfo.tsv` files are merged with a `study_uid` column added to track which series came from which study
- This is useful when subjects have multiple scan sessions on the same day (e.g., due to console reboot or scanner issues)
- If disabled and duplicates are found, the workflow will fail with an error message
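The dicominfo merge described above can be sketched with pandas. This is illustrative only (toy data and a hypothetical helper): each per-study table is tagged with its study's UID before concatenation, so every series remains traceable to its origin:

```python
import pandas as pd

def merge_dicominfo(tables: dict) -> pd.DataFrame:
    """Concatenate per-study dicominfo tables keyed by StudyInstanceUID,
    adding a study_uid column to track which study each series came from."""
    tagged = [df.assign(study_uid=uid) for uid, df in tables.items()]
    return pd.concat(tagged, ignore_index=True)

# Two toy per-study tables for the same subject/session
merged = merge_dicominfo({
    '1.2.3': pd.DataFrame({'series_id': ['1-anat'], 'protocol_name': ['T1w']}),
    '4.5.6': pd.DataFrame({'series_id': ['1-func'], 'protocol_name': ['bold']}),
})
print(merged[['series_id', 'study_uid']].to_string(index=False))
```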
- `heuristic`: Path to heudiconv heuristic file
- `dcmconfig_json`: Path to dcm2niix configuration
- `heudiconv_options`: Additional heudiconv options
The workflow includes several heuristic files for different scanner configurations:
- `heuristics/cfmm_base.py`: Base CFMM heuristic supporting standard sequences including:
  - MP2RAGE, MEMP2RAGE, Sa2RAGE
  - T2 TSE (Turbo Spin Echo)
  - T2 SPACE, T2 FLAIR
  - Multi-echo GRE
  - TOF Angiography
  - Diffusion-weighted imaging
  - BOLD fMRI (multiband, psf-dico)
  - Field mapping (EPI-PA, GRE)
  - DIS2D/DIS3D distortion-corrected reconstructions, with improved detection that robustly identifies distortion-corrected images regardless of their position in the DICOM `image_type` metadata
- `heuristics/trident_15T.py`: Trident 15T scanner-specific heuristic
- `heuristics/Menon_CogMSv2.py`: CogMS study-specific heuristic
The heuristics automatically detect and label distortion-corrected (DIS2D/DIS3D) reconstructions using the `rec-DIS2D` or `rec-DIS3D` BIDS entity.
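The position-independent detection mentioned above can be sketched as a simple membership test over the DICOM ImageType values. The helper name is hypothetical (not the heuristic's actual function):

```python
from typing import Optional

def get_dis_label(image_type) -> Optional[str]:
    """Return 'DIS2D' or 'DIS3D' if the series is a distortion-corrected
    reconstruction, regardless of where the flag sits in ImageType."""
    values = {str(v).upper() for v in image_type}
    for label in ('DIS2D', 'DIS3D'):
        if label in values:
            return label
    return None

# The flag may appear at any position in the ImageType tuple
print(get_dis_label(('ORIGINAL', 'PRIMARY', 'M', 'DIS2D')))
```

Using set membership rather than indexing a fixed position is what makes the detection robust to scanner-to-scanner variation in ImageType ordering.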
Define fixes to apply after conversion:
```yaml
post_convert_fixes:
  - name: remove_fieldmaps
    pattern: "fmap/*dir-AP*"
    action: remove
  - name: add_phase_encoding
    pattern: "func/*bold.json"
    action: update_json
    updates:
      PhaseEncodingDirection: "j-"
  - name: reorient_nifti
    pattern: "anat/*T1w.nii.gz"
    action: fix_orientation
```

- `final_bids_dir`: Final output directory (default: `bids`)
- `stages`: Customize intermediate stage directories
- Install pixi:

  ```bash
  curl -fsSL https://pixi.sh/install.sh | sh
  ```

- Clone the cfmm2bids repository:

  ```bash
  git clone https://github.com/akhanf/cfmm2bids
  cd cfmm2bids
  ```

- Install dependencies into the pixi virtual environment:

  ```bash
  pixi install
  ```

- Configure your search specifications by editing `config/config.yaml`.

  Note: Example configurations are available:
  - `config/config_trident15T.yml` - Trident 15T scanner setup
  - `config/config_cogms.yml` - CogMS study setup

  You can use these as starting points or use the template in `config/config.yaml`. To use one of the example configs directly:

  ```bash
  pixi run snakemake --configfile config/config_trident15T.yml --dry-run
  ```

- Run the workflow as a dry-run to see what will be executed:

  ```bash
  pixi run snakemake --dry-run
  ```

- Run specific workflow stages or the full workflow:

  Run all stages (query → filter → download → convert → fix → final):

  ```bash
  pixi run snakemake --cores all
  ```

  Run only the download stage:

  ```bash
  pixi run snakemake download --cores all
  ```

  Run only the convert stage (includes QC reports):

  ```bash
  pixi run snakemake convert --cores all
  ```

  Run only the fix stage:

  ```bash
  pixi run snakemake fix --cores all
  ```

  Process only the first subject (useful for testing):

  ```bash
  pixi run snakemake head --cores all
  ```

  Process only the first N subjects (e.g., first 3):

  ```bash
  pixi run snakemake --config head=3 --cores all
  ```

- Run the workflow on a SLURM cluster:

  ```bash
  pixi run snakemake --executor slurm --jobs 10
  ```
```
bids/                                     # Final BIDS-formatted output
├── dataset_description.json              # BIDS dataset metadata
└── sub-*/
    └── ses-*/                            # Subject/session data (anat/, func/, fmap/, etc.)

results/
├── 0_query/
│   └── studies.tsv                       # All queried studies
├── 1_filter/
│   └── studies_filtered.tsv              # Filtered studies to process
├── 2_download/
│   └── dicoms/
│       └── sub-*/ses-*/                  # Downloaded DICOM tar files
├── 3_convert/
│   ├── bids-staging/
│   │   ├── dataset_description.json      # BIDS dataset metadata
│   │   ├── .bidsignore                   # BIDS ignore file
│   │   └── sub-*/ses-*/                  # Per-subject/session BIDS data (intermediate)
│   ├── bids/                             # Assembled BIDS dataset (all subjects)
│   └── qc/
│       ├── sub-*/ses-*/                  # Per-subject/session QC and metadata
│       │   ├── sub-*_ses-*_auto.txt      # Heudiconv auto conversion info
│       │   ├── sub-*_ses-*_dicominfo.tsv # Heudiconv DICOM metadata table
│       │   ├── sub-*_ses-*_series.tsv    # Series info table
│       │   ├── sub-*_ses-*_series.svg    # Series QC visualization
│       │   ├── sub-*_ses-*_unmapped.svg  # Unmapped series visualization
│       │   └── sub-*_ses-*_report.html   # Individual subject/session report
│       ├── bids_validator.json           # BIDS validation results
│       └── aggregate_report.html         # Aggregate QC report
└── 4_fix/
    ├── bids-staging/
    │   ├── dataset_description.json      # BIDS dataset metadata
    │   ├── .bidsignore                   # BIDS ignore file
    │   └── sub-*/ses-*/                  # Per-subject/session fixed BIDS data (intermediate)
    ├── bids/                             # Assembled fixed BIDS dataset (all subjects)
    └── qc/
        ├── sub-*/ses-*/                  # Per-subject/session provenance
        │   ├── sub-*_ses-*_provenance.json # Fix provenance tracking
        │   └── sub-*_ses-*_report.html   # Individual subject/session report with fixes
        ├── bids_validator.json           # Post-fix validation results
        ├── final_bids_validator.txt      # Final validation (must pass)
        └── aggregate_report.html         # Aggregate QC report with fix provenance
```
```
├── workflow/                             # Workflow files
│   ├── Snakefile                         # Main Snakemake workflow
│   ├── lib/                              # Python modules
│   │   ├── query_filter.py               # DICOM query and filtering functions
│   │   ├── bids_fixes.py                 # Post-conversion fix implementations
│   │   ├── convert.py                    # Heudiconv conversion helpers (single/multi-study)
│   │   └── utils.py                      # Utility functions
│   └── scripts/                          # Workflow scripts
│       ├── run_heudiconv.py              # Run heudiconv (handles single/multi-study)
│       ├── generate_convert_qc_figs.py   # QC report generation
│       ├── generate_subject_report.py    # Individual subject/session reports
│       ├── generate_aggregate_all_report.py # Aggregate QC report
│       └── post_convert_fix.py           # Post-conversion fix application
├── heuristics/                           # Heudiconv heuristic files
│   ├── cfmm_base.py                      # Base CFMM heuristic (supports DIS2D/DIS3D reconstruction)
│   ├── trident_15T.py                    # Trident 15T scanner-specific heuristic
│   └── Menon_CogMSv2.py                  # CogMS study-specific heuristic
├── resources/                            # Resource files
│   ├── dcm2niix_config.json              # dcm2niix configuration
│   └── dataset_description.json          # BIDS dataset metadata template
├── config/                               # Configuration files
│   ├── config.yml                        # Configuration template (customize this)
│   ├── config_trident15T.yml             # Example: Trident 15T scanner configuration
│   └── config_cogms.yml                  # Example: CogMS study configuration
└── pixi.toml                             # Pixi project configuration and dependencies
```
The workflow provides several target rules for running specific stages:
- `all` (default): Run all stages from query to final BIDS output
- `head`: Process only the first subject (useful for testing)
- `download`: Run only query, filter, and download stages
- `convert`: Run through the convert stage (includes download, conversion, and QC)
- `fix`: Run through the fix stage (includes all above plus post-conversion fixes)
Example usage:
```bash
# Test with first subject only
pixi run snakemake head --cores all

# Download all DICOMs without conversion
pixi run snakemake download --cores all

# Convert and generate QC reports
pixi run snakemake convert --cores all
```