Skip to content

Commit bdf0ab9

Browse files
authored
Move and restructure CPAC comparison notes (#341)
Renames tests/data/cpac_rbc_divergences.md to docs/cpac_comparison.md and splits findings into Bugs (#4, #5) and Divergences (#1, #2, #3, #6) to distinguish unambiguous numerical errors from defensible design choices. Adds a new divergence about CPAC's duplicate motion correction.
1 parent 253c9a8 commit bdf0ab9

1 file changed

Lines changed: 47 additions & 37 deletions

File tree

Lines changed: 47 additions & 37 deletions
Original file line numberDiff line numberDiff line change
@@ -1,79 +1,89 @@
1-
# C-PAC RBC Pipeline: Bugs and Divergences
1+
# C-PAC RBC Pipeline: Implementation Notes
22

33
Findings from cross-referencing C-PAC source code and actual RBC pipeline run logs.
44

5+
These notes document differences observed between RBC's implementation and the C-PAC reference. Items under **Bugs** are numerical errors or spec violations that affect output correctness regardless of intent. Items under **Divergences** are differences that may reflect intentional design choices in C-PAC (final classification is the RBC team's interpretation based on code reading).
6+
57
**Sources:**
68
- C-PAC code: `CPAC/anat_preproc/anat_preproc.py`, `CPAC/anat_preproc/ants.py`, `CPAC/pipeline/cpac_pipeline.py`, `CPAC/func_preproc/func_preproc.py`, `CPAC/alff/alff.py`, `CPAC/qc/xcp.py`
79
- C-PAC RBC run logs (ds000001, sub-01_ses-1)
810

911
---
1012

11-
## 1. N4 bias correction output discarded
13+
## Bugs
1214

13-
ANTs brain extraction (`antsBrainExtraction.sh`) runs N4 bias field correction internally (both an initial `inu_n4` and a refined `inu_n4_final`), producing a bias-corrected T1w. However, the downstream `brain_extraction` node (`anat_preproc.py:1925-1936`) applies the brain mask to the *original* (pre-N4) T1w via `3dcalc`, discarding the corrected output entirely.
15+
### 4. Bandpass filtering uses wrong TR from resampled NIfTI header
1416

15-
From the logs:
17+
C-PAC's `frequency_filter` node calls `bandpass_voxels` (in `CPAC/nuisance/bandpass.py`) which reads the TR from the NIfTI header of the template-space residuals image. After ANTs resampling to template space, the pixdim[4] (TR) in the NIfTI header is set to 1.0 instead of the original repetition time (2.0 for ds000001).
18+
19+
Verified from the working directory:
1620
```
17-
3dcalc -a .../anat_reorient_0/sub-01_T1w_resample.nii.gz
18-
-b .../anat_skullstrip_ants/atropos_wf/.../mask_xform.nii.gz
19-
-expr "a*step(b)"
21+
tests/data/cpac_outputs/ds000001/working/.../nuisance_regression_template_36-parameter_212/
22+
.../frequency_filter/_inputs.pklz
23+
-> realigned_file: .../residuals.nii.gz (header has TR=1.0)
24+
-> bandpass_freqs: [0.01, 0.1]
25+
-> sample_period: (not connected, reads from nifti)
2026
```
2127

22-
The `-a` input is the reoriented T1w, not the `inu_n4_final` output. The brain mask itself is valid (derived from N4-corrected data), but all downstream steps (segmentation, registration, etc.) operate on uncorrected intensities.
28+
The `ideal_bandpass` function uses `sample_period` to compute FFT frequency bin indices. With TR=1.0 instead of 2.0, the effective passband is 0.005-0.05 Hz (half the intended 0.01-0.1 Hz range). This affects both the BOLD bandpass filtering and the regressor bandpass filtering.
2329

24-
The standalone `n4_bias_field_correction` config option is also `Off` by default and not overridden by the RBC config, so no bias correction is applied at any stage.
30+
Confirmed by reproducing the filter: applying `ideal_bandpass` with TR=1.0 to C-PAC's raw regressors produces r=1.000 correlation with C-PAC's exported filtered regressors, while TR=2.0 (correct) produces r=0.758.
31+
32+
This bug affects all outputs downstream of the frequency filter: cleaned BOLD, ReHo, timeseries extraction, and connectivity matrices. ALFF/fALFF are also affected since they are computed from the bandpassed BOLD.
2533

2634
---
2735

28-
## 2. Despiking runs in template space, breaking preprocessing order
36+
### 5. DVFinal metrics computed from pre-regression BOLD (same as DVInit)
2937

30-
The RBC config sets `despike.space: template`. The logs confirm `3dDespike` ran on already-resampled template-space data, after single-step resampling and merging. Motion estimation and slice-timing correction both operate on non-despiked data, which negates the purpose of despiking (removing outlier timepoints that bias motion estimation).
38+
The XCP-style QC file reports `meanDVFinal` and `motionDVCorrFinal` as post-regression DVARS metrics (per the XCP dictionary: "correlation of RMS and DVARS after regression"). However, inspecting the C-PAC source code (`xcp.py`) reveals that both `DVInit` and `DVFinal` are computed from the same pre-nuisance regression BOLD input.
3139

32-
Despiking should also happen before resampling for signal processing reasons: in native space, each voxel's timeseries reflects a consistent tissue location, so outlier detection operates on clean per-voxel signals. After resampling, each voxel is an interpolated blend of neighboring native voxels, which smooths out the sharp spikes that despiking is designed to catch and mixes signals across tissue boundaries.
40+
This was confirmed empirically: `meanDVInit = meanDVFinal` and `motionDVCorrInit = motionDVCorrFinal` for every subject across all three datasets (HBN, PNC, NKI) during the small benchmarking study.
3341

34-
This also breaks the `calculate_motion_first` flag. The config inherits `calculate_motion_first: On` from `fmriprep-options`, which should estimate motion on raw (pre-STC) data. In `cpac_pipeline.py:1199-1210`, when this flag is true, the block order is `func_init + func_motion + func_preproc + [func_motion_correct_only] + ...`. But `func_preproc_blocks` includes both despike and STC, and since despike was deferred to template space, the block effectively only contained STC. The logs show motion estimation (`3dvolreg` for reference, `mcflirt` for correction) ran on post-STC data instead of raw data.
42+
In contrast, RBC correctly computes `DVFinal` from the post-regression bandpass-filtered BOLD (`cleaned_bold`), as intended.
3543

3644
---
3745

38-
## 3. ALFF uses temporal std instead of mean FFT amplitude
39-
40-
C-PAC computes ALFF as the temporal standard deviation of a bandpass-filtered signal (`3dBandpass` followed by `3dTstat -stdev`), and fALFF as the ratio of filtered to unfiltered standard deviation. This differs from the original Zang et al. 2007 definition, which uses the arithmetic mean of FFT amplitude coefficients within the frequency band. The two approaches are related (both measure low-frequency power) but produce numerically different maps.
41-
42-
---
46+
## Divergences
4347

44-
## 4. Bandpass filtering uses wrong TR from resampled NIfTI header
48+
### 1. N4 bias correction output discarded (may be unintentional)
4549

46-
C-PAC's `frequency_filter` node calls `bandpass_voxels` (in `CPAC/nuisance/bandpass.py`) which reads the TR from the NIfTI header of the template-space residuals image. After ANTs resampling to template space, the pixdim[4] (TR) in the NIfTI header is set to 1.0 instead of the original repetition time (2.0 for ds000001).
50+
ANTs brain extraction (`antsBrainExtraction.sh`) runs N4 bias field correction internally (both an initial `inu_n4` and a refined `inu_n4_final`), producing a bias-corrected T1w. However, the downstream `brain_extraction` node (`anat_preproc.py:1925-1936`) applies the brain mask to the *original* (pre-N4) T1w via `3dcalc`, discarding the corrected output entirely.
4751

48-
Verified from the working directory:
52+
From the logs:
4953
```
50-
tests/data/cpac_outputs/ds000001/working/.../nuisance_regression_template_36-parameter_212/
51-
.../frequency_filter/_inputs.pklz
52-
-> realigned_file: .../residuals.nii.gz (header has TR=1.0)
53-
-> bandpass_freqs: [0.01, 0.1]
54-
-> sample_period: (not connected, reads from nifti)
54+
3dcalc -a .../anat_reorient_0/sub-01_T1w_resample.nii.gz
55+
-b .../anat_skullstrip_ants/atropos_wf/.../mask_xform.nii.gz
56+
-expr "a*step(b)"
5557
```
5658

57-
The `ideal_bandpass` function uses `sample_period` to compute FFT frequency bin indices. With TR=1.0 instead of 2.0, the effective passband is 0.005-0.05 Hz (half the intended 0.01-0.1 Hz range). This affects both the BOLD bandpass filtering and the regressor bandpass filtering.
58-
59-
Confirmed by reproducing the filter: applying `ideal_bandpass` with TR=1.0 to C-PAC's raw regressors produces r=1.000 correlation with C-PAC's exported filtered regressors, while TR=2.0 (correct) produces r=0.758.
59+
The `-a` input is the reoriented T1w, not the `inu_n4_final` output. The brain mask itself is valid (derived from N4-corrected data), but all downstream steps (segmentation, registration, etc.) operate on uncorrected intensities.
6060

61-
This bug affects all outputs downstream of the frequency filter: cleaned BOLD, ReHo, timeseries extraction, and connectivity matrices. ALFF/fALFF are also affected since they are computed from the bandpassed BOLD.
61+
The standalone `n4_bias_field_correction` config option is also `Off` by default and not overridden by the RBC config, so no bias correction is applied at any stage.
6262

6363
---
6464

65-
## 5. DVFinal metrics computed from pre-regression BOLD (same as DVInit)
65+
### 2. Despiking runs in template space
6666

67-
The XCP-style QC file reports `meanDVFinal` and `motionDVCorrFinal` as post-regression DVARS metrics (per the XCP dictionary: "correlation of RMS and DVARS after regression"). However, inspecting the C-PAC source code (`xcp.py`) reveals that both `DVInit` and `DVFinal` are computed from the same pre-nuisance regression BOLD input.
67+
The RBC config sets `despike.space: template`. The logs confirm `3dDespike` ran on already-resampled template-space data, after single-step resampling and merging. Motion estimation and slice-timing correction both operate on non-despiked data, which negates the purpose of despiking (removing outlier timepoints that bias motion estimation).
6868

69-
This was confirmed empirically: `meanDVInit = meanDVFinal` and `motionDVCorrInit = motionDVCorrFinal` for every subject across all three datasets (HBN, PNC, NKI) during the small benchmarking study.
69+
Despiking should also happen before resampling for signal processing reasons: in native space, each voxel's timeseries reflects a consistent tissue location, so outlier detection operates on clean per-voxel signals. After resampling, each voxel is an interpolated blend of neighboring native voxels, which smooths out the sharp spikes that despiking is designed to catch and mixes signals across tissue boundaries.
7070

71-
In contrast, RBC correctly computes `DVFinal` from the post-regression bandpass-filtered BOLD (`cleaned_bold`), as intended.
71+
This also breaks the `calculate_motion_first` flag. The config inherits `calculate_motion_first: On` from `fmriprep-options`, which should estimate motion on raw (pre-STC) data. In `cpac_pipeline.py:1199-1210`, when this flag is true, the block order is `func_init + func_motion + func_preproc + [func_motion_correct_only] + ...`. But `func_preproc_blocks` includes both despike and STC, and since despike was deferred to template space, the block effectively only contained STC. The logs show motion estimation (`3dvolreg` for reference, `mcflirt` for correction) ran on post-STC data instead of raw data.
7272

7373
---
7474

75-
## Notes
75+
### 3. ALFF uses temporal std instead of mean FFT amplitude
76+
77+
C-PAC computes ALFF as the temporal standard deviation of a bandpass-filtered signal (`3dBandpass` followed by `3dTstat -stdev`), and fALFF as the ratio of filtered to unfiltered standard deviation. This differs from the original Zang et al. 2007 definition, which uses the arithmetic mean of FFT amplitude coefficients within the frequency band. The two approaches are related (both measure low-frequency power) but produce numerically different maps.
78+
79+
---
80+
81+
### 6. Motion correction runs twice (pre-STC for regressors, post-STC for spatial application)
82+
83+
C-PAC's logs show two separate mcflirt runs:
84+
- One on **pre-STC** data, used to produce motion parameter regressors and derivative-space outputs.
85+
- One on **post-STC** data, whose per-volume `.mat` transforms are applied alongside BBR and anat-to-template warps in the single-step template resampling, and whose corrected BOLD is also used for masking and BBR estimation.
7686

77-
### Two parallel motion correction paths
87+
RBC runs motion correction once: motion is estimated on pre-STC (despiked) data, and those `.mat` transforms are applied to STC'd BOLD in the single-step resample (see `src/rbc/core/functional/resampling.py:resample_bold_to_template`). The same motion parameters drive both the regressors and the spatial application.
7888

79-
The logs show two separate mcflirt runs: one on post-STC data (path 97, used for template-space outputs via single-step resampling) and one on pre-STC data (path 85, used for derivative-space outputs). The single-step resampling path does not use the mcflirt-corrected BOLD directly; it applies per-volume motion `.mat` transforms alongside BBR and anat-to-template warps in one interpolation pass. The mcflirt-corrected output from path 97 is only used for masking and BBR estimation. This is not incorrect (fewer interpolation passes is better), but the duplication is confusing.
89+
C-PAC's duplication is not numerically incorrect — the single-step resampling still uses one interpolation pass per volume — but estimating motion twice on differently-processed data risks subtle inconsistencies between the regressor motion params and the spatial transforms actually applied.

0 commit comments

Comments
 (0)