Log raw input image metadata at workflow start#334

Merged: nx10 merged 4 commits into main from log-input-image-metadata on May 12, 2026

@nx10 nx10 commented May 11, 2026

We weren't logging much about the raw images going into the pipeline (just the file path for anat; TR + SliceTiming for func). This adds a header-only log_image_summary() in rbc.core.nifti and calls it from the anatomical and functional process_session orchestrators, so each run records exactly what entered: array shape, on-disk dtype, data size, voxel size, orientation, sform/qform spaces, and, for 4D images, volume count, slice axis/count/order, and header TR. No voxel data is loaded, only the NIfTI header. It's best-effort: an unreadable header logs a warning rather than aborting the run.

Sample output on tests/data/ds000001:

Anatomical T1w: tests/data/ds000001/sub-01/anat/sub-01_T1w.nii.gz
Anatomical T1w: shape=(160, 192, 192), dtype=int16, size=11.2 MiB, voxel size=1 x 1.33 x 1.33 mm
Anatomical T1w: orientation=RAS, sform=SCANNER, qform=SCANNER
Functional BOLD: tests/data/ds000001/sub-01/func/sub-01_task-balloonanalogrisktask_run-01_bold.nii.gz
Functional BOLD: shape=(64, 64, 33, 300), dtype=int16, size=77.3 MiB, voxel size=3.12 x 3.12 x 4 mm
Functional BOLD: orientation=LAS, sform=SCANNER, qform=SCANNER
Functional BOLD: volumes=300, slices=33 along axis 2 (assumed; no dim_info), slice order=unknown, header TR=2 s

(size is the array footprint, prod(shape) x dtype itemsize; the on-disk .nii.gz is much smaller. slice order=unknown is normal for BIDS data, where SliceTiming lives in the JSON sidecar, which FunctionalMetadata.load already logs separately.)

Functional runs still get the existing TR-source / SliceTiming logging from FunctionalMetadata.load. The all pipeline picks this up for free since it reuses these process_session functions. Longitudinal / metrics / QC are left alone since they consume derivatives, not raw inputs.
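
To illustrate what "header-only" means here, the following is a minimal sketch that reads just the fixed 348-byte NIfTI-1 header with the Python standard library. It is not the code from this PR: `read_nifti1_header_fields` is a hypothetical name, the field selection is an assumption, and the PR's `log_image_summary()` may well use a NIfTI library instead. The byte offsets follow the NIfTI-1 header layout.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348  # fixed size of a NIfTI-1 header in bytes


def read_nifti1_header_fields(path):
    """Unpack a few NIfTI-1 header fields without reading any voxel data.

    Hypothetical helper for illustration only.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as f:
        # gzip streams, so only the start of the file is decompressed.
        raw = f.read(NIFTI1_HEADER_SIZE)
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError(f"{path}: file too short for a NIfTI-1 header")

    # sizeof_hdr (offset 0) must equal 348; use it to detect byte order.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", raw, 0)[0] == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError(f"{path}: not a NIfTI-1 header")

    dim = struct.unpack_from(endian + "8h", raw, 40)      # dim[0] = number of dims
    bitpix = struct.unpack_from(endian + "h", raw, 72)[0]  # bits per voxel
    pixdim = struct.unpack_from(endian + "8f", raw, 76)    # voxel sizes, then TR
    qform_code, sform_code = struct.unpack_from(endian + "2h", raw, 252)

    ndim = dim[0]
    if not 1 <= ndim <= 7:
        raise ValueError(f"{path}: invalid dimension count {ndim}")

    return {
        "shape": tuple(dim[1 : ndim + 1]),
        "itemsize": bitpix // 8,
        "voxel_size": tuple(round(p, 2) for p in pixdim[1:4]),
        "tr": pixdim[4] if ndim >= 4 else None,
        "qform_code": qform_code,
        "sform_code": sform_code,
    }
```

Because only the first 348 bytes are requested, the cost is independent of the image size, which is the property the description relies on.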

Add `log_image_summary()` to `rbc.core.nifti`: a header-only (no voxel
data loaded) helper that logs an INFO summary of a raw NIfTI input -
array shape, on-disk dtype, voxel size, axis orientation, sform/qform
coordinate spaces, and for 4D images the volume count, slice count, and
header TR.

Call it from the anatomical and functional `process_session` orchestrators
in place of the bare "Anatomical: <path>" / "Functional: <path>" lines, so
each run records exactly what entered the pipeline. Functional runs still
get TR-source and SliceTiming logging from `FunctionalMetadata.load`.
github-actions Bot commented May 11, 2026

Coverage

| Tests | Skipped | Failures | Errors | Time |
| --- | --- | --- | --- | --- |
| 791 | 0 💤 | 0 ❌ | 0 🔥 | 11.334s ⏱️ |

The logged size is prod(shape) * dtype.itemsize, formatted with binary units
(B/KiB/MiB/GiB). The on-disk .nii.gz size hides this; it's the number that
matters when the pipeline loads the array.
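
As a rough sketch of that calculation (the function name `format_array_size` and the exact rounding are assumptions, not taken from the PR):

```python
import math


def format_array_size(shape, itemsize):
    """Return prod(shape) * itemsize as a string in binary units.

    Hypothetical helper; one decimal place is an assumption chosen to
    resemble the sample output in the PR description.
    """
    n_bytes = math.prod(shape) * itemsize
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB"):
        if size < 1024:
            # Plain bytes are shown as an integer, larger units with one decimal.
            return f"{n_bytes} B" if unit == "B" else f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} GiB"
```

For the functional run in the sample output, `format_array_size((64, 64, 33, 300), 2)` works out to 81,100,800 bytes, i.e. `77.3 MiB`.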
@nx10 nx10 requested a review from kaitj May 11, 2026 23:33
nx10 added 2 commits May 11, 2026 19:40
- Best-effort: a header that can't be read logs a warning instead of
  aborting the run; the real failure surfaces later when processing
  touches the file.
- 5D+ images report the trailing dims as `extra dims=...` instead of
  mislabeling them.
- Voxel size notes `(units unknown)` when xyzt_units is unset.
- Rename `uncompressed size` -> `size` (the loaded array is upcast to
  float64 anyway, so the longer name overpromised).
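
The best-effort behaviour in the first bullet could look roughly like the sketch below. The logger name, the function name `log_summary_best_effort`, the message wording, and the `summarize` callback are all hypothetical; the PR's actual structure may differ.

```python
import logging

logger = logging.getLogger("rbc.example")  # hypothetical logger name


def log_summary_best_effort(path, label, summarize):
    """Log summarize(path) at INFO; on any failure, warn and carry on.

    `summarize` is any callable that turns a path into a summary string.
    Returns True if the summary was logged, False if reading failed.
    """
    logger.info("%s: %s", label, path)
    try:
        summary = summarize(path)
    except Exception as exc:  # deliberately broad: logging must not abort the run
        logger.warning("%s: could not read image header (%s)", label, exc)
        return False
    logger.info("%s: %s", label, summary)
    return True
```

Swallowing the exception here is reasonable only because, as the bullet notes, the real failure surfaces later when processing touches the file.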
The 4D summary line now names the slice axis (from the header's dim_info,
or "axis 2 (assumed; no dim_info)" when unset) and the slice acquisition
order from slice_code ("unknown" when unset, which is the BIDS norm since
SliceTiming lives in the JSON sidecar — already logged by FunctionalMetadata).
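
A sketch of how the slice axis and order can be decoded from the raw header values, assuming the NIfTI-1 encoding (slice dimension in bits 4-5 of dim_info, 1-based, 0 meaning unset; slice_code 0-6). `describe_slices` is a hypothetical name and the order labels are illustrative wording, not the strings the PR emits:

```python
# NIfTI-1 slice_code values; the label text is illustrative.
SLICE_ORDER_NAMES = {
    0: "unknown",
    1: "sequential increasing",
    2: "sequential decreasing",
    3: "alternating increasing",
    4: "alternating decreasing",
    5: "alternating increasing #2",
    6: "alternating decreasing #2",
}


def describe_slices(shape, dim_info, slice_code):
    """Build the slice part of a 4D summary line from raw header values."""
    slice_dim = (dim_info >> 4) & 0x03  # 1-based axis; 0 means not set
    if slice_dim:
        axis, note = slice_dim - 1, ""
    else:
        # Fall back to the third axis when the header does not say.
        axis, note = 2, " (assumed; no dim_info)"
    order = SLICE_ORDER_NAMES.get(slice_code, "unknown")
    return f"slices={shape[axis]} along axis {axis}{note}, slice order={order}"
```

With `dim_info=0` and `slice_code=0`, as in the ds000001 sample, this gives `slices=33 along axis 2 (assumed; no dim_info), slice order=unknown` for a `(64, 64, 33, 300)` image.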

@kaitj kaitj left a comment

lgtm 🚀 (for the cross sectional)

I haven't taken a close look at whether or how to integrate this into the longitudinal workflow, but is there any reason not to include it there as well for the anatomical and functional workflows?

@nx10 nx10 merged commit 0ae9776 into main May 12, 2026
8 checks passed
@nx10 nx10 deleted the log-input-image-metadata branch May 12, 2026 15:36
nx10 commented May 12, 2026

Adding it to any raw/user provided images in longitudinal would be sensible - any images we create ourselves generally should not need it

kaitj commented May 12, 2026

> Adding it to any raw/user provided images in longitudinal would be sensible - any images we create ourselves generally should not need it

I don't think we currently have a way to check for the creator? If the information is available, I would assume it's in the metadata, but I think right now, as long as a dataset can be queried by b2t and the file name matches the expected entities, it would run. If this is the case, it's probably safer to handle the majority (or all) of the inputs? Typing this, I think the same is potentially true for all the individual workflows (qc, metrics, etc.), though I think those are less likely since more files are required.
