Log raw input image metadata at workflow start #334
Add `log_image_summary()` to `rbc.core.nifti`: a header-only (no voxel data loaded) helper that logs an INFO summary of a raw NIfTI input (array shape, on-disk dtype, voxel size, axis orientation, sform/qform coordinate spaces, and, for 4D images, the volume count, slice count, and header TR). Call it from the anatomical and functional `process_session` orchestrators in place of the bare "Anatomical: <path>" / "Functional: <path>" lines, so each run records exactly what entered the pipeline. Functional runs still get TR-source and SliceTiming logging from `FunctionalMetadata.load`.
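To make "header-only" concrete, here is a minimal, dependency-free sketch that decodes a few fields from the fixed 348-byte NIfTI-1 header with `struct`. It is not the PR's implementation. It assumes a single-file NIfTI-1 image (`.nii` or `.nii.gz`); NIfTI-2's 540-byte header is not handled. `parse_nifti1_header` and `read_nifti1_header` are hypothetical names.

```python
import gzip
import struct

NIFTI1_HEADER_SIZE = 348  # fixed NIfTI-1 header size; voxel data follows it


def parse_nifti1_header(raw):
    """Decode selected fields from the first 348 bytes of a NIfTI-1 file."""
    if len(raw) < NIFTI1_HEADER_SIZE:
        raise ValueError(f"truncated header ({len(raw)} bytes)")
    # sizeof_hdr must read as 348; whichever byte order gives that is the file's.
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "i", raw, 0)[0] == NIFTI1_HEADER_SIZE:
            break
    else:
        raise ValueError("not a NIfTI-1 header (sizeof_hdr != 348)")

    dims = struct.unpack_from(endian + "8h", raw, 40)  # dim[0] = number of dims
    ndim = dims[0]
    pixdim = struct.unpack_from(endian + "8f", raw, 76)
    datatype, bitpix = struct.unpack_from(endian + "2h", raw, 70)
    qform_code, sform_code = struct.unpack_from(endian + "2h", raw, 252)
    return {
        "shape": tuple(dims[1 : ndim + 1]),
        "datatype": datatype,  # NIfTI datatype code, e.g. 4 = int16
        "bitpix": bitpix,  # bits per voxel on disk
        "zooms": tuple(pixdim[1 : ndim + 1]),  # voxel sizes, then TR for 4D
        "dim_info": raw[39],
        "slice_code": raw[122],
        "xyzt_units": raw[123],
        "qform_code": qform_code,
        "sform_code": sform_code,
    }


def read_nifti1_header(path):
    """Read only the header bytes; for .nii.gz only the start of the stream is decompressed."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rb") as fh:
        return parse_nifti1_header(fh.read(NIFTI1_HEADER_SIZE))
```

Only 348 bytes are requested, so the cost stays the same however large the 4D series is. That is the property the PR relies on.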
Report the array's data size: `prod(shape) * dtype.itemsize`, formatted with binary units (B/KiB/MiB/GiB). The on-disk `.nii.gz` size hides this; the in-memory footprint is the number that matters when the pipeline loads the array.
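The size calculation can be sketched in a few lines. This assumes the item size is passed as a plain integer (for example `bitpix // 8` from the header, or `dtype.itemsize` when numpy is available). `format_data_size` is a hypothetical name.

```python
import math


def format_data_size(shape, itemsize):
    """Return prod(shape) * itemsize as a string with binary (1024-based) units."""
    nbytes = math.prod(shape) * itemsize
    if nbytes < 1024:
        return f"{nbytes} B"
    value = float(nbytes)
    for unit in ("KiB", "MiB", "GiB"):
        value /= 1024
        # Stop at the first unit that keeps the number below 1024; GiB is the cap.
        if value < 1024 or unit == "GiB":
            break
    return f"{value:.1f} {unit}"
```

For example, a 64x64x32x100 int16 run gives `format_data_size((64, 64, 32, 100), 2)`, which returns `"25.0 MiB"`, regardless of how well the file compresses on disk.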
- Best-effort: a header that can't be read logs a warning instead of aborting the run; the real failure surfaces later when processing touches the file.
- 5D+ images report the trailing dims as `extra dims=...` instead of mislabeling them.
- Voxel size notes `(units unknown)` when `xyzt_units` is unset.
- Rename `uncompressed size` -> `size` (the loaded array is upcast to float64 anyway, so the longer name overpromised).
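The three behaviours above (best-effort reading, `extra dims` for 5D+, and `(units unknown)`) can be sketched as small helpers. This is an illustration under assumptions, not the PR's code. It assumes the header has already been decoded into a dict with `shape`, `zooms` and `xyzt_units`, and that spatial units sit in the low three bits of `xyzt_units` (1 = m, 2 = mm, 3 = µm in the NIfTI-1 spec). All names, including the logger name, are hypothetical.

```python
import logging

logger = logging.getLogger("nifti_summary_sketch")  # hypothetical logger name

# Spatial units live in the low 3 bits of xyzt_units (NIfTI-1 spec).
SPATIAL_UNITS = {1: "m", 2: "mm", 3: "um"}


def describe_dims(shape):
    """Full shape, plus volume count for 4D and trailing dims for 5D+."""
    parts = ["shape=" + "x".join(map(str, shape))]
    if len(shape) >= 4:
        parts.append(f"volumes={shape[3]}")
    if len(shape) >= 5:
        parts.append("extra dims=" + "x".join(map(str, shape[4:])))
    return ", ".join(parts)


def describe_voxel_size(zooms, xyzt_units):
    """Spatial voxel size, flagging unset units instead of guessing mm."""
    sizes = " x ".join(f"{z:g}" for z in zooms[:3])
    unit = SPATIAL_UNITS.get(xyzt_units & 0x07)
    return f"{sizes} {unit}" if unit else f"{sizes} (units unknown)"


def summarize_best_effort(path, read_header):
    """Log a one-line summary; warn and return None if the header is unreadable."""
    try:
        hdr = read_header(path)
    except Exception as exc:  # deliberately broad: logging must never abort the run
        logger.warning("Could not read NIfTI header for %s: %s", path, exc)
        return None
    summary = (
        f"{path}: {describe_dims(hdr['shape'])}, "
        f"voxel size={describe_voxel_size(hdr['zooms'], hdr['xyzt_units'])}"
    )
    logger.info(summary)
    return summary
```

The broad `except` is intentional here. The summary is purely informational, so a bad header should defer to whatever later step actually reads the data.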
The 4D summary line now names the slice axis (from the header's dim_info,
or "axis 2 (assumed; no dim_info)" when unset) and the slice acquisition
order from slice_code ("unknown" when unset, which is the BIDS norm since
SliceTiming lives in the JSON sidecar and is already logged by FunctionalMetadata).
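The slice-axis and slice-order decoding can be sketched from the NIfTI-1 bit layout. The slice dimension sits in bits 4-5 of `dim_info`, where 1-3 name axes 0-2 and 0 means unset. `slice_code` values 1-6 name the standard acquisition orders. The function names below are hypothetical, and the output format only mimics the description above.

```python
# NIfTI-1 slice_code values (NIFTI_SLICE_* in the spec); 0 means unset.
SLICE_ORDER = {
    1: "sequential increasing",
    2: "sequential decreasing",
    3: "alternating increasing",
    4: "alternating decreasing",
    5: "alternating increasing 2",
    6: "alternating decreasing 2",
}


def slice_axis_from_dim_info(dim_info):
    """Return (0-based slice axis, whether it was assumed)."""
    slice_field = (dim_info >> 4) & 0x03  # bits 4-5: 1..3 -> axis 0..2, 0 -> unset
    if slice_field == 0:
        return 2, True  # conventional fallback when the header says nothing
    return slice_field - 1, False


def describe_slices(shape, dim_info, slice_code):
    axis, assumed = slice_axis_from_dim_info(dim_info)
    axis_text = f"axis {axis} (assumed; no dim_info)" if assumed else f"axis {axis}"
    order = SLICE_ORDER.get(slice_code, "unknown")
    return f"slices={shape[axis]} along {axis_text}, slice order={order}"
```

A typical BIDS BOLD header with `dim_info=0` and `slice_code=0` yields `slices=32 along axis 2 (assumed; no dim_info), slice order=unknown` for a 64x64x32x100 image.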
kaitj left a comment:
lgtm 🚀 (for the cross-sectional)
I haven't looked closely at whether or how to integrate this into the longitudinal pipeline, but is there any reason not to also include it there for the anatomical and functional workflows?
Adding it to any raw/user-provided images in longitudinal would be sensible; any images we create ourselves generally should not need it.
I don't think we currently have a way to check who created an image? If that information is available, I'd assume it's in the metadata, but right now, as long as a dataset can be queried by b2t and the file names match the expected entities, it would run. If that's the case, it's probably safer to handle most (or all) of the inputs? Typing this, I think the same is potentially true for all the individual workflows (qc, metrics, etc.), though those are less likely since they require more files.
We weren't logging much about the raw images going into the pipeline (just the file path for anat; TR + SliceTiming for func). This adds a header-only `log_image_summary()` in `rbc.core.nifti` and calls it from the anatomical and functional `process_session` orchestrators, so each run records exactly what entered: array shape, on-disk dtype, data size, voxel size, orientation, sform/qform spaces, and for 4D images volume count, slice axis/count/order, and header TR. No data is loaded, just the NIfTI header. It's best-effort: an unreadable header logs a warning rather than aborting the run.

Sample output on `tests/data/ds000001`:
sizeis the array footprint shape x dtype itemsize; the on-disk.nii.gzis much smaller.slice order=unknownis normal for BIDS data, where SliceTiming lives in the JSON sidecar - whichFunctionalMetadata.loadalready logs separately.)Functional runs still get the existing TR-source / SliceTiming logging from
FunctionalMetadata.load. Theallpipeline picks this up for free since it reuses theseprocess_sessionfunctions. Longitudinal / metrics / QC are left alone since they consume derivatives, not raw inputs.
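The orientation and sform/qform parts of the summary can be approximated without a library, as a simplified stand-in for nibabel's `aff2axcodes`. Each voxel axis is labelled by the RAS+ world axis it is most aligned with, and the header's `qform_code` / `sform_code` are mapped to the NIfTI-1 `NIFTI_XFORM_*` names. This is an assumption-laden sketch, not what `log_image_summary()` does. Strongly oblique affines can give ambiguous labels, and `axis_codes` / `describe_spaces` are hypothetical names.

```python
# NIfTI-1 qform_code / sform_code values (NIFTI_XFORM_* in the spec).
XFORM_LABELS = {0: "unknown", 1: "scanner", 2: "aligned", 3: "talairach", 4: "mni"}

# Per world axis in RAS+: (label if the voxel axis points negative, label if positive).
WORLD_AXIS_LABELS = (("L", "R"), ("P", "A"), ("I", "S"))


def axis_codes(affine):
    """Approximate orientation codes from the 3x3 part of a voxel-to-world affine."""
    codes = []
    for col in range(3):
        column = [affine[row][col] for row in range(3)]
        # The world axis this voxel axis is most aligned with.
        world = max(range(3), key=lambda r: abs(column[r]))
        if column[world] == 0:
            codes.append("?")  # degenerate (all-zero) axis
        else:
            codes.append(WORLD_AXIS_LABELS[world][1 if column[world] > 0 else 0])
    return "".join(codes)


def describe_spaces(qform_code, sform_code):
    def label(code):
        return XFORM_LABELS.get(code, f"code {code}")

    return f"qform={label(qform_code)}, sform={label(sform_code)}"
```

An identity affine reads as `RAS`, and flipping the sign of the first column gives `LAS`. Unknown xform codes are reported numerically rather than guessed.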