@tsalo
Member

@tsalo tsalo commented Jan 17, 2025

Closes #281, closes #331, and closes #304 (in combination with #403).

A few things to look into

  • Fields that are arrays for some subjects and floats for others (e.g., RepetitionTimeExcitation can be either).
  • Is the dataframe saved and loaded back into memory at any point? If so, I need to find where and use the new utility functions I wrote.
    • @mattcieslak says it's not, so I'll leave this alone for now.
  • Any other places where metadata fields are compared?
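One possible way to handle the first item above (fields that are arrays for some subjects and floats for others) is to wrap scalars in one-element lists before comparing. This is only a sketch: `as_list` is a hypothetical helper and is not taken from the CuBIDS code base.

```python
def as_list(value):
    """Wrap a scalar in a one-element list; copy list-like values to a list.

    Hypothetical helper for illustration only.
    """
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


# A scalar and a one-element array now have the same shape and can be
# compared or clustered together.
scalar_form = as_list(0.8)
array_form = as_list([0.8, 0.9])
```

With this normalization, a scalar value would land in the same length group as one-element arrays instead of being treated as a different kind of value.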

Changes proposed in this pull request

  • Allow lists in the param_group_df parameter to cubids.cubids.format_params()
    • Fields with "tolerance" in the config will be fed into the AgglomerativeClustering step, but only after splitting and grouping by the length of the arrays.
    • Fields without "tolerance" will be converted to a string and compared based on that.
  • Rename cubids.cubids.format_params() to cubids.utils.cluster_single_parameters()
  • Allow lists in cubids.cubids.round_params().
  • Drop "Obliquity" boolean field in favor of "ImageOrientationPatientDICOM" array field.
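The two code paths for list-valued fields described above can be sketched roughly as follows. This is not the PR's implementation: `cluster_values` is a hypothetical name, and a simple greedy threshold comparison stands in for the AgglomerativeClustering step so that the example needs only the standard library.

```python
from collections import defaultdict


def cluster_values(values, tolerance=None):
    """Return one integer cluster label per value (hypothetical helper)."""
    labels = [None] * len(values)
    next_label = 0

    if tolerance is None:
        # No tolerance configured: compare the string form of each value.
        seen = {}
        for i, value in enumerate(values):
            key = str(value)
            if key not in seen:
                seen[key] = next_label
                next_label += 1
            labels[i] = seen[key]
        return labels

    # Tolerance configured: split by array length first, so arrays of
    # different lengths can never share a cluster.
    by_length = defaultdict(list)
    for i, value in enumerate(values):
        by_length[len(value)].append(i)

    for length in sorted(by_length):
        # (representative array, label) pairs found so far for this length.
        # Greedy stand-in for AgglomerativeClustering, for illustration only.
        centers = []
        for i in by_length[length]:
            for rep, label in centers:
                if all(abs(a - b) <= tolerance for a, b in zip(values[i], rep)):
                    labels[i] = label
                    break
            else:
                centers.append((values[i], next_label))
                labels[i] = next_label
                next_label += 1
    return labels


arrays = [[1.0, 2.0, 3.0], [1.01, 2.0, 3.0], [1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
with_tolerance = cluster_values(arrays, tolerance=0.05)
without_tolerance = cluster_values(arrays)
```

With a tolerance, the first two arrays share a cluster because they have the same length and differ by less than 0.05 elementwise. Without one, every distinct string form gets its own cluster.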

@tsalo tsalo added the enhancement New feature or request label Jan 17, 2025
@mattcieslak
Contributor

This is a very smart approach to the problem. I think it will work.

I don't think any of the dataframes will ever be written and reloaded during the course of a command line call.

@tsalo tsalo marked this pull request as ready for review January 29, 2025 20:50
@tsalo
Member Author

tsalo commented Jan 29, 2025

@mattcieslak do you want me to merge?

@tsalo tsalo requested a review from mattcieslak February 4, 2025 19:51
@tsalo tsalo requested a review from tien-tong February 5, 2025 20:15
tsalo added 5 commits February 5, 2025 15:32
If you have two unique values (NaNs and some actual value), it would label everything as cluster 0, but it should probably label the actual values as 0 and the NaNs as 1.
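The intended labeling from this commit message can be illustrated with a small sketch. It uses exact equality on scalar values rather than the tolerance-based clustering, and `label_with_nans` is a hypothetical name, not a function from the PR.

```python
import math


def label_with_nans(values):
    """Give equal non-NaN values a shared label and NaNs their own label."""
    seen = {}
    labels = []
    for value in values:
        if isinstance(value, float) and math.isnan(value):
            labels.append(None)  # placeholder, filled in below
        else:
            if value not in seen:
                seen[value] = len(seen)
            labels.append(seen[value])
    # NaNs get the next unused label, after all actual values.
    nan_label = len(seen)
    return [nan_label if label is None else label for label in labels]


labels = label_with_nans([2.0, float("nan"), 2.0, float("nan")])
```

Here the actual value 2.0 is labeled 0 and the NaNs are labeled 1, rather than everything collapsing into cluster 0.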
@tsalo
Member Author

tsalo commented Feb 27, 2025

This interacts with #445 in that we'll be able to group based on file collection-related entities, which are going to all be arrays. For example, if one file named ...task-rest_echo-1_bold.nii.gz has EchoTimes: [1, 2, 3] and another has EchoTimes: [1, 2, 3, 4] because the first is from a 3-echo run and the second is from a 4-echo run, this will cluster them separately.

Contributor

@tien-tong tien-tong left a comment

Looks good to me.

As far as I can tell, the functions used for grouping/clustering were _get_param_groups and format_params (now renamed to cluster_single_parameters), and you have updated both in this PR.

I like that we can now also compare lists of strings.

Contributor

@mattcieslak mattcieslak left a comment

I'm kind of attached to the obliquity variant and didn't see on a quick read-through what this will look like with the new convention. What will oblique scan file names look like?

@tsalo
Member Author

tsalo commented Apr 4, 2025

They should look something like ImageOrientationPatientDICOMC[1|2|...], where the number at the end is the cluster for the orientation values. That way, different levels of obliquity get put in different clusters.

@tsalo
Member Author

tsalo commented Apr 4, 2025

I could hardcode in some kind of rename for that field? So it ends up being ObliquityC[1|2|...]? I'd rather push that to a separate PR (before the next release, of course) though.
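A minimal sketch of the naming scheme under discussion, with the optional rename. `variant_label` and its `renames` argument are hypothetical and only illustrate the proposal; they are not part of CuBIDS.

```python
def variant_label(field, cluster, renames=None):
    """Build a label such as 'ImageOrientationPatientDICOMC1'.

    `renames` optionally maps a metadata field name to a display name.
    Hypothetical helper for illustration only.
    """
    renames = renames or {}
    return f"{renames.get(field, field)}C{cluster}"


default_name = variant_label("ImageOrientationPatientDICOM", 2)
renamed = variant_label(
    "ImageOrientationPatientDICOM",
    2,
    renames={"ImageOrientationPatientDICOM": "Obliquity"},
)
```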

@mattcieslak
Contributor

I think it would be hard to know what's going on when directly using the values in ImageOrientationPatientDICOM, because that field is part of an affine matrix. If the person doing the scanning was angling it on a subject-by-subject basis, then these values will be all over the place and the clusters won't make sense.

I like the idea of having an ObliqueDim1, ObliqueDim2, etc. that can be true or false. That will capture meaningful variation, particularly for dMRI.

@tsalo
Member Author

tsalo commented Apr 4, 2025

Okay, I've reverted the obliquity-related changes. Can you take another look?

@tsalo tsalo requested a review from mattcieslak April 4, 2025 19:01
Contributor

@mattcieslak mattcieslak left a comment

LGTM!

@tsalo tsalo merged commit f7e5af4 into main Apr 4, 2025
10 checks passed
@tsalo tsalo deleted the list-metadata branch April 4, 2025 19:48
Development

Successfully merging this pull request may close these issues.

  • Trying to add ShimSetting to config.yml doesn't work
  • Support file collections
  • Support array-type metadata fields

4 participants