Support array-type metadata fields in `cubids group` #407

tsalo · 2025-01-17T18:05:04Z

Closes #281, closes #331, and closes #304 (in combination with #403).

A few things to look into

- Fields that are arrays for some subjects and floats for others (e.g., RepetitionTimeExcitation can be either).
- Is the dataframe saved and loaded back into memory at any point? If so I need to find where and use the new utility functions I wrote.
  - @mattcieslak says it's not, so I'll leave this alone for now.
- Any other places where metadata fields are compared?

Changes proposed in this pull request

- Allow lists in the `param_group_df` parameter to `cubids.cubids.format_params()`.
  - Fields with "tolerance" in the config will be fed into the AgglomerativeClustering step, but only after splitting and grouping by the length of the arrays.
  - Fields without "tolerance" will be converted to a string and compared based on that.
- Rename `cubids.cubids.format_params()` to `cubids.utils.cluster_single_parameters()`.
- Allow lists in `cubids.cubids.round_params()`.
- ~~Drop "Obliquity" boolean field in favor of "ImageOrientationPatientDICOM" array field.~~
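
To illustrate the two code paths described above, here is a minimal sketch. It is not the code in this PR: `cluster_values` is a hypothetical helper, every value is assumed to already be a list, and a greedy first-match grouping stands in for the AgglomerativeClustering step so that the example needs only the standard library.

```python
import json
from collections import defaultdict


def cluster_values(values, tolerance=None):
    """Return one integer cluster label per entry (hypothetical helper).

    Assumes every entry in ``values`` is a list. With ``tolerance=None`` the
    entries are compared by their string form; otherwise they are split by
    length and grouped element-wise within the given tolerance.
    """
    labels = [None] * len(values)
    next_label = 0

    if tolerance is None:
        # No tolerance configured: exact comparison on the string form.
        seen = {}
        for i, value in enumerate(values):
            key = json.dumps(value)
            if key not in seen:
                seen[key] = next_label
                next_label += 1
            labels[i] = seen[key]
        return labels

    # Tolerance configured: split by array length first, so a 3-element
    # array is never compared against a 4-element one.
    by_length = defaultdict(list)
    for i, value in enumerate(values):
        by_length[len(value)].append(i)

    for length in sorted(by_length):
        representatives = []  # (label, array) pairs seen for this length
        for i in by_length[length]:
            for label, rep in representatives:
                # Stand-in for the clustering step: join the first group
                # whose representative is within tolerance on every element.
                if all(abs(a - b) <= tolerance for a, b in zip(values[i], rep)):
                    labels[i] = label
                    break
            else:
                representatives.append((next_label, values[i]))
                labels[i] = next_label
                next_label += 1
    return labels


echo_times = [[1, 2, 3], [1, 2, 3, 4], [1.01, 2.0, 3.0], [5, 6, 7]]
numeric_labels = cluster_values(echo_times, tolerance=0.05)
string_labels = cluster_values([["a", "b"], ["a", "b"], ["b", "a"]])
```

With these inputs, the 3-element and 4-element arrays never share a label, and the two near-identical 3-element arrays do. The label numbering here is arbitrary and may not match what CuBIDS assigns.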

mattcieslak · 2025-01-28T20:24:16Z

This is a very smart approach to the problem. I think it will work.

I don't think any of the dataframes will ever be written and reloaded during the course of a command line call.

tsalo · 2025-01-29T21:03:58Z

@mattcieslak do you want me to merge?

If you have two unique values (NaNs and some actual value), it would label everything as cluster 0, but it should probably label the actual values as 0 and the NaNs as 1.
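
A minimal sketch of the labeling described above, assuming NaNs should be held out of clustering and given their own trailing label. `label_with_nans` and `exact_match` are hypothetical names used for illustration, not functions from CuBIDS.

```python
import math


def exact_match(values):
    """Toy clustering: one label per distinct value, in order of first appearance."""
    seen = {}
    return [seen.setdefault(v, len(seen)) for v in values]


def label_with_nans(values, cluster_fn):
    """Cluster the non-NaN entries, then give all NaNs one label after them."""
    is_nan = [isinstance(v, float) and math.isnan(v) for v in values]
    real_labels = cluster_fn([v for v, nan in zip(values, is_nan) if not nan])
    # NaNs get the next unused label (0 if there are no real values at all).
    nan_label = max(real_labels, default=-1) + 1
    real_iter = iter(real_labels)
    return [nan_label if nan else next(real_iter) for nan in is_nan]


mixed = label_with_nans([2.0, float("nan"), 2.0, float("nan")], exact_match)
```

In the two-unique-values case, the actual values end up in cluster 0 and the NaNs in cluster 1.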

tsalo · 2025-02-27T22:02:54Z

This interacts with #445 in that we'll be able to group based on file collection-related entities, which are going to all be arrays. For example, if one file named ...task-rest_echo-1_bold.nii.gz has EchoTimes: [1, 2, 3] and another has EchoTimes: [1, 2, 3, 4] because the first is from a 3-echo run and the second is from a 4-echo run, this will cluster them separately.

tien-tong

Looks good to me.

As far as I can tell, the functions used for grouping/clustering were _get_param_groups and format_params (now renamed to cluster_single_parameters), and you have updated both in this PR.

I like that we can now also compare lists of strings.

mattcieslak

I'm kind of attached to the obliquity variant and didn't see on a quick readthrough what this will look like with the new convention. What will oblique scan file names look like?

tsalo · 2025-04-04T14:52:13Z

They should look something like ImageOrientationPatientDICOMC[1|2|...], where the number at the end is the cluster for the orientation values. That way, different levels of obliquity get put in different clusters.

tsalo · 2025-04-04T14:53:59Z

I could hardcode some kind of rename for that field, so it ends up being ObliquityC[1|2|...]? I'd rather push that to a separate PR (before the next release, of course), though.

mattcieslak · 2025-04-04T15:12:16Z

I think it would be hard to know what's going on when directly using the values in ImageOrientationPatientDICOM, because that field is part of an affine matrix. If the person doing the scanning was angling it on a subject-by-subject basis, then these values will be all over the place and the clusters won't make sense.

I like the idea of having an ObliqueDim1, ObliqueDim2, etc. that can be true or false. That will capture meaningful variation, particularly for dMRI.

tsalo · 2025-04-04T15:58:04Z

Okay I've reverted the obliquity-related changes. Can you take another look?

mattcieslak

LGTM!

Support array-type metadata fields.

e061f4c

tsalo added the enhancement New feature or request label Jan 17, 2025

tsalo added 9 commits January 17, 2025 13:25

Try supporting lists of strings too.

ce5dea8

Merge branch 'main' into list-metadata

d6b194a

Add test.

144a492

Update test_utils.py

52d02d0

Update stuff.

7a546b6

Merge branch 'main' into list-metadata

45bcc76

Update test_utils.py

6c3f3e1

Merge branch 'main' into list-metadata

7c50435

Merge branch 'main' into list-metadata

0460d7f

tsalo marked this pull request as ready for review January 29, 2025 20:50

tsalo mentioned this pull request Jan 31, 2025

Evaluate obliquity as an array of values #421

Open

tsalo added 9 commits February 4, 2025 11:50

Merge branch 'main' into list-metadata

e1b1d9b

Rename format_params to cluster_single_parameters.

dc07c6f

Keep working.

e3ca352

Merge branch 'main' into list-metadata

6a5c09b

Move cluster_single_parameters from cubids to utils.

929bfab

Fix import.

c2265dd

Remove unused function.

71d5e73

Update utils.py

3fde7ab

Update test_utils.py

10ce7c7

tsalo requested a review from mattcieslak February 4, 2025 19:51

tsalo added 4 commits February 5, 2025 14:49

Update round_params too.

9897b03

Update test.

a318078

Update.

ff4f5f8

Update.

0441113

tsalo requested a review from tien-tong February 5, 2025 20:15

tsalo added 5 commits February 5, 2025 15:32

Merge branch 'main' into list-metadata

6ede27a

Merge branch 'main' into list-metadata

aa3aa08

Merge branch 'main' into list-metadata

f109d70

Fix possible bug from #439.

d31bf32

If you have two unique values (NaNs and some actual value), it would label everything as cluster 0, but it should probably label the actual values as 0 and the NaNs as 1.

Merge branch 'main' into list-metadata

e6d8d35

tsalo mentioned this pull request Mar 3, 2025

Drop try/except in get_param_groups_dataframes #449

Open

tsalo added 5 commits March 3, 2025 11:12

Allow ndarray metadata.

ed4c3cf

Add ImageOrientationPatientDICOM, remove Obliquity

c8986f9

Remove obliquity mentions.

b31ee74

Merge branch 'main' into list-metadata

f52a419

Merge branch 'main' into list-metadata

fbbdd26

tien-tong approved these changes Apr 4, 2025

Merge branch 'main' into list-metadata

35e1847

tsalo mentioned this pull request Apr 4, 2025

Refactor CuBIDS to support file collections #308

Closed

mattcieslak reviewed Apr 4, 2025

tsalo added 3 commits April 4, 2025 11:55

Revert obliquity-related changes.

0034af1

Update test_bond.py

43171f1

Update example.rst

85348ea

tsalo requested a review from mattcieslak April 4, 2025 19:01

mattcieslak approved these changes Apr 4, 2025

tsalo merged commit f7e5af4 into main Apr 4, 2025
10 checks passed

tsalo deleted the list-metadata branch April 4, 2025 19:48
