-
Notifications
You must be signed in to change notification settings - Fork 12
Fix DSI Studio GQI connectivity #277
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Fix _sanitized_connectivity_matrix and _sanitized_network_measures indexing bug - Replace np.searchsorted() with dict-based lookup - Handle numeric and string labels robustly - Add additional docstrings
Add dsi_studio_gqi test
Probably need this for test also
|
I don't know what it's mad about on line 368. I tried with a line break and it still told me that Black would make changes. |
I would prefer if it raised an error when there are region values in the NIfTI that are not present in the TSV. That would break workflows using some of our current atlases, but I'm okay with broken atlases raising errors until we fix the actual files. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a first look at this. I need to look more closely to understand exactly what is present in the DSI Studio matfile and network text file formats.
qsirecon/interfaces/dsi_studio.py
Outdated
| if not np.all(truncated_labels == matfile_region_ids): | ||
| if len(official_label_names) == len(matfile_region_ids): | ||
| print( | ||
| "Atlas/matfile string labels mismatch but lengths match — " | ||
| "falling back to order-based mapping." | ||
| ) | ||
| # fallback: trust the order, ignore mask | ||
| new_row = np.arange(len(official_label_names)) | ||
| in_this_mask = np.ones_like(official_label_names, dtype=bool) | ||
| else: | ||
| raise AssertionError("Atlas and matfile label names mismatch and lengths differ.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please raise an error here too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I have a better sense of what's going on. I recommend dropping the numeric approach completely and committing to the string-based one, since we require string labels for all ROIs in our atlases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For sanitize_network_measures.
qsirecon/interfaces/dsi_studio.py
Outdated
| Array of official ROI labels (i.e., the names of the ROIs). The matrix in conmat will be reordered to | ||
| match the ROI labels in this array |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Array of official ROI labels (i.e., the names of the ROIs). The matrix in conmat will be reordered to | |
| match the ROI labels in this array | |
| Array of official ROI labels (i.e., the names of the ROIs). | |
| The matrix in conmat will be reordered to match the ROI labels in this array |
| # Where does each column go? Make an index array | ||
| connectivity = m["connectivity"] | ||
| # Column names are binary strings | ||
| column_names = ( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This section is being flagged by black.
| network_data["region_ids"] = [token.split("_")[-1] for token in tokens[1:]] | ||
| # Ensure cellstr output type for MATLAB compatibility | ||
| network_data["region_ids"] = np.array( | ||
| [[token] for token in tokens[1:]], dtype=object) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Network data region_ids shape mismatch bug
In _parse_network_file, network_data["region_ids"] is now a 2D array of single-element lists. This breaks array comparisons in _sanitized_network_measures (e.g., np.isin, np.all) which expect a 1D array of strings. It also creates an inconsistency with the 1D array assigned in the fallback at line 617.
Changes proposed in this pull request
This PR fixes column names as strings handling for DSI Studio's GQI pipeline. DSI Studio uses the atlas NIFTI image and the TSV file to create already-populated connectivity matrices using the ROI names. Previously, the code here attempted to convert the region IDs to integers to then sort and organize. That was a sanity check originally to make sure that everything got sorted into the proper order.
This PR attempts to catch all-comers including cases where the column names in the connectivity matrix are integers or strings. DSI Studio will currently fail to correctly attribute column names for the "Ext" atlases due to a skipped value in the NIFTIs that is not reflected in the TSV file and so as long as the number of regions in the original atlas matches the number of regions in the subject-specific file, region names are simply imputed from the TSV's original order.
This will throw an error when A) The column names from DSI Studio's connectivity file and the region labels (either indices or labels as appropriate) from the TSV don't match and the length is not identical. This is specifically because missed ROI numbers in the NIFTI are treated as missing and for the Ext atlases as they currently stand, this will make it impossible to know which ROIs are genuinely missing. However, this is an edge-case to be addressed in the future.
I've tested this locally and it runs to completion without error for
Brainnetome246Ext,Gordon333Ext,AICHA384Ext,AAL116, and4S256Parcelsas those capture the variety of atlas variations.Additionally adds a test for
dsi_studio_gqi. I think I did it right but I'll bet something's wrong with it.Should fix #272 and close #143. Addresses the first point in #274.
Documentation that should be reviewed
Note
Make DSI Studio GQI connectivity robust to string labels and add CI/integration test coverage for the workflow.
_sanitized_connectivity_matrixand_sanitized_network_measuresto use string ROI labels, add order-based fallback when lengths match, and emit MATLAB-compatibleregion_ids.DSIStudioCommandLineInputSpecwithCommandLineInputSpec; standardizethread_counttonum_threadsacross specs; refactorDSIStudioGQIReconstructionas standaloneCommandLinewith explicit output spec.num_threadstoDSIStudioConnectivityMatrix.labelcolumn forofficial_labelsandindexfor IDs in saved matfiles.num_threadsto_AutoTrackInputSpec; minor docstrings and output handling tweaks.Recon_DSI_Studio_GQICircleCI job; wire into workflows, coverage merge, and deployable gates; store artifacts underdsi_studio_gqi_recon/.test_dsi_studio_gqi_reconand expected outputs listdsi_studio_gqi_recon_outputs.txt.Written by Cursor Bugbot for commit 647da31. This will update automatically on new commits. Configure here.