@araikes
Collaborator

@araikes araikes commented Oct 28, 2025

Changes proposed in this pull request

This PR fixes the handling of string column names in DSI Studio's GQI pipeline. DSI Studio uses the atlas NIfTI image and the TSV file to create connectivity matrices that are already populated with the ROI names. Previously, the code here attempted to convert the region IDs to integers and then sort and organize them. That was originally a sanity check to make sure that everything was sorted into the proper order.

This PR attempts to handle all cases, including those where the column names in the connectivity matrix are integers and those where they are strings. DSI Studio currently fails to attribute column names correctly for the "Ext" atlases because of a skipped value in the NIfTIs that is not reflected in the TSV file. So, as long as the number of regions in the original atlas matches the number of regions in the subject-specific file, region names are simply imputed from the TSV's original order.

This will throw an error when the column names from DSI Studio's connectivity file and the region labels (either indices or labels, as appropriate) from the TSV don't match and their lengths are not identical. This is specifically because skipped ROI numbers in the NIfTI are treated as missing, and for the Ext atlases as they currently stand, this makes it impossible to know which ROIs are genuinely missing. However, this is an edge case to be addressed in the future.
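
As a rough illustration of the matching behavior described above, here is a minimal sketch. It is not the code in this PR: the function name `map_columns_to_atlas` and its arguments are hypothetical, and it assumes both inputs are 1D sequences of labels that can be compared as strings.

```python
import numpy as np


def map_columns_to_atlas(column_names, atlas_labels):
    """Return, for each matrix column, its row index in the atlas TSV.

    Hypothetical helper: the name and signature are illustrative only.
    """
    column_names = np.asarray(column_names, dtype=str)
    atlas_labels = np.asarray(atlas_labels, dtype=str)

    # Dict-based lookup from label to its position in the TSV
    position = {label: i for i, label in enumerate(atlas_labels)}

    if all(name in position for name in column_names):
        return np.array([position[name] for name in column_names], dtype=int)

    if len(column_names) == len(atlas_labels):
        # Labels disagree but the counts agree: trust the TSV order
        return np.arange(len(atlas_labels))

    raise ValueError("Column names and atlas labels mismatch and lengths differ.")
```

With labels that match, the result follows the TSV positions; with unrecognized labels of the same count, it falls back to `0..n-1`; otherwise it raises.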

I've tested this locally, and it runs to completion without error for Brainnetome246Ext, Gordon333Ext, AICHA384Ext, AAL116, and 4S256Parcels, which together cover the range of atlas variations.

This PR also adds a test for dsi_studio_gqi. I think I did it right, but I'll bet something's wrong with it.

Should fix #272 and close #143. Addresses the first point in #274.

Documentation that should be reviewed


Note

Make DSI Studio GQI connectivity robust to string labels and add CI/integration test coverage for the workflow.

  • Interfaces (DSI Studio):
    • GQI Connectivity Parsing: Rework _sanitized_connectivity_matrix and _sanitized_network_measures to use string ROI labels, add order-based fallback when lengths match, and emit MATLAB-compatible region_ids.
    • Class/Specs Cleanup: Replace custom DSIStudioCommandLineInputSpec with CommandLineInputSpec; standardize thread_count to num_threads across specs; refactor DSIStudioGQIReconstruction as standalone CommandLine with explicit output spec.
    • Atlas Graph: Distribute threads per atlas and pass num_threads to DSIStudioConnectivityMatrix.
    • Connectivity Outputs: Use TSV label column for official_labels and index for IDs in saved matfiles.
    • Misc: Add num_threads to _AutoTrackInputSpec; minor docstrings and output handling tweaks.
  • CI:
    • Add Recon_DSI_Studio_GQI CircleCI job; wire into workflows, coverage merge, and deployable gates; store artifacts under dsi_studio_gqi_recon/.
  • Tests:
    • Add integration test test_dsi_studio_gqi_recon and expected outputs list dsi_studio_gqi_recon_outputs.txt.

Written by Cursor Bugbot for commit 647da31.

Fix _sanitized_connectivity_matrix and _sanitized_network_measures indexing bug

- Replace np.searchsorted() with dict-based lookup
- Handle numeric and string labels robustly
- Add additional docstrings
Add dsi_studio_gqi test
Probably need this for test also
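
To illustrate the swap from np.searchsorted() to a dict-based lookup named in the first commit above: np.searchsorted assumes its first argument is sorted and returns insertion points, so it does not signal a label that is absent. The sketch below uses made-up labels, and `lookup_positions` is a hypothetical helper, not the function in this PR.

```python
import numpy as np

# Made-up atlas labels in numeric order, which is not sorted order for strings
atlas_labels = np.array(["2", "10", "33"])
columns = np.array(["33", "10"])

# np.searchsorted assumes a sorted first argument and returns insertion
# points, so on these labels the result need not be the true position,
# and a missing label would not raise.
insertion_points = np.searchsorted(atlas_labels, columns)


def lookup_positions(columns, atlas_labels):
    """Map each column label to its position in atlas_labels (hypothetical helper)."""
    position = {str(label): i for i, label in enumerate(atlas_labels)}
    missing = [str(c) for c in columns if str(c) not in position]
    if missing:
        raise KeyError(f"Labels not found in atlas: {missing}")
    return np.array([position[str(c)] for c in columns], dtype=int)


positions = lookup_positions(columns, atlas_labels)
```

The dictionary makes no assumption about ordering and fails loudly on an unknown label.
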
@araikes araikes requested review from mattcieslak and tsalo October 28, 2025 22:30
cursor[bot]

This comment was marked as outdated.

cursor[bot]

This comment was marked as outdated.

@araikes
Collaborator Author

araikes commented Oct 28, 2025

I don't know what it's mad about on line 368. I tried with a line break and it still told me that Black would make changes.

cursor[bot]

This comment was marked as outdated.

@tsalo
Member

tsalo commented Oct 29, 2025

This will throw an error when the column names from DSI Studio's connectivity file and the region labels (either indices or labels, as appropriate) from the TSV don't match and their lengths are not identical.

I would prefer if it raised an error when there are region values in the NIfTI that are not present in the TSV. That would break workflows using some of our current atlases, but I'm okay with broken atlases raising errors until we fix the actual files.
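
A check along these lines could look like the sketch below. It is only an illustration under stated assumptions: the atlas image has already been loaded into an integer array, 0 is background, and `check_atlas_values` with its parameters is a hypothetical name rather than anything in the codebase.

```python
import numpy as np


def check_atlas_values(atlas_data, tsv_indices, background=0):
    """Raise if the atlas image has region values absent from the TSV.

    Hypothetical helper: atlas_data is the image as an integer array and
    tsv_indices are the region indices listed in the TSV.
    """
    image_values = np.unique(atlas_data)
    image_values = image_values[image_values != background]

    # Values present in the image but not listed in the TSV
    unlisted = np.setdiff1d(image_values, np.asarray(tsv_indices))
    if unlisted.size > 0:
        raise ValueError(
            f"Atlas image contains values missing from the TSV: {unlisted.tolist()}"
        )
```

Note that this direction of the check allows TSV entries that are absent from the image, which is the case for a subject with genuinely missing ROIs.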

Member

@tsalo tsalo left a comment

I took a first look at this. I need to look more closely to understand exactly what is present in the DSI Studio matfile and network text file formats.

Comment on lines 594 to 604
if not np.all(truncated_labels == matfile_region_ids):
    if len(official_label_names) == len(matfile_region_ids):
        print(
            "Atlas/matfile string labels mismatch but lengths match — "
            "falling back to order-based mapping."
        )
        # fallback: trust the order, ignore mask
        new_row = np.arange(len(official_label_names))
        in_this_mask = np.ones_like(official_label_names, dtype=bool)
    else:
        raise AssertionError("Atlas and matfile label names mismatch and lengths differ.")
Member

Please raise an error here too.
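
One way the stricter behavior could look, as a sketch only: `require_matching_labels` is a hypothetical helper, and the choice of ValueError is an assumption, not something the review specifies.

```python
import numpy as np


def require_matching_labels(atlas_labels, matfile_labels):
    """Raise ValueError unless both label sequences are identical (hypothetical helper)."""
    atlas_labels = np.asarray(atlas_labels, dtype=str)
    matfile_labels = np.asarray(matfile_labels, dtype=str)

    # Check shapes first so the elementwise comparison below is well defined
    if atlas_labels.shape != matfile_labels.shape:
        raise ValueError(
            f"Expected {atlas_labels.size} labels, found {matfile_labels.size}."
        )

    differs = atlas_labels != matfile_labels
    if differs.any():
        pairs = list(
            zip(atlas_labels[differs].tolist(), matfile_labels[differs].tolist())
        )
        raise ValueError(f"Atlas and matfile labels mismatch: {pairs}")
```

Listing the mismatched pairs in the message should make it easier to see which atlas file needs fixing.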

@tsalo tsalo added the bug Something isn't working label Oct 29, 2025
tsalo added a commit that referenced this pull request Oct 29, 2025
Member

@tsalo tsalo left a comment

I think I have a better sense of what's going on. I recommend dropping the numeric approach completely and committing to the string-based one, since we require string labels for all ROIs in our atlases.

Member

@tsalo tsalo left a comment

For sanitize_network_measures.

cursor[bot]

This comment was marked as outdated.

Comment on lines 525 to 526
Array of official ROI labels (i.e., the names of the ROIs). The matrix in conmat will be reordered to
match the ROI labels in this array
Member

Suggested change
Array of official ROI labels (i.e., the names of the ROIs). The matrix in conmat will be reordered to
match the ROI labels in this array
Array of official ROI labels (i.e., the names of the ROIs).
The matrix in conmat will be reordered to match the ROI labels in this array

# Where does each column go? Make an index array
connectivity = m["connectivity"]
# Column names are binary strings
column_names = (
Member

This section is being flagged by black.

cursor[bot]

This comment was marked as outdated.

network_data["region_ids"] = [token.split("_")[-1] for token in tokens[1:]]
# Ensure cellstr output type for MATLAB compatibility
network_data["region_ids"] = np.array(
    [[token] for token in tokens[1:]], dtype=object)

Bug: Network data region_ids shape mismatch bug

In _parse_network_file, network_data["region_ids"] is now a 2D array of single-element lists. This breaks array comparisons in _sanitized_network_measures (e.g., np.isin, np.all) which expect a 1D array of strings. It also creates an inconsistency with the 1D array assigned in the fallback at line 617.
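
The shape issue can be reproduced with a small sketch. The tokens and labels below are made up, and keeping a 1D array for comparisons while reshaping only at save time is one possible remedy, not necessarily the fix adopted in this PR.

```python
import numpy as np

tokens = ["name", "region_A", "region_B"]  # made-up tokens

# 2D object array with one label per row, as in the flagged change: shape (2, 1)
column_ids = np.array([[token] for token in tokens[1:]], dtype=object)
# 1D array of strings, the form the comparisons expect: shape (2,)
flat_ids = np.array(tokens[1:], dtype=str)

official = np.array(["region_A", "region_B"])

# Comparing against the (2, 1) array broadcasts to a (2, 2) result, so
# np.all() is False even though the labels are the same
broadcast = official == column_ids
# Comparing against the 1D array gives one boolean per label
elementwise = official == flat_ids

# One option: keep the 1D form for comparisons and build the column-shaped
# object array only when writing the output
cellstr_ready = flat_ids.astype(object).reshape(-1, 1)
```

Here `np.all(broadcast)` is False while `np.all(elementwise)` is True, which is the kind of silent mismatch the report describes.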

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

qsirecon 1.1.1 + dsi_studio_gqi workflow exits with error in calc_connectivity
[CI] Add test for dsi_studio_gqi

2 participants