Skip to content

TSV to Parquet & rename _correlations Fix #247, #91#316

Open
jpillai00 wants to merge 6 commits into
mainfrom
tsv-to-pq
Open

TSV to Parquet & rename _correlations Fix #247, #91#316
jpillai00 wants to merge 6 commits into
mainfrom
tsv-to-pq

Conversation

@jpillai00
Copy link
Copy Markdown
Contributor

@jpillai00 jpillai00 commented Apr 16, 2026

Addresses #247 and #91

TSV -> Parquet

  • switched from pl.write_csv to pl.write_parquet
  • updated to extension=".parquet"

Note: only timeseries and connectome outputs are switched to Parquet, the XCP quality file stays as a tsv

correlations -> connectome

  • updated variable names/suffixes where relevant

Updated tests & verified values are identical to TSV outputs by comparing previous/new outputs for a test sub/ses

Note on orientation
Timeseries output preserves the original (ROIs × timepoints) orientation from the TSV, with timepoint indices as columns. The connectome now has ROI labels as named columns. Values are identical to previous outputs.

@github-actions
Copy link
Copy Markdown

github-actions Bot commented Apr 16, 2026

Coverage

Tests Skipped Failures Errors Time
781 0 💤 0 ❌ 0 🔥 9.905s ⏱️

@jpillai00 jpillai00 marked this pull request as ready for review April 16, 2026 19:48
Copilot AI review requested due to automatic review settings April 16, 2026 19:48
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR updates RBC’s tabular derivatives to use Parquet instead of TSV and renames the correlation-matrix output to a more general “connectome”, aligning file naming/suffixes and updating tests accordingly.

Changes:

  • Switched QC and timeseries/connectome outputs from TSV to Parquet (write_csvwrite_parquet, .tsv.parquet).
  • Renamed “correlations/correlation_matrix” outputs to “connectome” across workflows and BIDS exports.
  • Updated unit/integration tests to read Parquet and reflect the new output naming (and intended timeseries orientation change).

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
tests/unit/orchestration/test_qc.py Updates mocked QC output filename to .parquet.
tests/unit/core/test_timeseries.py Updates round-trip tests to read Parquet and expect connectome output.
tests/unit/core/qc/test_xcp.py Updates QC writer tests to validate Parquet column names/values.
tests/unit/bids/test_exports.py Updates expected exported metric/QC filenames and extensions to Parquet and connectome naming.
tests/integration/test_all.py Updates integration glob patterns to expect Parquet timeseries/connectome/QC outputs.
tests/integration/functional/qc/test_xcp.py Updates integration QC test to write/read Parquet.
src/rbc/workflows/qc.py Writes QC metrics to .parquet in the QC workflow.
src/rbc/workflows/metrics.py Renames metrics output field from correlation_matrix to connectome.
src/rbc/core/qc/xcp.py Writes QC metrics as a Parquet file via Polars.
src/rbc/core/metrics/timeseries.py Writes timeseries + connectome to Parquet and renames output path for connectome.
src/rbc/bids/qc.py Exports QC derivative with .parquet extension.
src/rbc/bids/metrics.py Exports timeseries/connectome derivatives as Parquet and updates suffix to connectome.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/rbc/core/metrics/timeseries.py Outdated
Comment thread src/rbc/workflows/qc.py
Comment thread tests/unit/core/qc/test_xcp.py
Comment thread tests/integration/functional/qc/test_xcp.py
Comment thread src/rbc/core/qc/xcp.py Outdated
@jpillai00 jpillai00 marked this pull request as draft April 16, 2026 20:13
@jpillai00 jpillai00 marked this pull request as ready for review April 17, 2026 20:08
@nx10
Copy link
Copy Markdown
Contributor

nx10 commented May 11, 2026

  • ROI labels do not appear in the timerseries file (rows)
  • Connectome rows also have no labels only columns do
  • pl.DataFrame(ts, schema=tp_names) with stringified ints as column names is unusual
  • We might want to consider _relmat instead of _connectome proposed in https://bids.neuroimaging.io/extensions/beps/bep_017.html
  • We could consider long format (roi, t, value) - probably best for downstream

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants