Add --n-cpus to parallize cubids validate --sequential #476

tien-tong · 2025-10-26T13:27:18Z

Closes #475 and #477.

Changes proposed in this pull request

For cubids validate:

change --sequential to --validation-scope {dataset, subject} (default: dataset)
change --sequential-subjects to --participant-label

Parallel validation for --validation-scope subject (cubids/workflows.py)

Added parallel processing with ProcessPoolExecutor
Implemented _validate_single_subject() to process one subject per process
Used hardlinks -> symlinks -> copy fallback to reduce I/O
Introduced n_cpus and max_workers; default worker count derived from n_cpus

Tests (cubids/tests/)

Test in test_cli.py for validation with --validation-scope subject --n-cpus
Test fixes in test_cubids.py (replace comments with assert)

Documentation that should be reviewed

docs/example.rst

Noted --validation-scope subject l avoids "RangeError: Invalid string length" on large datasets
Added example: cubids validate BIDS_Dataset_DataLad v0 --validation-scope subject --n-cpus 4

…n-cpus N

tien-tong · 2025-10-27T14:27:17Z

For cubids validate:

change --sequential to --validation-scope {dataset, subject} (default: dataset)
change --sequential-subjects to --participant-label

mattcieslak

Looks good! In the future it might be nice to break up the function into smaller functions, but as long as tests are passing this is good for now

mattcieslak · 2025-10-29T17:09:26Z

cubids/tests/test_cubids.py

+    # This test verifies the method completes without errors when called
    cubids_instance.datalad_save()
-    # Add assertions here
+    assert True


we will want to write more complex tests here eventually. This is fine for now

Add --n-cpus to parallize cubids validate --sequential

00ef452

This comment was marked as outdated.

Sign in to view

tien-tong added 4 commits October 26, 2025 09:31

fix linter errors

2831f5a

fix linter errors

de1899e

fix test

ebf3355

try to maximize CPU Efficiency when run 'cubids apply --sequential --…

02a926e

…n-cpus N

This comment was marked as outdated.

Sign in to view

tien-tong added 2 commits October 26, 2025 17:39

fix linter errors

a9ca7a6

fix linter errors

9463421

tien-tong linked an issue Oct 29, 2025 that may be closed by this pull request

cubids validate --sequential incorrectly triggers "PARTICIPANT_ID_MISMATCH" error #477

Closed

tien-tong added 3 commits October 29, 2025 09:26

change cli and try to fix the PARTICIPANT_ID_MISMATCH

fc57805

remove max_workers

6c97441

fix cubids validate --participant-label

957080a

This comment was marked as outdated.

Sign in to view

fix cubids validate --participant-label

2229e56

This comment was marked as outdated.

Sign in to view

tien-tong added 3 commits October 29, 2025 11:55

fix cubids validate --participant-label

9ab2c88

fix cubids validate --participant-label

75bd720

fix PARTICIPANT_ID_MISMATCH

ea3d13f

tien-tong requested a review from mattcieslak October 29, 2025 17:04

mattcieslak approved these changes Oct 29, 2025

View reviewed changes

check validation DataFrame rows instead of columns

f4a6769

tien-tong merged commit cb326b4 into main Oct 29, 2025
11 checks passed

tien-tong deleted the validate-sequential-parallel branch October 29, 2025 17:48

tien-tong added the enhancement New feature or request label Nov 17, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add --n-cpus to parallize cubids validate --sequential #476

Add --n-cpus to parallize cubids validate --sequential #476

Uh oh!

tien-tong commented Oct 26, 2025 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

tien-tong commented Oct 27, 2025 •

edited

Loading

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

mattcieslak left a comment

Uh oh!

mattcieslak Oct 29, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Add --n-cpus to parallize cubids validate --sequential #476

Add --n-cpus to parallize cubids validate --sequential #476

Uh oh!

Conversation

tien-tong commented Oct 26, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Changes proposed in this pull request

Documentation that should be reviewed

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

tien-tong commented Oct 27, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

This comment was marked as outdated.

Uh oh!

This comment was marked as outdated.

Uh oh!

mattcieslak left a comment

Choose a reason for hiding this comment

Uh oh!

mattcieslak Oct 29, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

tien-tong commented Oct 26, 2025 •

edited

Loading

tien-tong commented Oct 27, 2025 •

edited

Loading