@tien-tong tien-tong commented Oct 26, 2025

Closes #475 and #477.

Changes proposed in this pull request

  1. For cubids validate:
  • Change --sequential to --validation-scope {dataset, subject} (default: dataset)
  • Change --sequential-subjects to --participant-label
  2. Parallel validation for --validation-scope subject (cubids/workflows.py)
  • Added parallel processing with ProcessPoolExecutor
  • Implemented _validate_single_subject() to process one subject per process
  • Used a hardlink -> symlink -> copy fallback to reduce I/O
  • Introduced n_cpus and max_workers; the default worker count is derived from n_cpus
  3. Tests (cubids/tests/)
  • Added a test in test_cli.py for validation with --validation-scope subject --n-cpus
  • Fixed tests in test_cubids.py (replaced comments with assert statements)
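
For readers unfamiliar with the pattern, the per-subject parallel validation described in item 2 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code in cubids/workflows.py: the helper names (link_or_copy, stage_subject, validate_one, validate_subjects) are hypothetical, the worker only counts staged files instead of running the BIDS validator, and the way max_workers is derived from n_cpus here is a guess.

```python
import os
import shutil
import tempfile
from concurrent.futures import ProcessPoolExecutor
from pathlib import Path


def link_or_copy(src: Path, dst: Path) -> str:
    """Place src at dst: try a hardlink, then a symlink, then a full copy."""
    dst.parent.mkdir(parents=True, exist_ok=True)
    try:
        os.link(src, dst)  # no data copied; only works within one filesystem
        return "hardlink"
    except OSError:
        pass
    try:
        os.symlink(src.resolve(), dst)  # may fail without symlink privileges
        return "symlink"
    except OSError:
        pass
    shutil.copy2(src, dst)  # last resort: copy the bytes
    return "copy"


def stage_subject(bids_root: Path, subject: str, workdir: Path) -> Path:
    """Build a single-subject view of the dataset under workdir (hypothetical)."""
    staged = workdir / subject
    # Top-level files such as dataset_description.json are shared by all subjects.
    for item in bids_root.iterdir():
        if item.is_file():
            link_or_copy(item, staged / item.name)
    # Only this subject's files are staged.
    for src in (bids_root / subject).rglob("*"):
        if src.is_file():
            link_or_copy(src, staged / src.relative_to(bids_root))
    return staged


def validate_one(args):
    """Hypothetical worker: stage one subject and return a stand-in result.

    A real worker would run the validator on the staged directory; this
    sketch only counts the staged files.
    """
    bids_root, subject = args
    with tempfile.TemporaryDirectory() as tmp:
        staged = stage_subject(Path(bids_root), subject, Path(tmp))
        n_files = sum(1 for p in staged.rglob("*") if p.is_file())
    return subject, n_files


def validate_subjects(bids_root, subjects, n_cpus=1):
    """Run validate_one for each subject, one subject per worker process."""
    # Assumption: never start more workers than there are subjects.
    max_workers = max(1, min(n_cpus, len(subjects)))
    jobs = [(str(bids_root), subject) for subject in subjects]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return dict(pool.map(validate_one, jobs))
```

When calling validate_subjects from a script, wrap the call in an `if __name__ == "__main__":` guard so that worker processes started with the spawn or forkserver methods do not re-run it on import.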

Documentation that should be reviewed

docs/example.rst

  • Noted that --validation-scope subject avoids "RangeError: Invalid string length" on large datasets
  • Added example: cubids validate BIDS_Dataset_DataLad v0 --validation-scope subject --n-cpus 4

cursor[bot]

This comment was marked as outdated.

tien-tong commented Oct 27, 2025

For cubids validate:

  • change --sequential to --validation-scope {dataset, subject} (default: dataset)
  • change --sequential-subjects to --participant-label
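
As a rough illustration of the renamed options, the argparse sketch below shows one way the flags could be declared. It is not the actual CuBIDS parser: the positional argument names, the nargs setting for --participant-label, and the default for --n-cpus are assumptions made for the example.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags named in this pull request."""
    parser = argparse.ArgumentParser(prog="cubids validate")
    # Positional names are assumed; the docs example passes a dataset and a prefix.
    parser.add_argument("bids_dir")
    parser.add_argument("output_prefix")
    parser.add_argument(
        "--validation-scope",
        choices=["dataset", "subject"],
        default="dataset",  # default stated in the PR description
        help="Validate the whole dataset at once or one subject at a time.",
    )
    parser.add_argument(
        "--participant-label",
        nargs="+",  # assumption: one or more labels
        default=None,
        help="Restrict validation to these participants.",
    )
    parser.add_argument(
        "--n-cpus",
        type=int,
        default=1,  # assumption: the real default may differ
        help="Number of CPUs available for per-subject validation.",
    )
    return parser


# Usage: parse the command line from the docs example.
example_args = build_parser().parse_args(
    ["BIDS_Dataset_DataLad", "v0", "--validation-scope", "subject", "--n-cpus", "4"]
)
```

Using choices for --validation-scope makes argparse reject any value other than dataset or subject before validation starts.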

@tien-tong tien-tong requested a review from mattcieslak October 29, 2025 17:04

@mattcieslak mattcieslak left a comment

Looks good! In the future it might be nice to break up the function into smaller functions, but as long as the tests are passing, this is good for now.

# This test verifies the method completes without errors when called
cubids_instance.datalad_save()
# Add assertions here
assert True

We will want to write more complex tests here eventually. This is fine for now.

@tien-tong tien-tong merged commit cb326b4 into main Oct 29, 2025
11 checks passed
@tien-tong tien-tong deleted the validate-sequential-parallel branch October 29, 2025 17:48
@tien-tong tien-tong added the enhancement New feature or request label Nov 17, 2025

Development

Successfully merging this pull request may close these issues.

  • cubids validate --sequential incorrectly triggers "PARTICIPANT_ID_MISMATCH" error
  • Parallelize cubids validate --sequential

3 participants