Skip to content

Conversation

@lifflander
Copy link
Contributor

@lifflander lifflander commented Dec 3, 2025

Fixes #11

@lifflander lifflander force-pushed the 11-fix-single-rank-termination-case-with-commmpi branch from 09acf24 to 8032c0e Compare December 3, 2025 20:54
@lifflander lifflander changed the title #5: termination: fix single rank case #11: termination: fix single rank case Dec 3, 2025
Copilot finished reviewing on behalf of lifflander December 3, 2025 20:56
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses issue #5 by adding special handling for the single rank (size=1) case in the termination detection algorithm. The termination detector uses a tree-based approach that doesn't work correctly when only one MPI rank exists, requiring specialized logic to progress the state machine.

Key changes:

  • Extracted child completion checking into a new checkAllChildrenComplete() method
  • Added singleRank() helper method and special case handling in sendControlToChildren() and wave restart logic
  • Updated printf format specifiers from %lld to %" PRIu64 " for portable uint64_t printing

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/vt-lb/comm/MPI/termination.h Adds new method declarations checkAllChildrenComplete() and singleRank() to support single rank termination detection
src/vt-lb/comm/MPI/termination.cc Implements single rank handling logic, refactors completion checking, and fixes printf format specifiers for uint64_t
src/vt-lb/comm/MPI/comm_mpi.h Adds single rank state machine progression in poll loop and initialization logic for rank 0

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

void TerminationDetector::onResponse(uint64_t in_sent, uint64_t in_recv) {
#if DEBUG_TERMINATION
printf("Rank %d: received response: sent=%lld, recv=%lld, global_sent1=%lld, global_recv1_=%lld waiting_children=%d\n",
printf("Rank %d: received response: sent=%" PRIu64 ", recv=%" PRIu64 ", global_sent1=%" PRIu64 ", global_recv1=%" PRIu64 " waiting_children=%d\n",
Copy link

Copilot AI Dec 3, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing closing underscore in format string: global_recv1_= should be global_recv1= to match the variable name pattern in the output (the other variables don't have trailing underscores in the format string).

Suggested change
printf("Rank %d: received response: sent=%" PRIu64 ", recv=%" PRIu64 ", global_sent1=%" PRIu64 ", global_recv1=%" PRIu64 " waiting_children=%d\n",
printf("Rank %d: received response: sent=%" PRIu64 ", recv=%" PRIu64 ", global_sent1=%" PRIu64 ", global_recv1_=%" PRIu64 " waiting_children=%d\n",

Copilot uses AI. Check for mistakes.
Copy link
Contributor

@nlslatt nlslatt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me and fixes my problem.

@lifflander lifflander merged commit 8e1056e into master Dec 3, 2025
5 of 6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Fix single rank termination case with CommMPI

3 participants