Skip to content

[Bug] Error in transfer halo particles#111

Open
bodhinandach wants to merge 1 commit intomasterfrom
bugfix/transfer_particle_rank
Open

[Bug] Error in transfer halo particles#111
bodhinandach wants to merge 1 commit intomasterfrom
bugfix/transfer_particle_rank

Conversation

@bodhinandach
Copy link
Member

Describe the PR
This PR fixes potential deadlocks and crashes in the MPI halo particle transfer routine. The changes are minimal and preserve the original communication pattern. Key fixes include:

  • Using stable storage (std::vector<unsigned> send_counts) for particle counts in nonblocking sends, avoiding dangling pointers.
  • Correctly sizing send_requests with resize() instead of reserve(), preventing out-of-bounds writes.
  • Matching MPI_Probe with MPI_Recv using the probed status.MPI_SOURCE and status.MPI_TAG, eliminating message size/source mismatches.
  • Ensuring type consistency during unpack (ptype, nmaterials, mat_id now use unsigned with MPI_UNSIGNED).

These changes eliminate common sources of bottlenecks and undefined behavior, which particularly emerge in large numbers of particle simulations.

Related Issues/PRs
N/A

Additional context
N/A

@bodhinandach bodhinandach self-assigned this Oct 3, 2025
@bodhinandach bodhinandach added the bug Something isn't working label Oct 3, 2025
@codecov
Copy link

codecov bot commented Oct 3, 2025

Codecov Report

❌ Patch coverage is 0% with 27 lines in your changes missing coverage. Please review.
✅ Project coverage is 95.73%. Comparing base (e512900) to head (809ba59).
⚠️ Report is 12 commits behind head on master.

Files with missing lines Patch % Lines
include/mesh/mesh.tcc 0.00% 27 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master     #111      +/-   ##
==========================================
- Coverage   95.73%   95.73%   -0.00%     
==========================================
  Files         244      244              
  Lines       51462    51464       +2     
==========================================
  Hits        49267    49267              
- Misses       2195     2197       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Collaborator

@thomaszyu thomaszyu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me!

@bodhinandach
Copy link
Member Author

@thomaszyu, @cgeudeker, and @Curiim, when you have time, would you be able to pull/cherry-pick this branch/commit temporarily to your favourite branch and try doing the usual MPI compile and run? This should not change anything in your simulation.

@thomaszyu
Copy link
Collaborator

Hi Nanda, I cherrypicked your bug fix into my force output hardcode branch and ran MPM with 16 MPI ranks, the results look good to me.

@bodhinandach
Copy link
Member Author

@thomaszyu Thanks for checking. I notice there is still a bug... especially when running in full-debug. Will try to understand this further before merging this branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants