## Summary
Implement a communications arena for comm buffers to replace
`the_fa_arena`. It creates a separate arena when GPU-aware MPI is used
and `the_arena` is not managed.
## Additional background
The motivation is a communication performance degradation observed with GPU-aware MPI when `amrex.the_arena_is_managed=0`. @WeiqunZhang hypothesizes that this is caused by frequent re-registration of comm buffer pointers when the buffers share the device arena with other compute data. A separate arena for the comm buffers would therefore alleviate the issue.
This PR also eliminates `the_fa_arena`; the communication buffers use `the_comms_arena` directly, which simplifies the code.
## Performance tests
The performance degradation described above is particularly pronounced in the `GPU/CNS/Exec/Sod` code under `Tests` and is alleviated by using a separate comms arena, as the performance data below show. `original` refers to the state before the change in #3362, which made `the_fa_arena` point to the device arena and allowed `amrex.the_arena_is_managed=1` with GPU-aware MPI without a significant performance hit. It is compared against the current development branch and the proposed comms arena implementation. The data showing the performance improvement from this PR are highlighted.

In other tests such as the `ABecLaplacian` linear solve or the ERF code,
using `amrex.the_arena_is_managed=0` did not show a significant
performance hit and using this comms arena implementation did not harm
the performance either. More comprehensive tests would be required to
determine the effect on other codes and platforms.
---------
Co-authored-by: Mukul Dave <[email protected]>
Co-authored-by: Weiqun Zhang <[email protected]>