
Conversation

@AlexanderSinn (Member) commented Nov 19, 2025

Summary

This PR adds the function amrex::Gpu::streamFree (Arena* arena, void* mem), which can be used to free memory the next time the current GPU stream is synchronized.

This is based on #4432, but with the OMP-related complexity greatly reduced. The interface is now opt-in and always available, instead of needing to be enabled with runtime parameters.
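
As a minimal usage sketch (not code from the PR): assuming the renamed amrex::Gpu::freeAsync keeps the (Arena*, void*) signature of streamFree, and with a hypothetical caller function, the intended pattern would be roughly:

```cpp
#include <AMReX_Gpu.H>
#include <AMReX_Arena.H>

void fill_with_temporary (int n, amrex::Real factor)   // hypothetical example
{
    using namespace amrex;
    // Temporary scratch memory from The_Arena, used by a GPU kernel.
    Real* scratch = static_cast<Real*>(The_Arena()->alloc(n*sizeof(Real)));

    ParallelFor(n, [=] AMREX_GPU_DEVICE (int i) noexcept
    {
        scratch[i] = factor * static_cast<Real>(i);
    });

    // Previously this point would need Gpu::streamSynchronize() before
    // The_Arena()->free(scratch). With this PR the free is only enqueued
    // and performed the next time the current stream is synchronized.
    Gpu::freeAsync(The_Arena(), scratch);
}
```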

Additional background

Checklist

The proposed changes:

  • fix a bug or incorrect behavior in AMReX
  • add new capabilities to AMReX
  • change answers in the test suite to more than roundoff level
  • are likely to significantly affect the results of downstream AMReX users
  • include documentation in the code and/or rst files, if appropriate

@WeiqunZhang (Member)

/run-hpsf-gitlab-ci

@github-actions

GitLab CI has started at https://gitlab.spack.io/amrex/amrex/-/pipelines/1319922.

@amrex-gitlab-ci-reporter

GitLab CI 1319922 finished with status: success. See details at https://gitlab.spack.io/amrex/amrex/-/pipelines/1319922.

@AlexanderSinn (Member, Author)

Maybe this should be named asyncFree or freeAsync instead.

@WeiqunZhang (Member)

I like freeAsync, which is similar to copyAsync.

@AlexanderSinn (Member, Author)

For a case with very few particles, this significantly improves the performance of hipace shiftSlippedParticles (which calls amrex::partitionParticles) by removing two stream synchronizations.

AMReX dev, 79 µs on the GPU:

[timing screenshot]

Using this PR, 59 µs on the GPU:

[timing screenshot]
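
For illustration, the pattern that goes away looks roughly like this; the helper names and temp_buffer are hypothetical, not actual hipace code:

```cpp
#include <AMReX_Gpu.H>
#include <AMReX_Arena.H>

// Before: a blocking stream synchronization is needed only so the
// temporary buffer can be freed safely.
void free_temporary_blocking (void* temp_buffer)
{
    amrex::Gpu::streamSynchronize();
    amrex::The_Arena()->free(temp_buffer);
}

// After (this PR): the free is enqueued on the current stream and the
// host continues immediately, removing the synchronization point.
void free_temporary_async (void* temp_buffer)
{
    amrex::Gpu::freeAsync(amrex::The_Arena(), temp_buffer);
}
```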

@WeiqunZhang changed the title from "Add amrex::Gpu::streamFree" to "Add amrex::Gpu::freeAsync" on Dec 1, 2025
@WeiqunZhang (Member)

The PR LGTM. I will merge it after the 25.12 release.

@AlexanderSinn Something for the future: maybe we could use this to replace the CUDA stream-ordered memory allocator in the implementation of The_Async_Arena().

@AlexanderSinn (Member, Author)

Eventually, an allocAsync could be added that reuses the memory in m_free_wait_list. It would have stricter usage requirements, mainly that the memory can only be accessed in stream order and not by the host. Additionally, we would need to get the capacity of an allocation from the arena and store it in m_free_wait_list.

I think ultimately The_Async_Arena should use both freeAsync and allocAsync, so that in loops that allocate and free a lot without synchronizing it can reuse memory effectively. This might end up very similar to what cudaMallocAsync does internally, just with a cross-platform implementation, without the overhead of calling the CUDA API, and with The_Arena as the backing store.

Should I change The_Async_Arena to use freeAsync or add allocAsync first? Or do both at the same time?
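
To make the idea concrete, here is a rough sketch of the reuse path such an allocAsync could take; the entry layout, names, and locking are purely illustrative, not existing AMReX code:

```cpp
#include <cstddef>
#include <mutex>
#include <vector>

struct FreeWaitEntry {
    void*       mem;       // pointer enqueued by freeAsync
    std::size_t capacity;  // capacity reported by the owning arena
    int         stream;    // stream the free was enqueued on
};

class AsyncFreeList {
public:
    // Try to reuse a pending-free block of sufficient capacity on the same
    // stream; the caller may only touch the memory in stream order.
    void* allocAsync (std::size_t nbytes, int stream) {
        std::lock_guard<std::mutex> lock(m_mutex);
        for (auto it = m_free_wait_list.begin(); it != m_free_wait_list.end(); ++it) {
            if (it->stream == stream && it->capacity >= nbytes) {
                void* p = it->mem;
                m_free_wait_list.erase(it);
                return p;   // reused without waiting for a synchronization
            }
        }
        return nullptr;     // caller falls back to a regular arena allocation
    }

private:
    std::mutex m_mutex;
    std::vector<FreeWaitEntry> m_free_wait_list;
};
```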

@WeiqunZhang (Member)

Let's wait until this PR is merged.

I am actually thinking of something simpler. Create a new Arena-derived class for The_Async_Arena(). The object contains a map of <void*, int>, where int is the stream index. Its alloc will call The_Arena()->alloc and update the map. Its free will call freeAsync.
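
A minimal sketch of that design, assuming only the Arena alloc/free virtuals and the freeAsync signature from this PR; the class name and the stream-index query are placeholders:

```cpp
#include <AMReX_Arena.H>
#include <AMReX_Gpu.H>
#include <cstddef>
#include <map>

// Placeholder for querying the current stream index; a real implementation
// would use whatever AMReX exposes for this.
static int current_stream_index () { return 0; }

class AsyncArenaSketch : public amrex::Arena
{
public:
    void* alloc (std::size_t nbytes) override {
        void* p = amrex::The_Arena()->alloc(nbytes);
        m_mem.emplace(p, current_stream_index()); // remember the stream index
        return p;
    }

    void free (void* p) override {
        m_mem.erase(p);
        // Defer the release until the current stream is next synchronized.
        amrex::Gpu::freeAsync(amrex::The_Arena(), p);
    }

private:
    std::map<void*, int> m_mem; // pointer -> stream index at allocation time
};
```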

@AlexanderSinn (Member, Author) commented Dec 3, 2025

Yes, this will be in a new PR. The_Async_Arena uses PArena, which is already separate. Should PArena be changed to use freeAsync, or kept as-is but without an interface? Do we need to keep track of the stream index at allocation? I would have just used whatever stream is active when the memory is freed. I can't think of a use case that would have different streams active between alloc and free, so I don't know which version is preferable.

@WeiqunZhang (Member)

I think we should leave PArena separate and create a new class. Yes, we probably don't even need a map storing the stream index during allocation. That would be even simpler. We could also consider, when CArena is about to run out of memory, trying to immediately release the memory already queued by freeAsync.

@WeiqunZhang merged commit 477c09f into AMReX-Codes:development on Dec 3, 2025 (73 checks passed).