Memory Usage: Add memory transferred between Kokkos Mem Spaces #272
vlkale wants to merge 23 commits into kokkos:develop
…nce they are not used in the function.
Here is the output from stream.cuda in the Kokkos-core benchmarks directory, run on Perlmutter on 1 GPU. The HighWater byte count at the last point in time shown in the first table equals the amount of data transferred between host and device.
crtrott
left a comment
Just put it into a separate file; otherwise this is good.
Co-authored-by: Damien L-G <dalg24+github@gmail.com>
I have updated the PR with the requested changes. I did a quick check on my Mac laptop, and it works fine with the change to use a separate file. For reference, here is the output: (As can be seen, there are now two files rather than one.) Right now, there seems to be a blocker on the CI: I don't know why the CI checks for OpenMP, CUDA and HIP (non-simple builds) are now failing. According to the logs, the problem arises in Kokkos_Profiling.hpp and comes from int_for_synchronization_reason(). I don't think I changed anything in this PR that would affect that. This is a CI check from when it was working previously:
This is fixed.
Have you rebased on top of |
Thanks for checking this. As mentioned in the last sentence of my previous comment (#272 (comment)), I did rebase on top of Actually, I think this has to do with Kokkos Tools Issue #275, where there is ultimately an incompatibility between Kokkos Tools and the current Kokkos version (#275 (comment)). This issue came up in early October, which was after the previous successful run I linked.
This PR adds reporting of the memory transferred between Kokkos Mem Spaces to the Kokkos Tools memory-usage tool library. It is based on a request in Kokkos Tools GitHub Issue #50.
This PR adds functionality to the tool so that the size of the data transferred by each deep_copy is accumulated during the execution of a Kokkos application program. The deep_copy accumulation is done per Kokkos Memory Space dst->src pair (e.g., OpenMP on host and CUDA on device). Note that when "dst" and "src" are the same Kokkos memory space, the deep_copy took place within that memory space.
When Kokkos is finalized in a Kokkos application program, the finalize callback writes the accumulated deep_copy sizes for each memory space pair to the output file.
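For illustration, here is a minimal sketch of the accumulation described above. It is not the code in this PR. It assumes the tool hooks the Kokkos Tools `kokkosp_begin_deep_copy` and `kokkosp_finalize_library` callbacks. The `SpaceHandle` struct is a stand-in for the one declared in the Kokkos Tools headers, the names `g_bytes_copied` and `bytes_copied` are hypothetical, and the sketch prints to stdout rather than to a file to stay self-contained.

```cpp
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <utility>

// Stand-in for the handle type from the Kokkos Tools headers
// (assumed layout: a fixed-size, null-terminated space name).
struct SpaceHandle {
  char name[64];
};

// Hypothetical accumulator: (src space, dst space) -> total bytes copied.
using SpacePair = std::pair<std::string, std::string>;
static std::map<SpacePair, std::uint64_t> g_bytes_copied;

// Called by Kokkos at the start of every deep_copy when the tool is loaded.
extern "C" void kokkosp_begin_deep_copy(SpaceHandle dst_handle,
                                        const char* /*dst_name*/,
                                        const void* /*dst_ptr*/,
                                        SpaceHandle src_handle,
                                        const char* /*src_name*/,
                                        const void* /*src_ptr*/,
                                        std::uint64_t size) {
  SpacePair key(src_handle.name, dst_handle.name);
  g_bytes_copied[key] += size;  // a missing entry starts at zero
}

// Hypothetical lookup helper: bytes accumulated for one src/dst pair.
std::uint64_t bytes_copied(const std::string& src, const std::string& dst) {
  auto it = g_bytes_copied.find(SpacePair(src, dst));
  return it == g_bytes_copied.end() ? 0 : it->second;
}

// Called when Kokkos is finalized; reports one line per space pair.
extern "C" void kokkosp_finalize_library() {
  for (const auto& entry : g_bytes_copied) {
    std::printf("%s -> %s : %llu bytes\n", entry.first.first.c_str(),
                entry.first.second.c_str(),
                static_cast<unsigned long long>(entry.second));
  }
}
```

In this sketch, a pair whose source and destination names match records copies made within a single memory space, as noted in the description.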