Improve memory usage in track finding postamble #908
So, count+fill it ended up being after all. 🤔 Good! I'm very on board with doing this consistently with what we found to be the most efficient during seed finding.
But I'd really like to reduce the number of individual memory allocations that we need for this. 🤔 With a single (SoA?) container for this new payload we could get away with allocating all needed GPU memory for the counting in one go.
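A minimal host-side sketch of the single-allocation idea, assuming a payload of two per-tip columns. The names `tip_payload_view`, `tip_payload_buffer`, and `fill_offsets` are hypothetical and do not come from traccc; a `std::vector` stands in for what would be one device allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical SoA view: two per-tip columns living in one allocation.
struct tip_payload_view {
    unsigned int* n_states;  // number of valid track states per tip
    unsigned int* offsets;   // start of each tip's states in the output
    std::size_t n_tips;
};

// Owns a single contiguous block and carves it into both columns,
// so the counting step needs one allocation instead of one per column.
class tip_payload_buffer {
public:
    explicit tip_payload_buffer(std::size_t n_tips)
        : m_storage(2 * n_tips, 0u), m_n_tips(n_tips) {}

    tip_payload_view view() {
        unsigned int* base = m_storage.data();
        return {base, base + m_n_tips, m_n_tips};
    }

    std::size_t bytes() const {
        return m_storage.size() * sizeof(unsigned int);
    }

private:
    std::vector<unsigned int> m_storage;  // stand-in for device memory
    std::size_t m_n_tips;
};

// Exclusive prefix sum over the counts column; returns the total number
// of track states, which is the exact size needed for the fill step.
unsigned int fill_offsets(tip_payload_view v) {
    assert(v.n_tips == 0 ||
           (v.n_states != nullptr && v.offsets != nullptr));
    unsigned int running = 0;
    for (std::size_t i = 0; i < v.n_tips; ++i) {
        v.offsets[i] = running;
        running += v.n_states[i];
    }
    return running;
}
```

Both columns here share an element type, so no alignment padding is needed between them; columns of mixed types would need aligned sub-offsets within the block.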
device/common/include/traccc/finding/device/impl/count_tracks.ipp
I am happy with the changes as long as the CI passes
device/common/include/traccc/finding/device/impl/build_tracks.ipp
I was able to one-up myself, removing the counting kernel entirely, reducing the memory allocations, keeping the memory usage low, and improving performance:
3aa466b to ac3cc15
I'm still very much on board with the direction.
Could you post a bit of a description of what you did though? I had to realize that it's a bit too much code for me to interpret from scratch. 😦 Just some main bullet points about how the algorithm finds the correct sizes now.
device/common/include/traccc/finding/device/impl/build_tracks.ipp
6b22d43 to 629470a
So the responsibility of the kernels in the old version of the algorithm was as follows:
Now, the core insight is that the checks that
Much simpler, and it has far fewer kernels. Note that in this new model
629470a to 6148d6f
Yeah, I think this is the right way to improve this 🤔
Thanks for the description!
Let's just make the internal vector use the correct memory resource, and this one can go ahead as far as I'm concerned.
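As a rough illustration of the request, here is a sketch in which an internal scratch vector draws from a caller-supplied memory resource instead of the default one. It uses the standard `std::pmr` facilities as a stand-in; `counting_resource` and `sum_with_scratch` are hypothetical names, and the actual resource type used in the PR is not shown in this thread.

```cpp
#include <cassert>
#include <cstddef>
#include <memory_resource>
#include <vector>

// Hypothetical resource that forwards to an upstream resource and
// counts allocations, so we can observe which resource a vector uses.
class counting_resource : public std::pmr::memory_resource {
public:
    explicit counting_resource(std::pmr::memory_resource* upstream =
                                   std::pmr::new_delete_resource())
        : m_upstream(upstream) {
        assert(upstream != nullptr);
    }

    std::size_t allocations() const { return m_allocations; }

private:
    void* do_allocate(std::size_t bytes, std::size_t align) override {
        ++m_allocations;
        return m_upstream->allocate(bytes, align);
    }
    void do_deallocate(void* p, std::size_t bytes,
                       std::size_t align) override {
        m_upstream->deallocate(p, bytes, align);
    }
    bool do_is_equal(
        const std::pmr::memory_resource& other) const noexcept override {
        return this == &other;
    }

    std::pmr::memory_resource* m_upstream;
    std::size_t m_allocations = 0;
};

// Hypothetical algorithm step: the internal scratch vector is built on
// the resource passed in by the caller, not on the global default.
std::size_t sum_with_scratch(const std::vector<unsigned int>& counts,
                             std::pmr::memory_resource& mr) {
    std::pmr::vector<unsigned int> scratch(counts.begin(), counts.end(),
                                           &mr);
    std::size_t total = 0;
    for (unsigned int c : scratch) {
        total += c;
    }
    return total;
}
```

Passing the resource explicitly keeps temporary storage in the same memory space as the rest of the algorithm's data, which is presumably the point of the review comment.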
6148d6f to 29b4083
The code which turns the tips of our track finding into actual tracks uses an excessive amount of memory, as it massively overallocates. Indeed, it allocates memory as though all tips have the maximum number of track states, which is unrealistic. This commit makes it so that the number of valid track states is counted instead, making the allocation more precise. In my measurements, this more than halves the memory usage of traccc on $\langle\mu\rangle = 200$ ttbar events in the ODD.
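The difference between the two allocation strategies can be sketched on the host as follows. This is an illustration under stated assumptions, not the PR's implementation: the `link` record, the `no_link` sentinel, and the rule that only links carrying a measurement count as valid track states are all hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical link record: each candidate points to its predecessor.
struct link {
    int previous;          // index of the previous link, or no_link
    bool has_measurement;  // assumed: holes do not yield a track state
};

constexpr int no_link = -1;

// Count the valid track states of one track by walking back from its tip.
std::size_t count_states(const std::vector<link>& links, int tip) {
    std::size_t n = 0;
    for (int i = tip; i != no_link; i = links[i].previous) {
        assert(i >= 0 && static_cast<std::size_t>(i) < links.size());
        if (links[i].has_measurement) {
            ++n;
        }
    }
    return n;
}

// Exact layout: offsets[t] is where tip t's states start, and the last
// entry is the total number of states to allocate.
std::vector<std::size_t> exact_offsets(const std::vector<link>& links,
                                       const std::vector<int>& tips) {
    std::vector<std::size_t> offsets(tips.size() + 1, 0);
    for (std::size_t t = 0; t < tips.size(); ++t) {
        offsets[t + 1] = offsets[t] + count_states(links, tips[t]);
    }
    return offsets;
}

// Old-style sizing: every tip is assumed to have the maximum length.
std::size_t worst_case_size(std::size_t n_tips, std::size_t max_states) {
    return n_tips * max_states;
}
```

With two tips of three and two valid states and a maximum of ten states per track, the exact layout needs five slots where the worst-case sizing reserves twenty.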
29b4083 to 921c876
This PR also fixes a bug in the efficiency measurements by the way! Wink wink nudge nudge 😉
Performance summary
Here is a summary of the performance effects of this PR:
Note: This is an automated message produced on the explicit request of a human being.
Reminder that this important PR has been blocked for almost a month now. 😉
@krasznaa There are indeed many PRs pending 🤔
After quite some delay (sorry about that 😦), let's indeed finally get this in.
After resolving the (hopefully trivial... 🤞) merge conflicts.
The code which turns the tips of our track finding into actual tracks uses an excessive amount of memory, as it massively overallocates. Indeed, it allocates memory as though all tips have the maximum number of track states, which is unrealistic. This commit makes it so that the number of valid track states is counted instead, making the allocation more precise. In my measurements, this more than halves the memory usage of traccc on $\langle\mu\rangle = 200$ ttbar events in the ODD.
Note: The SYCL changes are best-effort but I haven't tested them.