Algorithm Harmonization #5.0 SYCL CKF, main branch (2026.01.23.) #1240
stephenswat merged 6 commits into acts-project:main
Force-pushed from f5506f0 to 0379cea.
These are running!
FYI there is a local equivalent to
Force-pushed from 9f06793 to c322c66.
Force-pushed from c322c66 to c42ce55.
Does this explain the suspicious number of tracks in the CI output?
I was not expecting to get more tracks, no. 😦 If anything, I was afraid that we would get too few. 😕 Something to debug tomorrow, then...
What I'm saying is that 3a7dc91 might already resolve that issue!
Ahh... Could you re-run the tests? It should indeed fix that issue. 🤔
While modifying cuda::kernels::gather_best_tips_per_measurement to make use of that common function.
While modifying cuda::kernels::gather_measurement_votes to make use of that common function.
While modifying cuda::kernels::update_tip_length_buffer to make use of that common function, and updating device::build_tracks to be able to collaborate with the slightly different data that update_tip_length_buffer is now producing.
Force-pushed from c42ce55 to adcc228.
Performance summary
Here is a summary of the performance effects of this PR:
Important: All metrics in this report are given as reciprocal throughput, not as wallclock runtime.
Note: This is an automated message produced upon the explicit request of a human being.
Physics performance summary
Here is a summary of the physics performance effects of this PR. Command used:
Seeding performance: Total number of seeds went from 298341 to 298344 (+0.0%).
Track finding performance: Total number of found tracks went from 20042 to 20043 (+0.0%).
Track fitting performance
Seeding to track finding relative performance
Note: This is an automated message produced on the explicit request of a human being.
Well, the number of tracks is okay at least... 😢
Force-pushed from adcc228 to 98016e6.
It turns out that I misunderstood the code yesterday. 😦 c42ce55 actually introduced a bug rather than fixing a performance issue. With that commit removed, I get back the expected efficiency, at least locally. This was all a good exercise in how I would make physics efficiency plots during local code development. 🤔
Physics performance summary
Here is a summary of the physics performance effects of this PR. Command used:
Seeding performance: Total number of seeds went from 298341 to 298341 (+0.0%).
Track finding performance: Total number of found tracks went from 20042 to 20044 (+0.0%).
Track fitting performance
Seeding to track finding relative performance
Note: This is an automated message produced on the explicit request of a human being.

This is a first step in synchronizing the CKF algorithms, starting with just synchronizing the SYCL algorithm with the CUDA one.
I had to turn 3 CUDA-only kernels into generic device code, introducing the following new functions:
- traccc::device::gather_best_tips_per_measurement: Since the kernel's payload technically depends on the algebra type used, I made it into a template function. The logic of the code didn't really change.
- traccc::device::gather_measurement_votes: This is quite a literal transcription of traccc::cuda::kernels::gather_measurement_votes.
- traccc::device::update_tip_length_buffer: This last function, however, is a bit different from the current traccc::cuda::kernels::update_tip_length_buffer kernel. Instead of managing a "resizable vector buffer" by hand, I introduced a resizable vecmem::data::vector_buffer into the code, and modified the code to use vecmem::device_vector<T>::push_back(...) instead of managing a buffer and a size variable by hand.

This latter change also required a minor change in traccc::device::build_tracks. (And since that kernel changed, I also had to tweak the Alpaka CKF in a single line to keep it compiling.)

My first local tests were successful, though I'll do some further tests as well. As long as the CI doesn't complain, I'd ask you, @stephenswat, to launch the usual CI performance tests too.
Let me also tag @flg. I believe that with this PR included, we'll be able to run some reasonable full-chain tests on our AMD devices. 🤞
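The update_tip_length_buffer change described above replaces a hand-managed buffer plus size variable with a resizable buffer that threads append to through push_back(...). Below is a minimal, self-contained C++ sketch of that general technique (atomic append, a form of stream compaction), with std::thread standing in for GPU threads. The names resizable_view and compact_in_parallel are hypothetical, and this is not how vecmem::device_vector is implemented; it only illustrates reserving an output slot through a shared atomic size counter.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration type, NOT vecmem's API: a view over a buffer
// with a fixed capacity whose current size lives in a shared atomic counter.
template <typename T>
class resizable_view {
public:
    resizable_view(T* data, std::atomic<std::size_t>* size, std::size_t capacity)
        : m_data(data), m_size(size), m_capacity(capacity) {}

    // Claim the next free slot atomically, then write the value into it.
    // Returns false (leaving the size unchanged) if the buffer is full.
    bool push_back(const T& value) {
        std::size_t idx = m_size->load();
        do {
            if (idx >= m_capacity) {
                return false;
            }
            // On failure, compare_exchange_weak reloads idx with the current size.
        } while (!m_size->compare_exchange_weak(idx, idx + 1));
        m_data[idx] = value;
        return true;
    }

private:
    T* m_data;
    std::atomic<std::size_t>* m_size;
    std::size_t m_capacity;
};

// Hypothetical helper: each std::thread handles a strided slice of the input
// and appends the elements that pass `keep`. Output order depends on scheduling.
template <typename T, typename Pred>
std::vector<T> compact_in_parallel(const std::vector<T>& input, Pred keep,
                                   unsigned n_threads) {
    assert(n_threads > 0);
    std::vector<T> storage(input.size());  // capacity = worst case (keep all)
    std::atomic<std::size_t> size{0};
    resizable_view<T> out(storage.data(), &size, storage.size());

    std::vector<std::thread> workers;
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&input, &keep, &out, t, n_threads] {
            for (std::size_t i = t; i < input.size(); i += n_threads) {
                if (keep(input[i])) {
                    const bool ok = out.push_back(input[i]);
                    assert(ok);  // cannot fail: capacity equals input.size()
                    (void)ok;
                }
            }
        });
    }
    for (auto& w : workers) {
        w.join();
    }
    storage.resize(size.load());  // keep only the appended elements
    return storage;
}
```

For example, compacting {3, 7, 2, 9, 5, 8, 1} with the predicate "value >= 5" yields the four values 5, 7, 8 and 9 in an unspecified order. That non-deterministic ordering is the usual trade-off of atomic appends compared with a prefix-sum based compaction, which preserves input order at the cost of an extra pass.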