Skip to content

Comments

Algorithm Harmonization #5.0 SYCL CKF, main branch (2026.01.23.)#1240

Merged
stephenswat merged 6 commits intoacts-project:mainfrom
krasznaa:SYCLCSKSync-main-20260123
Jan 29, 2026
Merged

Algorithm Harmonization #5.0 SYCL CKF, main branch (2026.01.23.)#1240
stephenswat merged 6 commits intoacts-project:mainfrom
krasznaa:SYCLCSKSync-main-20260123

Conversation

@krasznaa
Copy link
Member

This is a first step in synchronizing the CKF algorithms. First just synchronizing the SYCL algorithm with the CUDA one.

I had to turn 3 CUDA-only kernels into generic device code. Introducing the following new functions:

  • traccc::device::gather_best_tips_per_measurement: Since the kernel's payload technically depends on the used algebra type, I made it into a template function. The logic of the code didn't really change.
  • traccc::device::gather_measurement_votes: Here it's really a quite literal transcription of traccc::cuda::kernels::gather_measurement_votes .
  • traccc::device::update_tip_length_buffer: This last function however is a bit different from the current traccc::cuda::kernels::update_tip_length_buffer kernel. Instead of managing a "resizable vector buffer" by hand, I rather introduced a resizable vecmem::data::vector_buffer into the code, and modified the code to use vecmem::device_vector<T>::push_back(...) instead of managing a buffer and a size variable by hand.

This latter change also required a small change in vecmem::device::build_tracks. But that was really just a small change. (And since that kernel changed, I also had to tweak the Alpaka CKF in a single line to keep it compiling.)

My first local tests were successful. Though I'll do some further tests as well. But as long as the CI doesn't complain, I'd ask you @stephenswat to launch the usual CI performance tests as well.

Let me also tag @flg. I believe with this PR included we'll be able to run some reasonable full chain tests on our AMD devices. 🤞

@krasznaa krasznaa requested a review from stephenswat January 23, 2026 16:50
@krasznaa krasznaa added cleanup Makes the code all clean and tidy cuda Changes related to CUDA sycl Changes related to SYCL labels Jan 23, 2026
@krasznaa krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from f5506f0 to 0379cea Compare January 23, 2026 16:54
@stephenswat
Copy link
Member

My first local tests were successful. Though I'll do some further tests as well. But as long as the CI doesn't complain, I'd ask you @stephenswat to launch the usual CI performance tests as well.

These are running!

🤦 I knew that at one point I'd mistakenly add this god-forsaken file... 😭

FYI there is a local equivalent to .gitignore in every git repository in .git/info/exclude, so you could just put this file there.

@krasznaa krasznaa force-pushed the SYCLCSKSync-main-20260123 branch 2 times, most recently from 9f06793 to c322c66 Compare January 26, 2026 12:44
@stephenswat

This comment was marked as outdated.

@stephenswat

This comment was marked as outdated.

@stephenswat

This comment was marked as outdated.

@krasznaa krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from c322c66 to c42ce55 Compare January 28, 2026 19:21
@krasznaa
Copy link
Member Author

@flg, hopefully 3a7dc91 will make the CUDA and SYCL codes be more on par with each other. Though I didn't do any serious performance tests.

The c42ce55 update should help the CUDA and SYCL codes both by a tiny amount. Maybe it will be measurable, to be seen.

@stephenswat
Copy link
Member

@flg, hopefully 3a7dc91 will make the CUDA and SYCL codes be more on par with each other. Though I didn't do any serious performance tests.

The c42ce55 update should help the CUDA and SYCL codes both by a tiny amount. Maybe it will be measurable, to be seen.

Does this explain the suspicious number of tracks in the CI output?

Total number of found tracks went from 20044 to 50259 (+150.7%)

@krasznaa
Copy link
Member Author

I was not expecting to get more tracks, no. 😦 If anything, I was fearing that we would get too few. 😕

Something for tomorrow to debug then...

@stephenswat
Copy link
Member

I was not expecting to get more tracks, no. 😦 If anything, I was fearing that we would get too few. 😕

Something for tomorrow to debug then...

What I'm saying is that 3a7dc91 might already resolve that issue!

@krasznaa
Copy link
Member Author

Ahh... Could you re-run the tests? It should indeed fix that issue. 🤔

While modifying cuda::kernels::gather_best_tips_per_measurement
to make use of that common function.
While modifying cuda::kernels::gather_measurement_votes
to make use of that common function.
While modifying cuda::kernels::update_tip_length_buffer
to make use of that common function, and updating
device::build_tracks to be able to collaborate with the
slightly different data that update_tip_length_buffer is
now producing.
@krasznaa krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from c42ce55 to adcc228 Compare January 29, 2026 08:19
@stephenswat
Copy link
Member

Performance summary

Here is a summary of the performance effects of this PR:

Graphical

Tabular

KernelReciprocal ThroughputParallelism
4cec30badcc228Delta4cec30badcc228
propagate_to_next_surface7.80 ms7.80 ms-0.0%3.463.46
find_tracks1.72 ms1.72 ms-0.4%1.841.84
ccl_kernel827.56 μs826.03 μs-0.2%1.371.37
count_doublets819.31 μs822.42 μs0.4%1.611.61
count_triplets567.51 μs568.74 μs0.2%1.021.02
find_doublets535.50 μs544.70 μs1.7%3.083.08
Thrust::sort379.90 μs379.57 μs-0.1%7.327.32
find_triplets169.87 μs169.88 μs0.0%1.311.32
build_tracks125.11 μs124.60 μs-0.4%3.723.71
select_seeds53.81 μs53.66 μs-0.3%1.341.34
populate_grid24.00 μs23.97 μs-0.1%1.221.22
remove_duplicates23.23 μs23.24 μs0.1%26.3326.33
count_grid_capacities22.14 μs22.14 μs-0.0%1.221.22
fill_sorted_measurements19.69 μs19.76 μs0.3%1.131.13
update_triplet_weights14.77 μs14.78 μs0.1%1.271.27
apply_interaction13.88 μs13.87 μs-0.1%6.716.71
estimate_track_params11.76 μs11.80 μs0.4%2.692.69
fill_finding_propagation_sort_keys8.80 μs8.80 μs-0.0%7.677.67
form_spacepoints8.31 μs8.32 μs0.2%1.481.48
reduce_triplet_counts5.63 μs5.68 μs0.9%3.083.08
unknown5.08 μs5.08 μs0.0%4.264.26
fill_finding_duplicate_removal_sort_keys1.57 μs1.57 μs-0.2%38.0338.11
Total13.16 ms13.17 ms0.0%2.992.99

Important

All metrics in this report are given as reciprocal throughput, not as wallclock runtime.

Note

This is an automated message produced upon the explicit request of a human being.

@stephenswat
Copy link
Member

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=10 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --max-num-tracks-per-measurement=1

Seeding performance

Total number of seeds went from 298341 to 298344 (+0.0%)

Seeding plots



Track finding performance

Total number of found tracks went from 20042 to 20043 (+0.0%)

Finding plots









Track fitting performance

Fitting plots














Seeding to track finding relative performance

Seeding to track finding plots



Note

This is an automated message produced on the explicit request of a human being.

@krasznaa
Copy link
Member Author

Well, the number of tracks is okay at least... 😢

@krasznaa krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from adcc228 to 98016e6 Compare January 29, 2026 12:36
@sonarqubecloud
Copy link

@krasznaa
Copy link
Member Author

It turns out that I misunderstood the code yesterday. 😦 c42ce55 was actually introducing a bug, and not fixing a performance issue.

With that removed, at least locally, I get back the expected efficiency. This was all a good exercise for trying out how I would make physics efficiency plots during local code development. 🤔

@stephenswat
Copy link
Member

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=10 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --max-num-tracks-per-measurement=1

Seeding performance

Total number of seeds went from 298341 to 298341 (+0.0%)

Seeding plots



Track finding performance

Total number of found tracks went from 20042 to 20044 (+0.0%)

Finding plots









Track fitting performance

Fitting plots














Seeding to track finding relative performance

Seeding to track finding plots



Note

This is an automated message produced on the explicit request of a human being.

Copy link
Member

@stephenswat stephenswat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Plots look good!

@stephenswat stephenswat enabled auto-merge (squash) January 29, 2026 13:08
@stephenswat stephenswat merged commit b9e42f3 into acts-project:main Jan 29, 2026
27 of 43 checks passed
@krasznaa krasznaa deleted the SYCLCSKSync-main-20260123 branch January 30, 2026 08:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cleanup Makes the code all clean and tidy cuda Changes related to CUDA sycl Changes related to SYCL

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants