Algorithm Harmonization #5.0 SYCL CKF, main branch (2026.01.23.) by krasznaa · Pull Request #1240 · acts-project/traccc

krasznaa · 2026-01-23T16:50:38Z

This is a first step in synchronizing the CKF algorithms. For now it only synchronizes the SYCL algorithm with the CUDA one.

To do that, I had to turn 3 CUDA-only kernels into generic device code, introducing the following new functions:

traccc::device::gather_best_tips_per_measurement: Since the kernel's payload technically depends on the algebra type in use, I made it a template function. The logic of the code didn't really change.
traccc::device::gather_measurement_votes: This is a fairly literal transcription of traccc::cuda::kernels::gather_measurement_votes.
traccc::device::update_tip_length_buffer: This last function, however, is a bit different from the current traccc::cuda::kernels::update_tip_length_buffer kernel. Instead of managing a "resizable vector buffer" (a buffer plus a size variable) by hand, I introduced a resizable vecmem::data::vector_buffer into the code, and modified the code to use vecmem::device_vector<T>::push_back(...).
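
To illustrate the difference described for update_tip_length_buffer, here is a minimal host-only sketch in plain C++. It is not the code from this PR and it does not use vecmem: resizable_view, collect_by_hand, collect_with_push_back and the min_length filter are all hypothetical names made up for the illustration. It only assumes that appending to a resizable vector boils down to atomically reserving a slot and writing into it.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a resizable device vector. This is NOT vecmem's
// API; it only mimics the idea of push_back on pre-allocated storage.
template <typename T>
class resizable_view {
public:
    resizable_view(T* data, std::size_t capacity,
                   std::atomic<std::size_t>& size)
        : m_data(data), m_capacity(capacity), m_size(size) {}

    // Atomically reserve the next free slot, then write the value into it.
    void push_back(const T& value) {
        const std::size_t idx = m_size.fetch_add(1);
        assert(idx < m_capacity);  // the storage must be large enough
        m_data[idx] = value;
    }

    std::size_t size() const { return m_size.load(); }
    std::size_t capacity() const { return m_capacity; }

private:
    T* m_data;
    std::size_t m_capacity;
    std::atomic<std::size_t>& m_size;
};

// "By hand": the caller juggles the raw buffer and its size counter.
inline std::vector<unsigned int> collect_by_hand(
    const std::vector<unsigned int>& lengths, unsigned int min_length) {
    std::vector<unsigned int> buffer(lengths.size());
    std::atomic<std::size_t> size{0};
    for (unsigned int len : lengths) {
        if (len >= min_length) {
            const std::size_t idx = size.fetch_add(1);
            buffer[idx] = len;
        }
    }
    buffer.resize(size.load());
    return buffer;
}

// With push_back: the bookkeeping lives inside the vector-like view.
inline std::vector<unsigned int> collect_with_push_back(
    const std::vector<unsigned int>& lengths, unsigned int min_length) {
    std::vector<unsigned int> buffer(lengths.size());
    std::atomic<std::size_t> size{0};
    resizable_view<unsigned int> out(buffer.data(), buffer.size(), size);
    for (unsigned int len : lengths) {
        if (len >= min_length) {
            out.push_back(len);
        }
    }
    buffer.resize(out.size());
    return buffer;
}
```

Both helpers produce the same output for the same input. The second one just keeps the index arithmetic in one place, which is the kind of simplification the push_back-based version is after.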

This latter change also required a small change in traccc::device::build_tracks, but that was really just a minor one. (And since that kernel changed, I also had to tweak the Alpaka CKF in a single line to keep it compiling.)

My first local tests were successful. Though I'll do some further tests as well. But as long as the CI doesn't complain, I'd ask you @stephenswat to launch the usual CI performance tests as well.

Let me also tag @flg. I believe with this PR included we'll be able to run some reasonable full chain tests on our AMD devices. 🤞

.github/copilot-instructions.md

stephenswat · 2026-01-23T17:13:34Z

> My first local tests were successful. Though I'll do some further tests as well. But as long as the CI doesn't complain, I'd ask you @stephenswat to launch the usual CI performance tests as well.

These are running!

🤦 I knew that at one point I'd mistakenly add this god-forsaken file... 😭

FYI every git repository has a local equivalent of .gitignore at .git/info/exclude, so you could just list this file there.

krasznaa · 2026-01-28T19:24:12Z

@flg, hopefully 3a7dc91 will make the CUDA and SYCL codes be more on par with each other. Though I didn't do any serious performance tests.

The c42ce55 update should help the CUDA and SYCL codes both by a tiny amount. Maybe it will be measurable, to be seen.

stephenswat · 2026-01-28T20:00:01Z

> @flg, hopefully 3a7dc91 will make the CUDA and SYCL codes be more on par with each other. Though I didn't do any serious performance tests.
>
> The c42ce55 update should help the CUDA and SYCL codes both by a tiny amount. Maybe it will be measurable, to be seen.

Does this explain the suspicious number of tracks in the CI output?

Total number of found tracks went from 20044 to 50259 (+150.7%)

krasznaa · 2026-01-28T20:16:48Z

I was not expecting to get more tracks, no. 😦 If anything, I was fearing that we would get too few. 😕

Something for tomorrow to debug then...

stephenswat · 2026-01-28T20:18:39Z

> I was not expecting to get more tracks, no. 😦 If anything, I was fearing that we would get too few. 😕
>
> Something for tomorrow to debug then...

What I'm saying is that 3a7dc91 might already resolve that issue!

krasznaa · 2026-01-28T20:36:22Z

Ahh... Could you re-run the tests? It should indeed fix that issue. 🤔


stephenswat · 2026-01-29T09:34:55Z

Performance summary

Here is a summary of the performance effects of this PR:

Tabular

Kernel	Reciprocal throughput (`4cec30b`)	Reciprocal throughput (`adcc228`)	Delta	Parallelism (`4cec30b`)	Parallelism (`adcc228`)
`propagate_to_next_surface`	7.80 ms	7.80 ms	-0.0%	3.46	3.46
`find_tracks`	1.72 ms	1.72 ms	-0.4%	1.84	1.84
`ccl_kernel`	827.56 μs	826.03 μs	-0.2%	1.37	1.37
`count_doublets`	819.31 μs	822.42 μs	0.4%	1.61	1.61
`count_triplets`	567.51 μs	568.74 μs	0.2%	1.02	1.02
`find_doublets`	535.50 μs	544.70 μs	1.7%	3.08	3.08
`Thrust::sort`	379.90 μs	379.57 μs	-0.1%	7.32	7.32
`find_triplets`	169.87 μs	169.88 μs	0.0%	1.31	1.32
`build_tracks`	125.11 μs	124.60 μs	-0.4%	3.72	3.71
`select_seeds`	53.81 μs	53.66 μs	-0.3%	1.34	1.34
`populate_grid`	24.00 μs	23.97 μs	-0.1%	1.22	1.22
`remove_duplicates`	23.23 μs	23.24 μs	0.1%	26.33	26.33
`count_grid_capacities`	22.14 μs	22.14 μs	-0.0%	1.22	1.22
`fill_sorted_measurements`	19.69 μs	19.76 μs	0.3%	1.13	1.13
`update_triplet_weights`	14.77 μs	14.78 μs	0.1%	1.27	1.27
`apply_interaction`	13.88 μs	13.87 μs	-0.1%	6.71	6.71
`estimate_track_params`	11.76 μs	11.80 μs	0.4%	2.69	2.69
`fill_finding_propagation_sort_keys`	8.80 μs	8.80 μs	-0.0%	7.67	7.67
`form_spacepoints`	8.31 μs	8.32 μs	0.2%	1.48	1.48
`reduce_triplet_counts`	5.63 μs	5.68 μs	0.9%	3.08	3.08
`unknown`	5.08 μs	5.08 μs	0.0%	4.26	4.26
`fill_finding_duplicate_removal_sort_keys`	1.57 μs	1.57 μs	-0.2%	38.03	38.11
Total	13.16 ms	13.17 ms	0.0%	2.99	2.99

Important: All metrics in this report are given as reciprocal throughput, not as wallclock runtime.

Note: This is an automated message produced upon the explicit request of a human being.

stephenswat · 2026-01-29T10:10:48Z

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=10 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --max-num-tracks-per-measurement=1

Seeding performance

Total number of seeds went from 298341 to 298344 (+0.0%)

Seeding plots

Track finding performance

Total number of found tracks went from 20042 to 20043 (+0.0%)

Finding plots

Track fitting performance

Fitting plots

Seeding to track finding relative performance

Seeding to track finding plots

Note: This is an automated message produced on the explicit request of a human being.

krasznaa · 2026-01-29T10:16:26Z

Well, the number of tracks is okay at least... 😢

sonarqubecloud · 2026-01-29T12:37:22Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code


krasznaa · 2026-01-29T12:39:05Z

It turns out that I misunderstood the code yesterday. 😦 c42ce55 was actually introducing a bug, and not fixing a performance issue.

With that removed, at least locally, I get back the expected efficiency. This was all a good exercise for trying out how I would make physics efficiency plots during local code development. 🤔

stephenswat · 2026-01-29T13:07:53Z

Physics performance summary

Here is a summary of the physics performance effects of this PR. Command used:

traccc_seeding_example_cuda --input-directory=/data/Acts/odd-simulations-20240506/geant4_ttbar_mu200 --digitization-file=geometries/odd/odd-digi-geometric-config.json --detector-file=geometries/odd/odd-detray_geometry_detray.json --grid-file=geometries/odd/odd-detray_surface_grids_detray.json --material-file=geometries/odd/odd-detray_material_detray.json --input-events=10 --use-acts-geom-source=on --check-performance --truth-finding-min-track-candidates=5 --truth-finding-min-pt=1.0 --truth-finding-min-z=-150 --truth-finding-max-z=150 --truth-finding-max-r=10 --seed-matching-ratio=0.99 --track-matching-ratio=0.5 --track-candidates-range=5:100 --seedfinder-vertex-range=-150:150 --max-num-tracks-per-measurement=1

Seeding performance

Total number of seeds went from 298341 to 298341 (+0.0%)

Seeding plots

Track finding performance

Total number of found tracks went from 20042 to 20044 (+0.0%)

Finding plots

Track fitting performance

Fitting plots

Seeding to track finding relative performance

Seeding to track finding plots

Note: This is an automated message produced on the explicit request of a human being.

stephenswat

Plots look good!

krasznaa requested a review from stephenswat January 23, 2026 16:50

krasznaa added the cleanup, cuda, and sycl labels Jan 23, 2026


krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from f5506f0 to 0379cea on January 23, 2026 16:54

krasznaa force-pushed the SYCLCSKSync-main-20260123 branch 2 times, most recently from 9f06793 to c322c66, on January 26, 2026 12:44


krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from c322c66 to c42ce55 on January 28, 2026 19:21

krasznaa added 6 commits January 29, 2026 09:19

2d1170d: Introduced device::gather_best_tips_per_measurement. While modifying cuda::kernels::gather_best_tips_per_measurement to make use of that common function.

00f178f: Introduced device::gather_measurement_votes. While modifying cuda::kernels::gather_measurement_votes to make use of that common function.

89d795a: Introduced device::update_tip_length_buffer. While modifying cuda::kernels::update_tip_length_buffer to make use of that common function, and updating device::build_tracks to be able to collaborate with the slightly different data that update_tip_length_buffer is now producing.

c37b89d: Synchronized the SYCL CKF algorithm with the CUDA one.

ec0c653: Fix up the CUDA kernel signatures.

98016e6: Fixed a bug in how the updated update_tip_length_buffer output would be used.

krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from c42ce55 to adcc228 on January 29, 2026 08:19

krasznaa force-pushed the SYCLCSKSync-main-20260123 branch from adcc228 to 98016e6 on January 29, 2026 12:36

stephenswat approved these changes Jan 29, 2026

stephenswat enabled auto-merge (squash) January 29, 2026 13:08

stephenswat merged commit b9e42f3 into acts-project:main Jan 29, 2026
27 of 43 checks passed

krasznaa deleted the SYCLCSKSync-main-20260123 branch January 30, 2026 08:34

krasznaa mentioned this pull request Feb 12, 2026

Algorithm Harmonization #5.1 CKF, main branch (2026.02.12.) #1259

Open


2 participants