Skip to content

Keep states ordered by quality in CKF #929

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
May 23, 2025

Conversation

stephenswat
Copy link
Member

This commit allows the find_tracks kernel in the CKF to keep the best measurements instead of keeping random ones, which will hopefully allow us to significantly reduce the branching factor of the CKF. The logic here works by employing a series of mutexes, allowing threads to lock a critical section and insert their own measurement, overwriting the previous ones.

@stephenswat stephenswat added improvement Improve an existing feature shared Changes related to shared code labels Mar 28, 2025
@stephenswat
Copy link
Member Author

stephenswat commented Mar 28, 2025

To-do list:

  • Update the SYCL CKF.
  • Update the Alpaka CKF.
  • Update to use vecmem CAS.
  • Performance measurements.
  • Physics performance measurements.

Tagging @paradajzblond, @beomki-yeo, @niermann999, @asalzburger as interested parties.

@stephenswat stephenswat force-pushed the feat/keep_best_meas branch from 3dd7fe8 to f5c1115 Compare April 16, 2025 12:13
@stephenswat
Copy link
Member Author

This is now ready for review.

@stephenswat stephenswat marked this pull request as ready for review April 16, 2025 12:13
@stephenswat stephenswat requested a review from beomki-yeo April 16, 2025 12:13
@stephenswat stephenswat force-pushed the feat/keep_best_meas branch from f5c1115 to c83e7ab Compare April 16, 2025 12:20
@stephenswat
Copy link
Member Author

Performance summary

Here is a summary of the performance effects of this PR:

Graphical

Tabular

Kernel 1edca0f c83e7ab Delta
fit 280.68 ms 281.27 ms 0.2%
propagate_to_next_surface 118.21 ms 117.46 ms -0.6%
find_tracks 26.36 ms 27.44 ms 4.1%
count_triplets 14.16 ms 14.16 ms 0.0%
find_triplets 5.98 ms 5.99 ms 0.1%
build_tracks 1.07 ms 1.07 ms 0.5%
find_doublets 901.37 μs 890.88 μs -1.2%
Thrust::sort 777.50 μs 776.94 μs -0.1%
prune_tracks 672.26 μs 685.92 μs 2.0%
ccl_kernel 680.91 μs 681.78 μs 0.1%
count_doublets 628.16 μs 626.30 μs -0.3%
select_seeds 354.61 μs 360.48 μs 1.7%
apply_interaction 132.61 μs 127.37 μs -3.9%
update_triplet_weights 96.29 μs 96.25 μs -0.0%
fill_sort_keys 68.20 μs 67.73 μs -0.7%
estimate_track_params 34.15 μs 34.11 μs -0.1%
populate_grid 30.41 μs 30.42 μs 0.0%
count_grid_capacities 29.24 μs 29.21 μs -0.1%
form_spacepoints 13.49 μs 13.48 μs -0.1%
reduce_triplet_counts 6.65 μs 6.66 μs 0.1%
static_kernel 1.76 μs 1.75 μs -0.5%
make_barcode_sequence 988.79 ns 991.71 ns 0.3%
fill_prefix_sum 165.49 ns 165.50 ns 0.0%
Total 450.89 ms 451.82 ms 0.2%

Important

All metrics in this report are given as reciprocal throughput, not as wallclock runtime.

Note

This is an automated message produced on the explicit request of a human being.

@stephenswat
Copy link
Member Author

Tagging @CrossR and @StewMH as this affects the Alpaka code.

@stephenswat
Copy link
Member Author

Nice, we're back to SYCL crashing on our code:

lld: error: <unknown>:0:0: in function _ZTSN6traccc4sycl7details7kernels11find_tracksINS0_7kernels37ckf_constant_field_telescope_detectorEEE : invalid addrspacecast
lld: error: <unknown>:0:0: in function _ZTSN6traccc4sycl7details7kernels11find_tracksINS0_7kernels37ckf_constant_field_telescope_detectorEEE_with_offset : invalid addrspacecast
llvm-foreach: 

@stephenswat stephenswat force-pushed the feat/keep_best_meas branch from c83e7ab to 7c2d503 Compare April 29, 2025 10:08
@stephenswat
Copy link
Member Author

Okay, the problem has been resolved. 👍

Copy link
Contributor

@beomki-yeo beomki-yeo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me

@stephenswat
Copy link
Member Author

@fiedlerp What's your timeline for #933? I'd like to get this in quickly but if you can get #933 done we can merge that first to save you the headache of rebasing it.

@stephenswat stephenswat force-pushed the feat/keep_best_meas branch 2 times, most recently from ae6c786 to 15e5731 Compare May 23, 2025 08:56
@stephenswat
Copy link
Member Author

Before

Truth finding before this PR at $\mu = 80$:

image

Truth finding before this PR at $\mu = 200$:

image

After

Truth finding after this PR at $\mu = 80$:

image

Truth finding after this PR at $\mu = 200$:

image

@stephenswat
Copy link
Member Author

Performance summary

Here is a summary of the performance effects of this PR:

Graphical

Tabular

Kernel 9a16463 15e5731 Delta
fit 594.97 ms 740.80 ms 24.5%
propagate_to_next_surface 111.56 ms 143.53 ms 28.7%
find_tracks 24.23 ms 28.01 ms 15.6%
count_triplets 14.16 ms 14.16 ms 0.0%
find_triplets 5.97 ms 5.98 ms 0.2%
find_doublets 887.59 μs 892.79 μs 0.6%
ccl_kernel 684.14 μs 684.99 μs 0.1%
count_doublets 627.08 μs 627.45 μs 0.1%
build_tracks 421.19 μs 506.09 μs 20.2%
Thrust::sort 448.15 μs 448.86 μs 0.2%
select_seeds 364.76 μs 366.74 μs 0.5%
DeviceRadixSortOnesweepKernel 183.33 μs 227.70 μs 24.2%
DeviceRadixSortHistogramKernel 113.65 μs 114.74 μs 1.0%
update_triplet_weights 100.32 μs 100.13 μs -0.2%
apply_interaction 77.26 μs 93.73 μs 21.3%
fill_sort_keys 58.96 μs 74.30 μs 26.0%
estimate_track_params 34.05 μs 34.22 μs 0.5%
populate_grid 30.46 μs 30.41 μs -0.2%
count_grid_capacities 29.15 μs 29.13 μs -0.0%
form_spacepoints 13.40 μs 13.37 μs -0.2%
reduce_triplet_counts 6.68 μs 6.68 μs -0.1%
static_kernel 1.80 μs 1.80 μs 0.0%
make_barcode_sequence 1.02 μs 1.03 μs 0.5%
DeviceRadixSortExclusiveSumKernel 511.85 ns 513.39 ns 0.3%
fill_prefix_sum 165.42 ns 165.40 ns -0.0%
Total 754.97 ms 936.74 ms 24.1%

Important

All metrics in this report are given as reciprocal throughput, not as wallclock runtime.

Warning

At least one kernel incurred a significant performance regression.

Note

This is an automated message produced on the explicit request of a human being.

@stephenswat
Copy link
Member Author

stephenswat commented May 23, 2025

Performance impact is expected due to the fact that we produce more high-quality tracks which require more fitting. Also the performance will improve when we reduce the branching factor.

@stephenswat stephenswat enabled auto-merge May 23, 2025 09:33
This commit allows the `find_tracks` kernel in the CKF to keep the best
measurements instead of keeping random ones, which will hopefully allow
us to significantly reduce the branching factor of the CKF. The logic
here works by employing a series of mutexes, allowing threads to lock a
critical section and insert their own measurement, overwriting the
previous ones.
@stephenswat stephenswat force-pushed the feat/keep_best_meas branch from 15e5731 to c12cbc4 Compare May 23, 2025 09:37
Copy link

@krasznaa
Copy link
Member

I'm very confused. I don't see any difference between the efficiency plots.

@stephenswat stephenswat merged commit 9c03044 into acts-project:main May 23, 2025
26 of 29 checks passed
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 23, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 23, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 23, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 23, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 26, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 26, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 27, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 27, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 27, 2025
This commit makes the track finding code more precisely respect the
per-seed branching factor limit, which was accidentally loosened in pull
request acts-project#929.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 27, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 28, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 28, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 28, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 28, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
stephenswat added a commit to stephenswat/traccc that referenced this pull request May 29, 2025
Following acts-project#929, we can now reduce the branching factor without incurring
a decrease in physics performance. This commit drops the branching
dramatically from 10 to 2.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
improvement Improve an existing feature shared Changes related to shared code
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants