Do not store links which will be rejected #933


Open · wants to merge 3 commits into main
Conversation

fiedlerp
Contributor

No description provided.

@fiedlerp
Contributor Author

Should I also make the changes to the CPU version?

@stephenswat
Member

Could you please provide us a bit more information about what you are doing here? And how does this interact with #908?

No need to update the CPU code if the physics performance doesn't change.

@stephenswat
Member

Having read through it, I really like the changes. Can we un-draft this? I am currently running some performance tests; if those turn out alright, I am happy to merge.

@stephenswat
Member

Be aware that this change seems to significantly impact the compute performance.


My initial guess is that we might somehow be keeping more tracks. That would explain the increase in fitting time, even though you didn't touch the fitting kernel at all.

@fiedlerp
Contributor Author

I have more changes to come: some edge cases, and also using the track candidate lengths to condition the tip storage in find_tracks and propagate_to_next_surface instead of filtering by it in build_tracks.

@beomki-yeo
Contributor

beomki-yeo commented Mar 31, 2025

It is likely that the CUDA tests which compare against the CPU finding results will fail if the finding results change. Have you run traccc_test_cuda?

@stephenswat
Member

It is likely that the CUDA tests which compare against the CPU finding results will fail if the finding results change. Have you run traccc_test_cuda?

Ah, well spotted! The GitLab CI doesn't run for Petr.

@stephenswat
Member

Confirmed the tests fail:

[ RUN      ] CUDACkfCombinatoricsTelescopeValidation/CudaCkfCombinatoricsTelescopeTests.Run/1
WARNING: No entries in volume finder

Detector check: OK
Using hit time
/mnt/ssd1/sswatman/traccc/tests/cuda/test_ckf_combinatorics_telescope.cpp:225: Failure
Expected equality of these values:
  track_candidates_cuda.size()
    Which is: 52488
  std::pow(n_truth_tracks, std::get<11>(GetParam()) + 1)
    Which is: 59049

Here's the number of tracks over these commits:

a9cf4fb:

Using CUDA device: NVIDIA RTX A5000 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 1102959
Time totals:
                  File reading  561 ms
            Warm-up processing  10038 ms
              Event processing  10002 ms
Throughput:
            Warm-up processing  1003.89 ms/event, 0.996124 events/s
              Event processing  1000.27 ms/event, 0.999735 events/s

dcb8d32:

Using CUDA device: NVIDIA RTX A5000 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 1102962
Time totals:
                  File reading  558 ms
            Warm-up processing  8985 ms
              Event processing  8935 ms
Throughput:
            Warm-up processing  898.504 ms/event, 1.11296 events/s
              Event processing  893.565 ms/event, 1.11911 events/s

348de1a (main):

Using CUDA device: NVIDIA RTX A5000 [id: 0, bus: 1, device: 0]
Warm-up processing [==================================================] 100% [00m:00s]
Event processing   [==================================================] 100% [00m:00s]
Reconstructed track parameters: 1034853
Time totals:
                  File reading  557 ms
            Warm-up processing  8677 ms
              Event processing  8613 ms
Throughput:
            Warm-up processing  867.758 ms/event, 1.15239 events/s
              Event processing  861.315 ms/event, 1.16102 events/s

@fiedlerp
Contributor Author

fiedlerp commented Apr 1, 2025

I see; I will run it locally before the next push. However, I think I found the discrepancy: holes are inserted when branching is stopped due to reaching the maximum number of candidates per seed, and the hole insertion does not apply this cut. This is one of the reasons I did not want to undraft it yet.

@acts-project acts-project deleted a comment from paulgessinger Apr 1, 2025
@acts-project acts-project deleted a comment from paulgessinger Apr 1, 2025
@fiedlerp fiedlerp force-pushed the dead-tracks-filtering branch 2 times, most recently from 72e35db to 0613ce5 Compare April 11, 2025 19:41
@fiedlerp fiedlerp marked this pull request as ready for review April 11, 2025 19:41
@fiedlerp fiedlerp force-pushed the dead-tracks-filtering branch 3 times, most recently from 62d9ee4 to 974a329 Compare April 14, 2025 16:03
@stephenswat
Member

Performance summary

Here is a summary of the performance effects of this PR:

(Graphical comparison omitted; tabular summary below.)

| Kernel | e31cdb8 | 974a329 | Delta |
| --- | ---: | ---: | ---: |
| fit | 280.01 ms | 304.39 ms | 8.7% |
| propagate_to_next_surface | 118.09 ms | 105.49 ms | -10.7% |
| find_tracks | 26.68 ms | 23.49 ms | -12.0% |
| count_triplets | 14.15 ms | 14.16 ms | 0.0% |
| find_triplets | 5.98 ms | 5.99 ms | 0.1% |
| find_doublets | 890.52 μs | 885.89 μs | -0.5% |
| Thrust::sort | 757.51 μs | 720.54 μs | -4.9% |
| prune_tracks | 710.21 μs | 709.90 μs | -0.0% |
| ccl_kernel | 682.66 μs | 683.99 μs | 0.2% |
| build_tracks | 1.14 ms | 666.85 μs | -41.4% |
| count_doublets | 625.69 μs | 626.37 μs | 0.1% |
| select_seeds | 361.78 μs | 359.79 μs | -0.5% |
| apply_interaction | 132.24 μs | 122.67 μs | -7.2% |
| update_triplet_weights | 96.70 μs | 95.85 μs | -0.9% |
| fill_sort_keys | 68.07 μs | 58.06 μs | -14.7% |
| estimate_track_params | 34.34 μs | 34.39 μs | 0.1% |
| populate_grid | 30.37 μs | 30.47 μs | 0.3% |
| count_grid_capacities | 29.15 μs | 29.13 μs | -0.1% |
| form_spacepoints | 14.19 μs | 14.19 μs | -0.0% |
| reduce_triplet_counts | 6.67 μs | 6.66 μs | -0.2% |
| static_kernel | 1.78 μs | 1.78 μs | 0.0% |
| make_barcode_sequence | 987.35 ns | 1.00 μs | 1.8% |
| fill_prefix_sum | 165.48 ns | 165.42 ns | -0.0% |
| Total | 450.50 ms | 458.57 ms | 1.8% |

Note

This is an automated message produced on the explicit request of a human being.

@stephenswat
Member

I'd really like to understand the performance regression that we have going on in the fit kernel. Since you haven't touched the fitting kernel itself, it would seem to me that you are either fitting more tracks or that your tracks are somehow longer. Have you looked into this?

@fiedlerp
Contributor Author

I have not looked into the fitting. I see that the increase is related to the max-skipping cut. The number of tracks is the same, as confirmed by traccc_test_cuda. The tracks are shorter by one hole, because my change creates the tip instead of creating a candidate with too many holes.

@stephenswat
Member

stephenswat commented Apr 15, 2025

I see the following change in the number of tracks being passed to the fitter:

| Commit | Tracks | States | States per track |
| --- | ---: | ---: | ---: |
| e31cdb8 | 103,495 | 572,400 | 5.531 |
| 974a329 | 104,126 | 577,516 | 5.546 |

So the number of tracks increases by 0.6%, the number of track states increases by 0.9%, and the average number of states per track increases by 0.2%. By itself I don't think this is a problem (and I'm not sure why the fitting time increases by 8.7% while the number of track states increases by only 0.9%), but you claim that tracks should be shorter because you remove holes. 🤔
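As a quick sanity check of the first two percentages, here is a minimal sketch (the helper name is made up, not traccc code):

```cpp
#include <cassert>
#include <cmath>

// Relative increase in percent between two counts; used here only to
// verify the ~0.6% (tracks) and ~0.9% (states) figures quoted above.
double pct_increase(double before, double after) {
    return (after - before) / before * 100.0;
}
```

For example, `pct_increase(103495, 104126)` evaluates to about 0.61 and `pct_increase(572400, 577516)` to about 0.89, matching the rounded values above.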

@stephenswat
Member

stephenswat commented Apr 15, 2025

The difference in track counts does go away if we increase --max-num-branches-per-seed:

Before with --max-num-branches-per-seed=1000000
Fitting 117770 tracks with 630243 states

After with --max-num-branches-per-seed=1000000
Fitting 117784 tracks with 630450 states

@beomki-yeo what's your take on this? I think it is okay, and I think we might eventually want to get rid of the --max-num-branches-per-seed flag entirely, as it is completely non-deterministic: you have no control over which tracks you keep and which you throw away.

@beomki-yeo
Contributor

--max-num-branches-per-seed serves (1) to apply an upper bound on the CKF output of very similar tracks, which can explode exponentially, and (2) to allocate the memory for the tips. I am afraid that removing it may cause a bottleneck, even though that won't happen very frequently.
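The non-determinism discussed here can be illustrated with a minimal host-side sketch (not the traccc API; the function and parameter names are made up): each seed carries a shared branch counter, and a new branch is only accepted while the counter is below the cap, so which branches win depends on thread timing.

```cpp
#include <atomic>

// Illustrative sketch of a hard cap on branches per seed. A branch is
// accepted only if a slot below the cap can be claimed; under concurrent
// execution, which branches get the slots is timing-dependent.
bool try_claim_branch(std::atomic<unsigned>& branches_for_seed,
                      unsigned max_branches_per_seed) {
    unsigned current = branches_for_seed.load();
    while (current < max_branches_per_seed) {
        // compare_exchange_weak reloads `current` if another thread
        // claimed a slot first; the loop then re-checks the cap.
        if (branches_for_seed.compare_exchange_weak(current, current + 1)) {
            return true;  // branch accepted
        }
    }
    return false;  // cap reached: branch is rejected
}
```

In the actual GPU kernels the same effect would come from an atomic increment in device code, but the ordering problem is identical: the first `max_branches_per_seed` claimants win, whoever they happen to be.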

It is correct that it is non-deterministic, but we may need to find an alternative if we want to get rid of it 🤔

@beomki-yeo
Contributor

beomki-yeo commented Apr 15, 2025

@fiedlerp Could you write a description with some details so we can review the PR? The increased fitting time reported by Stephen definitely looks weird to me.

@stephenswat
Member

It is correct that it is non-deterministic but we may need to find alternative if we want to get rid of it 🤔

Indeed, this is a good point; ideally I'd like to replace it with some sort of deterministic mechanism that achieves the same thing... I'll think about it.

@fiedlerp
Contributor Author

fiedlerp commented Apr 26, 2025

@fiedlerp Could you write a description with some details so we can review the PR? The increased fitting time reported by Stephen definitely looks weird to me.

The PR consists of three commits, each handling one of the conditions which kill or reject candidates.

The first commit moves the seed branching cut from propagate_to_next_surface to find_tracks. This is done by counting the seed branches while building the candidates. When the seed branching limit is hit, the computation of candidates from the same branch is stopped as early as possible to avoid wasting compute time.

The second commit also moves the step-skipping (hole counting) condition from propagate_to_next_surface to find_tracks. This is the part that yields the 10% speedup of the track finding. If a candidate has already reached the maximum number of holes, it is added to the tips instead of creating a new hole.

The third commit adjusts the candidate length condition in propagate_to_next_surface to account for holes, so that a candidate which is too short is not added to the tips. The same condition is also added to find_tracks' hole counting part. In addition, when reaching the last step, the tip adding is moved from propagate_to_next_surface to find_tracks, and the computation is stopped after the find_tracks kernel, as no propagation to the next surface is needed. Because the tips contain only valid tracks, pruning is unnecessary: all built tracks are valid.
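The hole/tip decision described by the second and third commits can be sketched as a small decision function. This is a hedged illustration only: the enum, function, and parameter names are invented for clarity and do not reflect the actual traccc kernels.

```cpp
// Sketch of the per-step decision when no measurement matches a candidate:
// within the hole budget, record a hole and keep propagating; once the
// budget is exhausted, the track ends here and is kept as a tip only if
// it satisfies the minimum-length cut (counting real candidates).
enum class step_action { add_hole, make_tip, drop };

step_action on_missing_measurement(unsigned n_holes, unsigned max_holes,
                                   unsigned n_candidates,
                                   unsigned min_track_length) {
    if (n_holes < max_holes) {
        return step_action::add_hole;  // still within the hole budget
    }
    // Budget exhausted: tip the track if long enough, otherwise drop it.
    return (n_candidates >= min_track_length) ? step_action::make_tip
                                              : step_action::drop;
}
```

Handling this in find_tracks rather than in a later kernel is what avoids ever materializing candidates that would only be filtered out afterwards.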

Contributor

@beomki-yeo beomki-yeo left a comment


The list of changes makes sense to me. Please resolve @stephenswat's comments and update the branch so we can merge the PR.

@stephenswat
Member

Please revert the change introducing the early returns, rebase and we can move towards merging this.

@fiedlerp fiedlerp force-pushed the dead-tracks-filtering branch 4 times, most recently from 86c3bda to 4054909 Compare May 3, 2025 21:29
@fiedlerp fiedlerp force-pushed the dead-tracks-filtering branch from 4054909 to 791fbfe Compare May 3, 2025 21:34

sonarqubecloud bot commented May 3, 2025

@fiedlerp
Contributor Author

fiedlerp commented May 3, 2025

The build fails on a detray-related issue.
