Skip to content

Migrate LST inputs to SoA collections #47793

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

ariostas
Copy link
Contributor

@ariostas ariostas commented Apr 4, 2025

This PR refactors the LST inputs so that Portable SoA Collections are used from the very beginning. It also deduplicates some code since there were separate functions for standalone and CMSSW that played the same role.

It's mostly done, but we have a couple of questions (probably for @fwyzard or @makortel).

  1. Running on CPU works fine, but I'm still having issues with GPUs. The issue is with LCG dictionaries, as I described in my Mattermost message. Initially, I thought that the ROOT name normalization issue that Matti mentioned could be the cause, but I think it's something else.
  2. I had to split an SoA collection into two, since part of it is created before LST and some needs to be modified during the algorithm (to add some extra data). Do you know of a good way to combine the two collections into some sort of combined view, or do we just need to keep using the two separately. (Obviously, we could just define some new struct or class, but that doesn't sound great to me)

c.c. @slava77 @VourMa

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2025

cms-bot internal usage

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2025

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47793/44377

  • Found files with invalid states:
    • RecoTracker/LSTCore/interface/LSTInput.h:
    • RecoTracker/LSTCore/interface/alpaka/HitsRangesDeviceCollection.h:
    • RecoTracker/LSTCore/interface/HitsRangesHostCollection.h:
    • RecoTracker/LSTCore/interface/alpaka/LSTInputDeviceCollection.h:

Code check has found code style and quality issues which could be resolved by applying following patch(s)

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47793/44378

  • Found files with invalid states:
    • RecoTracker/LSTCore/interface/LSTInput.h:
    • RecoTracker/LSTCore/interface/alpaka/HitsRangesDeviceCollection.h:
    • RecoTracker/LSTCore/interface/HitsRangesHostCollection.h:
    • RecoTracker/LSTCore/interface/alpaka/LSTInputDeviceCollection.h:

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 4, 2025

A new Pull Request was created by @ariostas for master.

It involves the following packages:

  • DQM/TrackingMonitorSource (dqm)
  • HLTrigger/Configuration (hlt)
  • RecoTracker/IterativeTracking (reconstruction)
  • RecoTracker/LST (reconstruction)
  • RecoTracker/LSTCore (reconstruction)

@Martin-Grunewald, @antoniovagnerini, @cmsbuild, @jfernan2, @mandrenguyen, @mmusich, @rseidita can you please review it and eventually sign? Thanks.
@GiacomoSguazzoni, @Martin-Grunewald, @SohamBhattacharya, @VinInn, @VourMa, @arossi83, @dgulhan, @felicepantaleo, @fioriNTU, @gpetruc, @idebruyn, @jandrea, @missirol, @mmusich, @mtosi, @richa2710, @rovere, @sroychow, @threus this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@ariostas
Copy link
Contributor Author

ariostas commented Apr 4, 2025

Hmm I see that now there's #47697 and #47774, so these are all likely related issues

@slava77
Copy link
Contributor

slava77 commented Apr 7, 2025

test parameters:

  • enable_tests = gpu
  • workflows_gpu = 29634.704,29834.704
  • workflows = 29634.703,29834.703,29834.755
  • relvals_opt = -w upgrade,standard
  • relvals_opt_gpu = -w upgrade,standard

@slava77
Copy link
Contributor

slava77 commented Apr 7, 2025

@cmsbuild please test

@VourMa
Copy link
Contributor

VourMa commented Apr 7, 2025

@slava77 I would propose we start testing also the .755 workflow to make sure that LST in Phase 2 HLT works fine.
In this particular case, I expect that workflow to crash in more than one ways, which I can follow up on and resolve once the rest of the issues are fixed.

@slava77
Copy link
Contributor

slava77 commented Apr 7, 2025

@slava77 I would propose we start testing also the .755 workflow to make sure that LST in Phase 2 HLT works fine. In this particular case, I expect that workflow to crash in more than one ways, which I can follow up on and resolve once the rest of the issues are fixed.

I don't see an accelerator specifier in this workflow.
Is it expected to be added to the CPU or GPU list?

@VourMa
Copy link
Contributor

VourMa commented Apr 7, 2025

Is it expected to be added to the CPU or GPU list?

It should be in the "workflows" (not the "workflows_gpu") list.

@cmsbuild
Copy link
Contributor

cmsbuild commented Apr 7, 2025

-1

Failed Tests: RelVals-CUDA RelVals-ROCM
Size: This PR adds an extra 20KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-a5bdf1/45418/summary.html
COMMIT: 650b643
CMSSW: CMSSW_15_1_X_2025-04-07-1100/el8_amd64_gcc12
Additional Tests: CUDA,ROCM
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47793/45418/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • Reco comparison results: 323 differences found in the comparisons
  • DQMHistoTests: Total files compared: 54
  • DQMHistoTests: Total histograms compared: 4206802
  • DQMHistoTests: Total failures: 7529
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4199253
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: -0.316 KiB( 53 files compared)
  • DQMHistoSizes: changed ( 29634.703,... ): -0.158 KiB Tracking/TrackParameters
  • Checked 231 log files, 200 edm output root files, 54 DQM output files
  • TriggerResults: no differences found

@cmsbuild
Copy link
Contributor

Pull request #47793 was updated. @Martin-Grunewald, @antoniovagnerini, @cmsbuild, @jfernan2, @mandrenguyen, @mmusich, @rseidita can you please check and sign again.

@jfernan2
Copy link
Contributor

assign heterogeneous

@cmsbuild
Copy link
Contributor

New categories assigned: heterogeneous

@fwyzard,@makortel you have been requested to review this Pull request/Issue and eventually sign? Thanks

@slava77
Copy link
Contributor

slava77 commented Apr 10, 2025

@cmsbuild please test

@cmsbuild
Copy link
Contributor

@cmsbuild
Copy link
Contributor

Pull request #47793 was updated. @Martin-Grunewald, @antoniovagnerini, @cmsbuild, @fwyzard, @jfernan2, @makortel, @mandrenguyen, @mmusich, @rseidita can you please check and sign again.

@ariostas
Copy link
Contributor Author

There's still an issue I'm tracking down, so I'll mark this as a draft until it's ready.

@ariostas ariostas marked this pull request as draft April 15, 2025 15:48
@cmsbuild
Copy link
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47793/44535

@cmsbuild
Copy link
Contributor

Pull request #47793 was updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants