-
Notifications
You must be signed in to change notification settings - Fork 4.4k
A More Flexible And Lightweight CA #47611
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
cms-bot internal usage |
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47611/44120
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47611/44121
|
A new Pull Request was created by @AdrianoDee for master. It involves the following packages:
@Dr15Jones, @Martin-Grunewald, @antoniovagnerini, @bsunanda, @civanch, @cmsbuild, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @mmusich, @rseidita can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
enable gpu |
please test |
process.hltCAGeometry = cms.ESProducer('CAGeometryESProducer@alpaka', | ||
caDCACuts = cms.vdouble( | ||
0.15, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25, 0.25 | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these (collections of) parameters explicitly listed here in the HLT customisation function, instead of using fillDescriptions for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since CAGeometryESProducer is a new plugin, fillDescriptions can take care of any/all parameters!
|
||
if not hasattr(prod, 'caGeometry'): | ||
setattr(prod, 'caGeometry', cms.string('hltCAGeometry')) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are these (collections of) parameters explicitly listed here in the HLT customisation function, instead of using fillDescriptions for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These plugins are already existing, so simply delete here in the customisation all old/removed parameters as well as all parameters with changed values, with fillDescriptions then taking care of new values and new parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, let me take care of these (and those above).
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47611/44387
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47611/44388
Code check has found code style and quality issues which could be resolved by applying following patch(s)
|
f96dada
to
e5ae12f
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47611/44389
|
Pull request #47611 was updated. @Dr15Jones, @Martin-Grunewald, @antoniovagnerini, @bsunanda, @civanch, @cmsbuild, @fwyzard, @jfernan2, @kpedro88, @makortel, @mandrenguyen, @mdhildreth, @mmusich, @rseidita can you please check and sign again. |
please test |
@@ -9,3 +9,6 @@ | |||
<use name="CalibTracker/Records"/> | |||
<use name="clhep"/> | |||
<use name="boost"/> | |||
<use name="alpaka"/> | |||
<flags ALPAKA_BACKENDS="1"/> | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RecoTracker/Record
package seems to be intended for EventSetup Record classes. I'd suggest to keep it that way, i.e. to move the new SoA definitions elsewhere.
-1 Failed Tests: UnitTests RelVals RelVals-CUDA RelVals-INPUT RelVals-ROCM AddOn cudaUnitTests rocmUnitTests Unit TestsI found 1 errors in the following unit tests: ---> test alpakaTestPrefixScanSerialSync had ERRORS AddOn Tests
CUDA Unit TestsI found 3 errors in the following unit tests: ---> test deviceVertexFinderByDensity_tCudaAsync had ERRORS ---> test deviceVertexFinderDBSCAN_tCudaAsync had ERRORS ---> test deviceVertexFinderOneKernel_tCudaAsync had ERRORS ROCm Unit TestsI found 1 errors in the following unit tests: ---> test alpakaTestBufferROCmAsync had ERRORS |
This PR proposes a general restructring of the Alpaka implementaion of the CA based pixel tracking. The idea here would be to make the CA:
TrackerTraits
and reading the geometry at runtime from theTrackerGeometry
in a configurable way. This would greatly simplify the inclusion of new layers (e.g. strips);The three updates here have many overlaps and sinergies but may be taken separately (if really needed).
Developments
CA Structures
In the CA few structures (defined in
RecoTracker/PixelSeeding/plugins/alpaka/CAStructures.h
) have a fixed size containers to hold the intermediate results. These are:OuterHitOfCellContainer
: an array keeping the index of a cell (uint32_t). One per hit. Keeps track of the cells that have that his as the outer one.CellNeighbors
: an array keeping the index of a cell (uint32_t). One per cell (==maxNumberOfCells). Keeps track of cells connected through the outer hit.CellTracks
: an array keeping the index of a tuple (uint32_t/uint16_t). One per cell. Keeps track of the tuples to which a cell belong. Mostly to remove duplicates.and they are sized on the maximum number of possible association for each cells/track. The current numbers were estimated using TTbar PU samples before Run3 start.
The idea here is to just move all these structures to be sized with the average per element using
OneToManyAssoc{Sequential|RandomAccess}
sizable at runtime (allocating the need buffers for the storage and the offsets) and so we can pass the averages at config level:device_hitToCell_
; withnOnes = nHits
andnMany = avgCellsPerHit * nHits
device_cellToNeighbors_
, withnOnes = maxCells
andnMany = avgCellsPerCell * maxCells
device_cellToTracks_
; withnOnes = maxCells
andnMany = avgTracksPerCell * maxCells
W.r.t the fixed size approach we need a couple of extra things:
CACoupleSoA
.Kernel_fillGenericCouple
.Container Sizes
Find in https://adriano.web.cern.ch/ca_geometry/containers/ the plots for all the container sizes:
nHits
,nOuterHits
(hits excluding barrel 1),nCells
,nTracks
,nTrips
(a cell attached to another cell);nCells_vs_nOuterHits
;nTrips_vs_nCells_avg
. For the variant withnTrips_vs_maxDoublets_avg
the denominator is the fixed size for cells we get from themaxDoublets
parameter.nCellTracks_vs_nCells_avg
. For the variant withnCellTracks_vs_maxDoublets
the denominator is the same as above.nCells
andnTracks
vs the number of hits, with a fit.An example here. "wp" stands for the working point selected for the given scenario.
N.B. the phase2 quads and trips have the same trends for cells since the graph is the same.
Euristical Sizes for Doublets and Tracks
The PR propose also to allow to define euristically the container sizes (
maxNumeberOfDoublets
andmaxNumberOfTuples
) via aTFormula
. At the moment I haven't run any proper test for the impact of this update on the memory usage for pp conditions. But the fact that the number of hits, cells and tracks show a wide span for consecutive events seems to point to the fact that making themaxNumeberOfDoublets
dependent on the number of hits may be beneficial for a production setup (in which we run on consecutive events in parallel).For run
Run2024F
,Run=383631
andLS=476
EphemeralHLTPhysics
data:The trends show also that we can leverage on a functional dependency between nHits and {nTracks|nCells} (~quadratical, as one would expect) for any of the scenarios:
(Run3 trips on MC and HLT pp on data overlaps nicely).
For example running on pp data (from
Run2024F
) one can easily fit the number of cells with the number of hits giving some safety margin.Samples used:
HIon : data
/store/hidata/HIRun2024B/HIEphemeralHLTPhysics/RAW/v1/000/388/305/00000/d8b13b7d-a94e-4b1f-9aae-bd86836a0459.root
;HIon Hyet: MC private
HydjetQMinBias_5020GeV+2023HIRP
;HLT pp: data
Run2024F
, Run383631
andls0476
EphemeralHLTPhysics
data;Run3 quads/trips: MC
/store/relval/CMSSW_14_1_0_pre2/RelValTTbar_14TeV/GEN-SIM-DIGI-RAW/PU_140X_mcRun3_2024_realistic_v7_STD_2024_PU-v1/2580000/
;Phase2 quads/trips: MC
/RelValTTbar_14TeV/CMSSW_15_0_0_pre1-PU_141X_mcRun4_realistic_v3_STD_Run4D110_PU-v1/GEN-SIM-DIGI-RAW
.hlt_hion
data from densely populatedHIRun2024B
events (run388305
).**N.B. for HLT pp on
EphemeralHLTPhysics
data for run383631
the current limit (512*1024
) was too low for ~0.32% of the events (over the 10000 used). This does imply any crash, we just stop pushing new doublets. **CAGeometry
A new
ESProduct
(CAGeometry
) is introduced that holds:phiCuts
,minZ
,maxZ
,maxR
for doublets;SimplePixelTopology
);TrackerGeometry
.dcaCut
andCAThetaCut
, also expanded to be one per layer useful for future tuning (especially if including the stript detector);Pixel DataFormats SoA
The Track and TrackingRecHit SoAs were templated with the
TrackerTraits
mostly to hold the helper histograms and for fixed size arrays (for the number of hits per track or modules). This could be simplified levaraging on thePortable{Host|Device}MultiCollection
.Miscs
I took the chance also to do some clean up here and there:
AverageGeometry
actually never used. It can be easily re-enabled if needed;SimplePixelTopology
numbering for Phase2 modules that was affecting the cluster doublet cuts;idealConditions
flag that was changing the cluster cut based on the pixel barrel side (only for Run3). It has never been used and no beneficial effect was found (studies were done in late 2021 and the efficiency was degradated).CPE
borught around in the chaing just for a single call to theFrameSoA
that has been moved in theCAGeometry
;32
instead of standard256
). This lead to crashes documented in Relval wf 14949.402 fails at runtime in CMSSW_14_2_X #46693. This would need anyway to be investigatged since, also inmaster
, there are HI events (with noPU) for which we reconstruct >100 vertices.Performance and Physics Studies
No changes to physics performance observed (as expected) for:
Small fluctuations visible for Phase2 and HI, I haven't found anything strange or a reason for them. They might be the "usual" irreproducibilities.
Posting here a couple of examples for the records
pp HLT
Performance measured on
devfu-c2b03-44-01
running/frozen/2024/2e34/v1.4/CMSSW_14_1_X/HLT
(adapted to be compatible with15_0_0_pre2
and the PR) on ~10k events fromRun2024I
, Run386593
and LS94
EphemeralHLTPhysics
data.Througput and timing
The throughput is basically untouched
this PR:
master:
As for the event timings find here all the piecharts. Posting here two of them (for
real_time
) for the records. The average timings measured are (for the 8 jobs x 3 times):488.48 ± 4.08 ms/ev
487.72 ± 4.40 ms/ev
Memory
All the memory plots are under https://adriano.web.cern.ch/ca_geometry/memory/.
The memory usage for 8 jobs x 32 threads x 24 streams is reduced by ~47%.
Phase2
Performance measured on a TTbar D110 PU Run4 RelVal sample (EDM input):
/RelValTTbar_14TeV/CMSSW_15_0_0_pre1-PU_141X_mcRun4_realistic_v3_STD_Run4D110_PU-v1/GEN-SIM-DIGI-RAW
Througput and timing
The optimal setup I found for
master
is 2 jobs with 8 threads and 8 streams. With 3 jobs we go out of memory and with 12 or 16 threads the througput is the same, just the memory increases. So I took this as the baseline for the comparisons. Here I'm running quadruplets only since for triplets the memory occupancy is very similar. This is a consequences of the fact that doublets related containers are vastly dominating and that, at the moment, they are the same for quads and trips given the same cell graph.this PR:
master:
Memory
In terms of memory the effect is even more important that for Run3 HLT with a reduction of ~70% in a configuration with 2 jobs 8 threads and 8 streams (
8t8s2j
).The same for
12t12s2j
that is not really beneficial for the througput and that is almost filling the two T4 available (formaster
). Also, having more memory available, configurations with more jobs may be tested brining almost a factor 2 to max throughput.(note maybe there's some further room for improvement using the euristical sizes)
HIon
(thanks to Soohwan for the informations and the samples to set this up)
At the moment the HIon menu runs only the pixel local reconstruction on GPU since the pixel track reco is too heavy on the GPU memory. The performance here are mesured:
/store/hidata/HIRun2024B/HIEphemeralHLTPhysics/RAW/v1/000/388/305/00000/d8b13b7d-a94e-4b1f-9aae-bd86836a0459.root
, converted to raw;/dev/CMSSW_14_2_0/HIon/V11
menu:RefCpu
);Ref
);Dev
).If I understood well and the configuration stayed the same we currently run with 8 jobs 8 threads and 8 streams (see https://its.cern.ch/jira/browse/CMSHLT-2951). For a full GPU menu (
Ref
) the best I could fit in two T4 is a setup with8t8s2j
. Running the same setup with this PR the memory usage is reduced by ~72% (with a +70% in througput).Given the lighter memory footprint we can push a bit the full GPU HI menu reaching the same througput w.r.t. to the
RefCpu
setup (16j16s8j
) with12t12s4j
. And can go up to +240% it with16t16s8j
(the maximum I could get).The HI runs seems also a good candidate to test the euristical sizes. For example or run
HIRun2024B
,Run=388305
andLS=123
EphemeralHLTPhysics
data, if we plot the number of cells, hits or tracks from consecutive events, we see:For cells we go from 1e4 to 1e6 (for non zero values). And we can fit the number of cells vs the number of hits:
Using this cut for cells we can reach up to more than three times the current througput with
16t16s16j
(while keeping a good margin on the max memory available). Going above (with e.g.16t16s20j
) is just increasing the memory occupation while keeping the same througput.