Open
Description
Title says it all, see #46686 (comment)
I tested in a plain CMSSW_14_2_X_2024-11-12-2300
(without anything checked out on top) with:
runTheMatrix.py --what gpu,upgrade -l 14949.402 -t 4
I see
$ more 14949.402_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka/step3_HydjetQMinBias_5020GeV+2022HI_Patatrack_PixelOnlyAlpaka.log
RAW2DIGI:RawToDigi_pixelOnly,RECO:reconstruction_pixelTrackingOnly,VALIDATION:@pixelTrackingOnlyValidation,DQM:@pixelTrackingOnlyDQM
We have determined that this is simulation (if not, rerun cmsDriver.py with --data)
with DB:
entry file:step2.root
Step: RAW2DIGI Spec: ['RawToDigi_pixelOnly']
Step: RECO Spec: ['reconstruction_pixelTrackingOnly']
Step: VALIDATION Spec: ['@pixelTrackingOnlyValidation']
@pixelTrackingOnlyValidation in preparing validation
Step: DQM Spec: ['@pixelTrackingOnlyDQM']
customising the process with customiseAlpakaServiceMemoryFilling from HeterogeneousCore/AlpakaServices/customiseAlpakaServiceMemoryFilling
customising the process with setCrossingFrameOn from SimGeneral/MixingModule/fullMixCustomize_cff
Starting cmsRun step3_RAW2DIGI_RECO_VALIDATION_DQM.py
%MSG-i ThreadStreamSetup: (NoModuleName) 13-Nov-2024 14:45:57 CET pre-events
setting # threads 4
setting # streams 4
%MSG
%MSG-i AlpakaService: (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceSerialSync succesfully initialised.
Found 1 device:
- Intel(R) Xeon(R) Silver 4216 CPU @ 2.10GHz
%MSG
%MSG-i CUDAService: (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
CUDA runtime version 12.4, driver version 12.4, NVIDIA driver version 550.127.05
CUDA device 0: Tesla T4 (sm_75)
%MSG
%MSG-i AlpakaService: (NoModuleName) 13-Nov-2024 14:45:59 CET pre-events
AlpakaServiceCudaAsync succesfully initialised.
Found 1 device:
- Tesla T4
%MSG
13-Nov-2024 14:46:06 CET Initiating request to open file file:step2.root
13-Nov-2024 14:46:11 CET Successfully opened file file:step2.root
%MSG-w NonConsumedConditionalModules: AfterModConstruction 13-Nov-2024 14:46:14 CET pre-events
The following modules were part of some ConditionalTask, but were not
consumed by any other module in any of the Paths to which the ConditionalTask
was associated. Perhaps they should be either removed from the
job, or moved to a Task to make it explicit they are unscheduled.
siPixelDigis@cpu
siPixelRecHitsPreSplitting@cpu
%MSG
Begin processing the 1st record. Run 1, Event 1, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:16.182 CET
Begin processing the 2nd record. Run 1, Event 6, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:16.211 CET
Begin processing the 3rd record. Run 1, Event 7, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 4th record. Run 1, Event 5, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:16.229 CET
Begin processing the 5th record. Run 1, Event 9, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:18.913 CET
Begin processing the 6th record. Run 1, Event 10, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:18.955 CET
Begin processing the 7th record. Run 1, Event 2, LumiSection 1 on stream 3 at 13-Nov-2024 14:46:18.995 CET
Begin processing the 8th record. Run 1, Event 8, LumiSection 1 on stream 2 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 9th record. Run 1, Event 4, LumiSection 1 on stream 0 at 13-Nov-2024 14:46:19.258 CET
Begin processing the 10th record. Run 1, Event 3, LumiSection 1 on stream 1 at 13-Nov-2024 14:46:20.056 CET
src/RecoVertex/PixelVertexFinding/plugins/alpaka/clusterTracksByDensity.h:205: void alpaka_cuda_async::vertexFinder::clusterTracksByDensity(const alpaka::AccGpuUniformCudaHipRt<alpaka::ApiCudaRt, std::in
tegral_constant<unsigned long, 1UL>, unsigned int> &, reco::ZVertexLayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, reco::ZVertexTracksLayout<128UL, false>::ViewTemplateFreeParam
s<128UL, false, true, true> &, vertexFinder::PixelVertexWSSoALayout<128UL, false>::ViewTemplateFreeParams<128UL, false, true, true> &, int, float, float, float): block: [0,0,0], thread: [640,0,0] Asserti
on `static_cast<int>(foundClusters) < data.metadata().size()` failed.
suspected to be related to #45887.
At the moment this workflow is not exercised in IBs - thus the issue went unnoticed.