Description
A PromptReco job failure in the NanoAOD step was observed at Tier0; the error message is reproduced below.
cms-talk thread: https://cms-talk.web.cern.ch/t/paused-job-for-promptreco-run381515-parkingvbf0-vertexexception/42163
----- Begin Fatal Exception 11-Jun-2024 10:46:46 CEST-----------------------
An exception of category 'VertexException' occurred while
[0] Processing Event run: 381515 lumi: 384 event: 765632765 stream: 0
[1] Running path 'write_NANOAOD_step'
[2] Prefetching for module PoolOutputModule/'write_NANOAOD'
[3] Prefetching for module SimplePATMuonFlatTableProducer/'muonTable'
[4] Calling method for module MuonBeamspotConstraintValueMapProducer/'muonBSConstrain'
Exception Message:
BasicSingleVertexState::could not invert weight matrix
----- End Fatal Exception -------------------------------------------------
The exception appears to be reproducible when running on a single event, but only on AMD CPUs: the job fails at Tier0 (AMD EPYC 7763) and on my desktop (AMD Ryzen 9 5950X), but not on the one Intel machine I tested (Intel Xeon Silver 4216).
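Since the failure seems to depend on the CPU, it may be worth confirming which processor a test machine has before trying the recipe. The snippet below is only a minimal sketch: `cpu_info` is a hypothetical helper (not part of CMSSW), and it assumes a Linux host that exposes `/proc/cpuinfo` with the usual `vendor_id` and `model name` fields.

```python
# Hypothetical helper, not part of CMSSW: report CPU vendor and model on Linux.
from pathlib import Path


def cpu_info(text):
    """Return (vendor_id, model name) parsed from /proc/cpuinfo-style text."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            # Keep the first occurrence; each core repeats the same fields.
            fields.setdefault(key.strip(), value.strip())
    return fields.get("vendor_id", "unknown"), fields.get("model name", "unknown")


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux host
    if path.exists():
        vendor, model = cpu_info(path.read_text())
        print(vendor, "|", model)
```

On AMD processors the `vendor_id` field reads `AuthenticAMD`, and on Intel processors `GenuineIntel`.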
Instructions to reproduce it on an EL8 AMD machine:
export SCRAM_ARCH=el8_amd64_gcc12
cmsrel CMSSW_14_0_7
cd CMSSW_14_0_7/src
cmsenv
cp /afs/cern.ch/user/c/cmst0/public/PausedJobs/Run2024E/vertexException/job/WMTaskSpace/cmsRun1/PSet.pkl .
cat > PSet_one.py <<END
import FWCore.ParameterSet.Config as cms
import pickle
with open('PSet.pkl', 'rb') as handle:
    process = pickle.load(handle)
process.source.eventsToProcess = cms.untracked.VEventRange("381515:384:765632765",)
process.options.wantSummary = cms.untracked.bool(True)
process.options.numberOfThreads = 1
process.options.numberOfStreams = 1
END
cmsRun PSet_one.py 2>&1 | tee PSet_one.log
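To compare outcomes across machines, the resulting `PSet_one.log` can be scanned for the exception block. This is a rough sketch, assuming the log uses the same layout as the excerpt quoted above; `find_exception` is a hypothetical helper name, not a CMSSW tool.

```python
# Hypothetical helper: extract a fatal exception block from a cmsRun log.
from pathlib import Path


def find_exception(log_text, category="VertexException"):
    """Return the lines of the first fatal exception of the given category.

    Assumes the layout seen in this report: a line containing
    "An exception of category '<category>'", closed by a line starting
    with "----- End Fatal Exception". Returns [] if nothing matches.
    """
    lines = log_text.splitlines()
    marker = f"An exception of category '{category}'"
    for i, line in enumerate(lines):
        if marker in line:
            block = []
            for entry in lines[i:]:
                if entry.startswith("----- End Fatal Exception"):
                    break
                block.append(entry)
            return block
    return []


if __name__ == "__main__":
    log = Path("PSet_one.log")  # file written by the tee command above
    if log.exists():
        block = find_exception(log.read_text(errors="replace"))
        print("\n".join(block) if block else "no VertexException found")
```

An empty result on one machine and a non-empty one on another would match the AMD-only behaviour described here.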