Description
In run 388037 (PbPb collisions, HLT release CMSSW_14_1_4_patch3), we got the following segmentation violation:
Thread 10 (Thread 0x7fc9e9fff700 (LWP 3771560) "cmsRun"):
#0 0x00007fca8601c0e1 in poll () from /lib64/libc.so.6
#1 0x00007fca716b86e7 in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#2 0x00007fca716b88e4 in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#3 <signal handler called>
#4 0x00007fca03558802 in void storeTracks<edm::Event, std::vector<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > >, std::allocator<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > > > > >(edm::Event&, std::vector<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > >, std::allocator<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > > > > const&, TrackerTopology const&) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so
#5 0x00007fca0355fbc6 in PixelTrackProducerFromSoAAlpaka<pixelTopology::HIonPhase1>::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so
#6 0x00007fca88aafca2 in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#7 0x00007fca88aa913c in edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#8 0x00007fca88a2bb19 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#9 0x00007fca88a2c021 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#10 0x00007fca887a22a8 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreConcurrency.so
#11 0x00007fca8718fb3b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fc99c341c00, waiter=..., this=0x7fca743b3a00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fca743b3a00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:137
#14 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:599
#15 0x00007fca87191cee in tbb::detail::r1::rml::private_worker::run (this=0x7fca743a8000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#16 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fca743a8000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#17 0x00007fca862c51ca in start_thread () from /lib64/libpthread.so.0
#18 0x00007fca85f30e73 in clone () from /lib64/libc.so.6
The log file from the HLT node can be found at https://cernbox.cern.ch/s/pnmiGV9LkISCWqU (25 MB, too large to be posted on GitHub).
The issue is reproducible with the following script (run on lxplus8-gpu in CMSSW_14_1_4_patch3):
#!/bin/bash -ex
# cmsrel CMSSW_14_1_4_patch3
# cd CMSSW_14_1_4_patch3/src
# cmsenv
hltGetConfiguration run:388037 \
--globaltag 141X_dataRun3_HLT_v1 \
--data \
--no-prescale \
--no-output \
--max-events -1 \
--input /store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000200_fu-c2b05-14-01_pid3769082.root,/store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000203_fu-c2b05-14-01_pid3769082.root,/store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000214_fu-c2b05-14-01_pid3769082.root > hlt_388037.py
cat <<@EOF >> hlt_388037.py
process.options.wantSummary = True
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF
cmsRun hlt_388037.py &> hlt_388037.log
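In a gdb-style backtrace like the one above, the frame that faulted is the one immediately after `<signal handler called>` (here, `storeTracks`). Since the resulting log can be large, a small helper may make it easier to find that frame. The following is only a minimal sketch: the function name `faulting_frames` and the regular expression are illustrative assumptions, not part of CMSSW, and it assumes only that frames are printed as `#N ...` lines.

```python
import re

# Matches gdb backtrace frame lines such as "#4 0x00007f... in func(...) from lib.so".
# Shell comments ("# cmsrel ...") and shebangs ("#!/bin/bash") do not match,
# because the "#" must be followed directly by a frame number.
FRAME_RE = re.compile(r"^#(\d+)\s+(.*)$")
SIGNAL_MARKER = "<signal handler called>"


def faulting_frames(log_text):
    """Return the text of every frame that directly follows a signal-handler frame.

    One entry is returned per thread backtrace that contains the marker.
    """
    frames = []
    after_signal = False
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match is None:
            continue  # not a backtrace frame line
        body = match.group(2)
        if after_signal:
            frames.append(body)
            after_signal = False
        elif SIGNAL_MARKER in body:
            after_signal = True
    return frames


# Hypothetical usage on the log written by the script above:
# with open("hlt_388037.log", errors="replace") as f:
#     for frame in faulting_frames(f.read()):
#         print(frame[:200])
```

Truncating each printed frame (as in the commented usage) keeps the long template signatures readable.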
@cms-sw/hlt-l2 @cms-sw/heterogeneous-l2 FYI