HLT crash in run 388037: segmentation violation in PixelTrackProducerFromSoAAlpaka<pixelTopology::HIonPhase1>::produce #46656

Closed
@mmusich

In run 388037 (PbPb collisions, HLT release CMSSW_14_1_4_patch3), we got the following segmentation violation:

Thread 10 (Thread 0x7fc9e9fff700 (LWP 3771560) "cmsRun"):
#0  0x00007fca8601c0e1 in poll () from /lib64/libc.so.6
#1  0x00007fca716b86e7 in edm::service::InitRootHandlers::stacktraceFromThread() () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#2  0x00007fca716b88e4 in sig_dostack_then_abort () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginFWCoreServicesPlugins.so
#3  <signal handler called>
#4  0x00007fca03558802 in void storeTracks<edm::Event, std::vector<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > >, std::allocator<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > > > > >(edm::Event&, std::vector<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > >, std::allocator<std::pair<reco::Track*, std::vector<TrackingRecHit const*, std::allocator<TrackingRecHit const*> > > > > const&, TrackerTopology const&) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so
#5  0x00007fca0355fbc6 in PixelTrackProducerFromSoAAlpaka<pixelTopology::HIonPhase1>::produce(edm::StreamID, edm::Event&, edm::EventSetup const&) const () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/pluginRecoPixelVertexingPixelTrackFittingPlugins.so
#6  0x00007fca88aafca2 in edm::global::EDProducerBase::doEvent(edm::EventTransitionInfo const&, edm::ActivityRegistry*, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#7  0x00007fca88aa913c in edm::WorkerT<edm::global::EDProducerBase>::implDo(edm::EventTransitionInfo const&, edm::ModuleCallingContext const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#8  0x00007fca88a2bb19 in std::__exception_ptr::exception_ptr edm::Worker::runModuleAfterAsyncPrefetch<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >(std::__exception_ptr::exception_ptr, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::TransitionInfoType const&, edm::StreamID, edm::ParentContext const&, edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1>::Context const*) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#9  0x00007fca88a2c021 in edm::Worker::RunModuleTask<edm::OccurrenceTraits<edm::EventPrincipal, (edm::BranchActionType)1> >::execute() () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreFramework.so
#10 0x00007fca887a22a8 in tbb::detail::d1::function_task<edm::WaitingTaskList::announce()::{lambda()#1}>::execute(tbb::detail::d1::execution_data&) () from /opt/offline/el8_amd64_gcc12/cms/cmssw/CMSSW_14_1_4/lib/el8_amd64_gcc12/scram_x86-64-v3/libFWCoreConcurrency.so
#11 0x00007fca8718fb3b in tbb::detail::r1::task_dispatcher::local_wait_for_all<false, tbb::detail::r1::outermost_worker_waiter> (t=0x7fc99c341c00, waiter=..., this=0x7fca743b3a00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:322
#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<tbb::detail::r1::outermost_worker_waiter> (t=0x0, waiter=..., this=0x7fca743b3a00) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/task_dispatcher.h:458
#13 tbb::detail::r1::arena::process (tls=..., this=<optimized out>) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/arena.cpp:137
#14 tbb::detail::r1::market::process (this=<optimized out>, j=...) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/market.cpp:599
#15 0x00007fca87191cee in tbb::detail::r1::rml::private_worker::run (this=0x7fca743a8000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:271
#16 tbb::detail::r1::rml::private_worker::thread_routine (arg=0x7fca743a8000) at /data/cmsbld/jenkins/workspace/ib-run-pr-tests/testBuildDir/BUILD/el8_amd64_gcc12/external/tbb/v2021.9.0-2391c941213c757dc9a1835b31681235/tbb-v2021.9.0/src/tbb/private_server.cpp:221
#17 0x00007fca862c51ca in start_thread () from /lib64/libpthread.so.0
#18 0x00007fca85f30e73 in clone () from /lib64/libc.so.6

The log file from the HLT node is available at https://cernbox.cern.ch/s/pnmiGV9LkISCWqU (25 MB, too large to post on GitHub).

The issue is reproducible with the following script (run on lxplus8-gpu in CMSSW_14_1_4_patch3):

#!/bin/bash -ex

# cmsrel CMSSW_14_1_4_patch3
# cd CMSSW_14_1_4_patch3/src
# cmsenv

hltGetConfiguration run:388037 \
		    --globaltag 141X_dataRun3_HLT_v1 \
		    --data \
		    --no-prescale \
		    --no-output \
		    --max-events -1 \
		    --input /store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000200_fu-c2b05-14-01_pid3769082.root,/store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000203_fu-c2b05-14-01_pid3769082.root,/store/group/tsg/FOG/error_stream_root/run388037/run388037_ls0133_index000214_fu-c2b05-14-01_pid3769082.root > hlt_388037.py

cat <<@EOF >> hlt_388037.py
process.options.wantSummary = True
process.options.numberOfThreads = 1
process.options.numberOfStreams = 0
@EOF

cmsRun hlt_388037.py &> hlt_388037.log
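
To check quickly that a local run hits the same crash, the log can be scanned for the first frame after `<signal handler called>`, which is `storeTracks` in the trace above. This is only a minimal sketch: the helper name `crashing_frame` is hypothetical, it assumes the gdb-style frame layout shown in the trace, and it reports only the first such marker it finds, even if the log holds traces for several threads.

```python
import re
import sys

# Matches gdb-style frame lines, with or without an address, e.g.
#   "#4  0x00007fca03558802 in void storeTracks<...>(...) () from /path/lib.so"
#   "#12 tbb::detail::r1::task_dispatcher::local_wait_for_all<...> (...) at file.h:458"
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.*)$")


def crashing_frame(trace_text):
    """Return (frame_number, description) for the first frame that follows
    a '<signal handler called>' frame, or None if no such marker exists."""
    seen_handler = False
    for line in trace_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if not match:
            continue  # not a stack frame line
        number, rest = int(match.group(1)), match.group(2)
        if seen_handler:
            return number, rest
        if rest.startswith("<signal handler called>"):
            seen_handler = True
    return None


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (assumed file name from the script above):
    #   python3 crashing_frame.py hlt_388037.log
    with open(sys.argv[1], errors="replace") as log:
        print(crashing_frame(log.read()))
```

If the reproducer hits the same problem, the reported description should mention `storeTracks`, called from `PixelTrackProducerFromSoAAlpaka<pixelTopology::HIonPhase1>::produce`.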

@cms-sw/hlt-l2 @cms-sw/heterogeneous-l2 FYI
