fixes for vector size handling and spurious wakeups in TimeStudyModules #47728

Merged: cmsbuild merged 3 commits into cms-sw:master on Jun 5, 2025

Conversation

dan131riley

PR description:

TestFWCoreModules in FWCore/Modules has a rare failure mode, most recently seen in CMSSW_15_1_UBSAN_X_2025-03-26-2300, with UBSAN warnings

/data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_iterator.h:1096:17: runtime error: reference binding to null pointer of type 'int'
    #0 0x14b729a4d6ca in __gnu_cxx::__normal_iterator<int*, std::vector<int, std::allocator<int> > >::operator*() const /data/cmsbld/jenkins/workspace/build-any-ib/w/el8_amd64_gcc12/external/gcc/12.3.1-40d504be6370b5a30e3947a6e575ca28/include/c++/12.3.1/bits/stl_iterator.h:1096
    #1 0x14b729a4d6ca in timestudy::SleepingServer::threadWork() src/FWCore/Modules/src/TimeStudyModules.cc:246
    #2 0x14b7386d8a72 in execute_native_thread_routine ../../../../../libstdc++-v3/src/c++11/thread.cc:82
    #3 0x14b737c061c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
    #4 0x14b7382638d2 in __GI___clone (/lib64/libc.so.6+0x398d2)

src/FWCore/Modules/src/TimeStudyModules.cc:246:23: runtime error: load of null pointer of type 'int'
    #0 0x14b729a4d742 in timestudy::SleepingServer::threadWork() src/FWCore/Modules/src/TimeStudyModules.cc:246
    #1 0x14b7386d8a72 in execute_native_thread_routine ../../../../../libstdc++-v3/src/c++11/thread.cc:82
    #2 0x14b737c061c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
    #3 0x14b7382638d2 in __GI___clone (/lib64/libc.so.6+0x398d2)

followed by a segmentation violation. This PR corrects two issues (found by code inspection) where the vector size or capacity wasn't being correctly maintained, and one issue (found through additional debugging) where spurious wakeups could be triggered when there were no waiting streams to process.
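
As a rough illustration of the two kinds of problem described above, here is a minimal sketch. It is not the actual `TimeStudyModules.cc` code: `ToyServer`, `submit`, `stop`, `run`, and `runToyServer` are hypothetical names invented for this example. The UBSAN messages are consistent with dereferencing an iterator into an empty vector, so the sketch assumes the usual defense: the worker waits with a predicate that is re-checked after every wakeup, and it touches the vector only after confirming, under the lock, that it is non-empty.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical toy server; NOT the real timestudy::SleepingServer.
class ToyServer {
public:
  // Producer side: queue a value and wake the worker.
  void submit(int value) {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending_.push_back(value);  // push_back keeps size() in step with the contents
    }
    cond_.notify_one();
  }

  // Ask the worker to finish once the queue has been drained.
  void stop() {
    {
      std::lock_guard<std::mutex> lock(mutex_);
      stopping_ = true;
    }
    cond_.notify_one();
  }

  // Worker loop: returns the sum of all processed values.
  long run() {
    long sum = 0;
    std::unique_lock<std::mutex> lock(mutex_);
    for (;;) {
      // The predicate form of wait() re-checks the condition after every
      // wakeup, so a spurious wakeup simply goes back to sleep.
      cond_.wait(lock, [this] { return stopping_ || !pending_.empty(); });
      if (pending_.empty()) {
        // An empty queue at this point implies stop() was called.
        assert(stopping_);
        return sum;
      }
      // Safe: the vector is known to be non-empty while the lock is held.
      sum += pending_.back();
      pending_.pop_back();
    }
  }

private:
  std::mutex mutex_;
  std::condition_variable cond_;
  std::vector<int> pending_;
  bool stopping_ = false;
};

// Hypothetical helper: run the worker on its own thread and feed it values.
long runToyServer(const std::vector<int>& values) {
  ToyServer server;
  long result = 0;
  std::thread worker([&] { result = server.run(); });
  for (int v : values) {
    server.submit(v);
  }
  server.stop();
  worker.join();  // join() makes the worker's write to result visible here
  return result;
}
```

With this structure, a wakeup that arrives when nothing is queued cannot lead to a dereference of an empty vector, because the emptiness check and the element access happen under the same lock.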

Resolves cms-sw/framework-team#1321

PR validation:

Technical fix. Without the change, the crash was moderately reproducible. With these modifications, no test failures have been observed.

@cmsbuild
Contributor

cmsbuild commented Mar 28, 2025

cms-bot internal usage

@cmsbuild
Contributor

-code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47728/44301

Code check has found code style and quality issues which could be resolved by applying the following patch(es)

@cmsbuild
Contributor

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47728/44302

@cmsbuild
Contributor

A new Pull Request was created by @dan131riley for master.

It involves the following packages:

  • FWCore/Modules (core)

@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@makortel, @missirol, @wddgit this is something you requested to watch as well.
@antoniovilela, @mandrenguyen, @rappoccio, @sextonkennedy you are the release manager for this.

cms-bot commands are listed here

@makortel
Contributor

@cmsbuild, please test

@cmsbuild
Contributor

+1

Size: This PR adds an extra 24KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-da1227/45273/summary.html
COMMIT: a363542
CMSSW: CMSSW_15_1_X_2025-03-28-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47728/45273/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially removed 1 lines from the logs
  • Reco comparison results: 4 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 3909207
  • DQMHistoTests: Total failures: 21
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 3909166
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

Contributor

@makortel makortel left a comment

Thanks @dan131riley! I have a few questions, mostly for me to understand the situation better.

@Dr15Jones
Contributor

@dan131riley any thoughts on the outstanding comments?

@makortel
Contributor

makortel commented Jun 2, 2025

@dan131riley ping

@cmsbuild
Contributor

cmsbuild commented Jun 3, 2025

+code-checks

Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47728/45041

@cmsbuild
Contributor

cmsbuild commented Jun 3, 2025

Pull request #47728 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again.

@makortel
Contributor

makortel commented Jun 4, 2025

@cmsbuild, please test

@cmsbuild
Contributor

cmsbuild commented Jun 4, 2025

+1

Size: This PR adds an extra 16KB to repository
Summary: https://cmssdt.cern.ch/SDT/jenkins-artifacts/pull-request-integration/PR-da1227/46549/summary.html
COMMIT: 515fdc5
CMSSW: CMSSW_15_1_X_2025-06-04-1100/el8_amd64_gcc12
User test area: For local testing, you can use /cvmfs/cms-ci.cern.ch/week0/cms-sw/cmssw/47728/46549/install.sh to create a dev area with all the needed externals and cmssw changes.

Comparison Summary

Summary:

  • You potentially added 6 lines to the logs
  • Reco comparison results: 2 differences found in the comparisons
  • DQMHistoTests: Total files compared: 50
  • DQMHistoTests: Total histograms compared: 4048495
  • DQMHistoTests: Total failures: 3
  • DQMHistoTests: Total nulls: 0
  • DQMHistoTests: Total successes: 4048472
  • DQMHistoTests: Total skipped: 20
  • DQMHistoTests: Total Missing objects: 0
  • DQMHistoSizes: Histogram memory added: 0.0 KiB (49 files compared)
  • Checked 215 log files, 184 edm output root files, 50 DQM output files
  • TriggerResults: no differences found

@makortel
Contributor

makortel commented Jun 4, 2025

+core

@cmsbuild
Contributor

cmsbuild commented Jun 4, 2025

This pull request is fully signed and it will be integrated in one of the next master IBs (tests are also fine). This pull request will now be reviewed by the release team before it's merged. @rappoccio, @sextonkennedy, @antoniovilela, @mandrenguyen (and backports should be raised in the release meeting by the corresponding L2)

@mandrenguyen
Contributor

+1

@cmsbuild cmsbuild merged commit 68c3a5d into cms-sw:master Jun 5, 2025
10 checks passed
Development

Successfully merging this pull request may close these issues.

Investigate TestFWCoreModules failure
5 participants