fixes for vector size handling and spurious wakeups in TimeStudyModules #47728
base: master
-code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47728/44301 Code check has found code style and quality issues which could be resolved by applying following patch(s)
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47728/44302
A new Pull Request was created by @dan131riley for master. It involves the following packages:
@Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please review it and eventually sign? Thanks.
@cmsbuild, please test
+1 Size: This PR adds an extra 24KB to repository
Thanks @dan131riley! I have a few questions, mostly for me to understand the situation better.
@@ -227,12 +227,13 @@ namespace timestudy {
         return true;
       }
       //every running stream is now waiting
-      return waitingStreams_.size() == activeStreams_;
+      return !waitingStreams_.empty() and waitingStreams_.size() == activeStreams_;
I suppose this line is the main fix, i.e. before the change readyToDoSomething() could return true if activeStreams_ == 0. Is this correct?
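To make the question above concrete, here is a minimal sketch using a simplified stand-in for the module's bookkeeping. The struct name WaitState and the method names readyOld and readyNew are hypothetical and are not taken from TimeStudyModules.cc. With no active streams and nothing waiting, the bare size comparison is satisfied as 0 == 0, while the guarded form is not.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified stand-in for the module's state (not the real class).
struct WaitState {
  std::vector<int> waitingStreams;
  std::size_t activeStreams = 0;

  // Predicate as it was before the change: also true when both sides are zero.
  bool readyOld() const { return waitingStreams.size() == activeStreams; }

  // Predicate as proposed: additionally requires at least one waiting stream.
  bool readyNew() const {
    return !waitingStreams.empty() && waitingStreams.size() == activeStreams;
  }
};
```

For a default-constructed WaitState, readyOld() reports ready and readyNew() does not; once two streams are active and both are waiting, the two predicates agree.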
     }

     void threadWork() {
       while (not stopProcessing_.load()) {
         std::vector<int> streamsToProcess;
+        streamsToProcess.reserve(waitingStreams_.capacity());
Does this line have any purpose other than memory allocation optimization, i.e. corresponding to waitingStreams_.reserve(nStreams); (cmssw/FWCore/Modules/src/TimeStudyModules.cc, line 191 in a363542)?
It would be safe to do
-        streamsToProcess.reserve(waitingStreams_.capacity());
+        streamsToProcess.reserve(waitTimesPerStreams_.size());
as the size is the number of streams and that vector never changes its size once it is made.
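A minimal sketch of the suggestion above, assuming a vector with one entry per stream whose size is fixed after construction. The function name prepareWorkList and its parameters are hypothetical. The local buffer is sized from the fixed vector's size(), so nothing needs to be read from the shared waiting list.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: sizes a work list from a vector that holds one entry
// per stream and never changes size after it is constructed.
void prepareWorkList(std::vector<int>& streamsToProcess,
                     const std::vector<long>& waitTimesPerStream) {
  streamsToProcess.clear();
  // reserve() guarantees capacity() >= the requested value; it does not
  // change size(), so the list is still empty afterwards.
  streamsToProcess.reserve(waitTimesPerStream.size());
}
```

After the call the work list is empty and can take one entry per stream without reallocating.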
@@ -261,6 +262,7 @@ namespace timestudy {
           }
         }
         waitingTaskPerStream_.clear();
+        waitingTaskPerStream_.resize(waitingTaskPerStream_.capacity());
At this stage the thread is about to end, so it is not clear to me why resizing waitingTaskPerStream_ would help. Nothing should access waitingTaskPerStream_ at this point.
Or is the idea to keep asyncWork() from writing to invalid memory? In that case, how about changing the [] to at() there?
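A minimal sketch of the difference behind the [] versus at() question, using a plain std::vector<int> in place of the real task holders. The helper name tryStore is hypothetical. After clear() the size is zero, so operator[] on any index is undefined behavior, whereas at() throws std::out_of_range and the failure becomes observable.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper: writes one slot through at(), which checks the index
// against size() and throws std::out_of_range instead of touching memory
// outside the vector's elements.
bool tryStore(std::vector<int>& slots, std::size_t index, int value) {
  try {
    slots.at(index) = value;
    return true;
  } catch (const std::out_of_range&) {
    return false;
  }
}
```

On a vector that has been cleared, tryStore reports failure for every index; after the vector is resized back to the number of streams, the same call succeeds.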
     }

     void threadWork() {
       while (not stopProcessing_.load()) {
         std::vector<int> streamsToProcess;
+        streamsToProcess.reserve(waitingStreams_.capacity());
This is a race condition. In order to interact with waitingStreams_, the mutex_ must be taken.
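A minimal sketch of the rule stated above, assuming a simplified shared structure. The names SharedStreams and takeWaiting are hypothetical and do not come from the module. The local buffer is sized from a value that needs no shared access, and the shared vector is only touched while the lock is held.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical stand-in for state shared between the streams and the worker thread.
struct SharedStreams {
  std::mutex mutex;
  std::vector<int> waitingStreams;
};

// Moves the waiting streams into a local list. Every read or write of
// shared.waitingStreams, including size() and capacity(), happens under the mutex.
std::vector<int> takeWaiting(SharedStreams& shared, std::size_t nStreams) {
  std::vector<int> streamsToProcess;
  streamsToProcess.reserve(nStreams);  // local object only, no lock needed

  std::lock_guard<std::mutex> lock(shared.mutex);
  // The shared vector receives the empty, pre-reserved buffer in exchange.
  streamsToProcess.swap(shared.waitingStreams);
  return streamsToProcess;
}
```

Swapping rather than copying keeps the time spent holding the lock short and leaves the shared vector empty with storage already reserved for the next round.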
PR description:
TestFWCoreModules in FWCore/Modules has a rare failure mode, most recently seen in CMSSW_15_1_UBSAN_X_2025-03-26-2300, with UBSAN warnings followed by a segmentation violation. This PR corrects two issues (found by code inspection) where the vector size or capacity wasn't being correctly maintained, and one issue (found through additional debugging) where spurious wakeups could be triggered when there were no waiting streams to process.
Resolves cms-sw/framework-team#1321
PR validation:
Technical fix. Without the change, the crash was moderately reproducible. With these modifications, no test failures have been observed.