-
Notifications
You must be signed in to change notification settings - Fork 4.4k
Implement the CheckClassInfo module #47567
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
cms-bot internal usage |
f488b3a
to
6206a1b
Compare
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47567/44054
|
A new Pull Request was created by @fwyzard for master. It involves the following packages:
@Dr15Jones, @makortel, @smuzaffar can you please review it and eventually sign? Thanks. cms-bot commands are listed here |
-1 Failed Tests: UnitTests Unit TestsI found 1 errors in the following unit tests: ---> test TestFWCoreModulesCheckClassInfo had ERRORS Comparison SummarySummary:
|
The new unit test fails as expected: testing.log . |
process = cms.Process("TEST") | ||
|
||
# load the latest HLT configuration in order to exercise a large number of event products | ||
process.load('HLTrigger.Configuration.HLT_GRun_cff') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is targeting specifically the HLT menu event products, would it make sense to move the test in HLTrigger/Configuration
, or do you reckon this would be executed anyway if we change the list of those in the menu?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not targeted specifically at HLT, though I run into the problem with the HLT configuration.
It is just a simple way to construct a large number of modules in a complex enough configuration.
Ideally we could run this test with all CMSSW configurations (oneline, offline, legacy, Run 3, Phase 2, etc.) any time there are changes to data formats, dictionaries, or ROOT itself.
But I'm not sure what would be a way to set that up, short of adding this check to all cmsDriver jobs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my framework developer perspective, I really would not want any dependencies outside of the framework packages itself to be in FWCore/TestModules, that includes python dependencies. When I do work, I only checkout exactly the group of packages need to build and test the framework. Having this test here would vastly extend what would have to be checked out.
So I'm all for the module existing in FWCore/TestModules, but it would be better to test with just the modules defined within the FWCore subsystem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I'm all for the module existing in FWCore/TestModules, but it would be better to test with just the modules defined within the FWCore subsystem.
The problem is that restricting the test to the small number of framework-only data formats and producers will very likely not reproduce the current issue, making the test much less useful.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cms-sw/operations-l2 @cms-sw/pdmv-l2 (assuming cmsDriver
is under either of those group of L2s), would you find it acceptable to add this module to at least some of the cmsDriver workflows ?
I'm looking for a way to test that ROOT is able to properly load the dictionaries for all the persistent data formats used in CMSSW, even in the presence of multiple threads and streams.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In my mind there are two sets of tests. One is a unit test which uses a controlled situation to prove that the module does what it is supposed to do, i.e. raise an 'alarm' when it encounters the problem. Such a test I would see living in FWCore/TestModules/test. The second test is an integration test that makes sure CMSSW as a whole (or at least important parts) does not have the problem. That test should not be in FWCore/TestModules/test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The second test is an integration test that makes sure CMSSW as a whole (or at least important parts) does not have the problem.
Actually, why does the framework not make this check for all products in all jobs ?
Isn't it a requirement to be able to store the non-transient collections ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, why does the framework not make this check for all products in all jobs ?
Isn't it a requirement to be able to store the non-transient collections ?
The dictionary existence checks have largely relied to TClass::GetDict()
and TClass::GetMissingDictionaries()
(and maybe something else I don't remember right now). We had no idea TClass::ClassProperty()
or TClass::GetClassInfo()
would make a difference, that they apparently with -Wl,--as-needed
they do (#47470).
Perhaps we should consider adding the TClass::ClassProperty()
and TClass::GetClassInfo()
calls to the dictionary existence checks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dictionary existence checks have largely relied to
TClass::GetDict()
andTClass::GetMissingDictionaries()
(and maybe something else I don't remember right now). We had no ideaTClass::ClassProperty()
orTClass::GetClassInfo()
would make a difference, that they apparently with-Wl,--as-needed
they do (#47470).
Mhm... I didn't know about #47470, thanks for the pointer.
I've backported this test to CMSSW 14.2.0, which should not have the -Wl,--as-needed
flags, and I still see the same failures:
- a crash when checking the products from the full HLT menu
- an exception when checking a smaller HCAL+PF workflow
So this error seems unrelated to -Wl,--as-needed
.
Perhaps we should consider adding the
TClass::ClassProperty()
andTClass::GetClassInfo()
calls to the dictionary existence checks.
By the way, TClass::ClassProperty()
seems fine, in the sense that I haven't sencountered ay problems with it. TClass::GetClassInfo()
is the one that fails.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've backported this test to CMSSW 14.2.0, which should not have the
-Wl,--as-needed
flags, and I still see the same failures:
Thanks for the test. So the TClass::GetClassInfo()
does not seem to be relevant for discovering the memory use increase discussed in #47470.
Perhaps we should consider adding the
TClass::ClassProperty()
andTClass::GetClassInfo()
calls to the dictionary existence checks.By the way,
TClass::ClassProperty()
seems fine, in the sense that I haven't sencountered ay problems with it.TClass::GetClassInfo()
is the one that fails.
I took a look of what TClass::GetClassInfo()
does. It calls LoadClassInfo()
if it hasn't been called yet (https://github.com/root-project/root/blob/28a45b686066b309e753c4a919d1c21f10aed528/core/meta/inc/TClass.h#L433-L437), and LoadClassInfo()
auto-parses the class header unless the auto-parsing is disabled (https://github.com/root-project/root/blob/28a45b686066b309e753c4a919d1c21f10aed528/core/meta/src/TClass.cxx#L5824-L5855). So I don't think we want to call that function from the framework (given that auto-parsing typically leads to higher memory use). Its information does not seem to be necessary for the I/O.
@fwyzard Could you describe the context that lead you to call the TClass::GetClassInfo()
and experience the failures?
6206a1b
to
22ba0a9
Compare
please test |
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47567/44067
|
Pull request #47567 was updated. @Dr15Jones, @makortel, @smuzaffar can you please check and sign again. |
Using
Using the MessageLogger that message is lost (from the latest test):
Throwing an exception instead of using an assertion does not help, because the job fails before reaching the check. |
888ba9f
to
909afdc
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47567/44069
|
Pull request #47567 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
Add an automated test that checks the ClassInfo of all persistent HLT products.
909afdc
to
5d7b664
Compare
+code-checks Logs: https://cmssdt.cern.ch/SDT/code-checks/cms-sw-PR-47567/44070
|
Pull request #47567 was updated. @Dr15Jones, @cmsbuild, @makortel, @smuzaffar can you please check and sign again. |
please test |
@pcanal running the same test of a different configuration step3.py I get a different error:
I don't know if this helps isolating the issue. |
Ah, the point is that the JIT (via
It could. At least the object destructor (that inserts the message into the queue) would be run before the line that could trigger the crash. |
Yes, it works:
|
-1 Failed Tests: UnitTests Unit TestsI found 2 errors in the following unit tests: ---> test test-das-selected-lumis had ERRORS ---> test TestFWCoreModulesCheckClassInfo had ERRORS Comparison SummarySummary:
|
Calling |
This usually indicates that the autoparsing for that classes failed. The 3 mains reasons:
|
No, the goal is not to force the header parsing, it's to use the dictionaries without parsing the headers. Then I need to reproduce the original errors without calling |
PR description:
Implement the CheckClassInfo module. This module will query the TClass, ClassProperty and ClassInfo of all the persistent products specified in its configuration.
Add an automated test that checks the ClassInfo of all persistent HLT products.
PR validation:
The new tests is expected to fail, until the underlying issue is understood and fixed.
If this PR is a backport please specify the original PR and why you need to backport that PR. If this PR will be backported please specify to which release cycle the backport is meant for:
May be backported to 15.0.x to ensure the proper behaviour in that release cycle.