Open
Labels
BUG (For BUGS), Low (Low priority), factory-mon (for affected component), factoryops (Factory Operations stakeholder)
Description
Describe the bug
The UCSD factory machine locked up for an unknown reason (possibly a cooling issue in the room). Once the machine recovered, the factory monitoring was not available. It turned out that some monitoring cache files were empty, and the factory did not expect that.
To Reproduce
Run the factory for a while, then make one of the .ftstpk cache files empty, for example:
/var/log/gwms-factory/server/entry_CMSHTPC_T2_US_Purdue_Negishi_Op/condor_activity_20230729_UCSD-CMS-Frontend.main.log.fecmsucsd.ftstpk
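On the factory host, truncating such a file (e.g. with `truncate -s 0 <file>`) should be enough. The root cause can also be shown outside GlideinWMS: a minimal standalone sketch, assuming the cache is a plain pickle file as the traceback below suggests, in which `pickle.load` on a zero-length file raises the same `EOFError: Ran out of input`. The helper name `load_empty_pickle` is hypothetical.

```python
import os
import pickle
import tempfile


def load_empty_pickle():
    """Unpickle a zero-length file, mimicking an emptied .ftstpk cache.

    Returns the EOFError message (hypothetical helper, for illustration only).
    """
    # Create an empty file, as left behind after truncating a cache file.
    fd, path = tempfile.mkstemp(suffix=".ftstpk")
    os.close(fd)
    try:
        with open(path, "rb") as fo:
            pickle.load(fo)
    except EOFError as e:
        return str(e)
    finally:
        os.remove(path)
    return None


print(load_empty_pickle())  # Ran out of input
```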
Expected behavior
The factory should handle this corner case gracefully, and the monitoring should remain available.
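One possible way to get that behavior, sketched under assumptions: treat a missing, empty, or truncated cache file as a cache miss, so the caller can rebuild the cache from the Condor log instead of raising. `load_cache_or_none` is a hypothetical helper, not the existing `condorLogParser.loadCache`, and the rebuild step on the caller side is not shown.

```python
import os
import pickle
import tempfile


def load_cache_or_none(fname):
    """Hypothetical tolerant cache loader (not the GlideinWMS API).

    Returns the unpickled data, or None when the cache file is missing,
    empty, or truncated, so the caller can rebuild it from the log.
    """
    try:
        if os.path.getsize(fname) == 0:
            # Zero-length file, e.g. left behind by a crash: cache miss.
            return None
        with open(fname, "rb") as fo:
            return pickle.load(fo)
    except (OSError, EOFError, pickle.UnpicklingError):
        # Missing, unreadable, or truncated cache: also a cache miss.
        return None


# Usage: an empty cache yields None instead of an exception.
fd, path = tempfile.mkstemp(suffix=".ftstpk")
os.close(fd)
print(load_cache_or_none(path))  # None
with open(path, "wb") as fo:
    pickle.dump({"Idle": 3}, fo)
print(load_cache_or_none(path))  # {'Idle': 3}
os.remove(path)
```

Returning None rather than raising keeps the decision (re-parse the log, skip the entry, log a warning) with the caller.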
Info
- Priority: low
- Stakeholders: FactoryOps
- Components: factory monitoring
Additional context
...
[2023-07-29 23:37:07,142] DEBUG: glideFactoryEntry:1058: Checking security credentials for client UCSD-CMS-Frontend.main
[2023-07-29 23:37:07,218] ERROR: glideFactoryEntry:1819: Could not read /var/log/gwms-factory/server/entry_CMSHTPC_T2_US_Purdue_Negishi_Op/condor_activity_20230729_UCSD-CMS-Frontend.main.log.fecmsucsd.ftstpk
Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/condorLogParser.py", line 1386, in loadCache
    data = util.file_pickle_load(fname)
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/util.py", line 306, in file_pickle_load
    conditional_raise(mask_exceptions)
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/util.py", line 295, in file_pickle_load
    data = pickle.load(fo)
EOFError: Ran out of input

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib/python3.6/site-packages/glideinwms/factory/glideFactoryEntry.py", line 1817, in perform_work_v3
    log_stats[credential_username + ":" + client_int_name].load()
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/condorLogParser.py", line 671, in load
    obj.load()
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/condorLogParser.py", line 82, in load
    return self.loadCache()
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/condorLogParser.py", line 104, in loadCache
    self.data = loadCache(self.cachename)
  File "/usr/lib/python3.6/site-packages/glideinwms/lib/condorLogParser.py", line 1388, in loadCache
    raise RuntimeError("Could not read %s" % fname)
RuntimeError: Could not read /var/log/gwms-factory/server/entry_CMSHTPC_T2_US_Purdue_Negishi_Op/condor_activity_20230729_UCSD-CMS-Frontend.main.log.fecmsucsd.ftstpk
[2023-07-29 23:38:34,834] DEBUG: glideFactoryEntry:1058: Checking security credentials for client UCSD-CMS-Frontend.main
...
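Since a single lock-up may leave several caches empty, a quick scan can show which entries are affected. A minimal standalone sketch, assuming the caches live under /var/log/gwms-factory/server (as in the paths above) and all end in .ftstpk; `find_empty_caches` is a hypothetical helper, not part of GlideinWMS.

```python
import os


def find_empty_caches(root, suffix=".ftstpk"):
    """Return sorted paths of zero-length files under root ending in suffix.

    Hypothetical helper for illustration; not part of GlideinWMS.
    """
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(suffix):
                path = os.path.join(dirpath, name)
                try:
                    if os.path.getsize(path) == 0:
                        empty.append(path)
                except OSError:
                    # File vanished or is unreadable; skip it.
                    pass
    return sorted(empty)


if __name__ == "__main__":
    # Assumed factory log directory, taken from the paths in this report.
    for path in find_empty_caches("/var/log/gwms-factory/server"):
        print(path)
```

A missing root directory simply yields an empty list, since os.walk produces nothing for it.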