[BUG] Re-training DetectorEnsemble with TSADEvaluator does not ensure 'train_config' to be DetectorEnsembleTrainConfig: 'dict' object has no attribute 'valid_frac' #175
Description
Describe the bug
Simulating live deployment of the standard multivariate model DefaultDetector (i.e. a DetectorEnsemble of VAE and RRCF) with TSADEvaluator triggers periodic re-training. Initially, TSADEvaluator's default_retrain_kwargs() method ensures that the train_config used to train the DetectorEnsemble is an instance of DetectorEnsembleTrainConfig. However, once re-training is passed down from the DetectorEnsemble to its individual models, nothing ensures that the train_config handed to the DefaultDetector is an instance of the same class. Instead, the train_config used to train the DetectorEnsemble inside the DefaultDetector is a plain dict, which causes the reported error: 'dict' object has no attribute 'valid_frac'.
Most likely this is related to a mismatch in TSADEvaluator between full_train_kwargs and full_retrain_kwargs, as the lines
Merlion/merlion/evaluate/base.py
Lines 191 to 192 in 085ef8a
| full_train_kwargs = self.default_train_kwargs() | |
| full_train_kwargs.update(train_kwargs) |
do not ensure that
Merlion/merlion/evaluate/base.py
Line 202 in 085ef8a
| train_result = self._train_model(train_vals, **full_train_kwargs) |
receives a train_config of the expected type.
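The merge pattern in the two quoted lines can be illustrated with a minimal standalone sketch. It does not import Merlion: `EnsembleTrainConfig`, `default_train_kwargs` and `merge_train_kwargs` are hypothetical stand-ins, and the assumption that the defaults hold a typed config object is made only for illustration. The point is that `dict.update` replaces the typed default with whatever the caller supplies, without any type check.

```python
from dataclasses import dataclass


@dataclass
class EnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not the Merlion class)."""

    valid_frac: float = 0.0


def default_train_kwargs() -> dict:
    # Assumption: the defaults carry a typed config object.
    return {"train_config": EnsembleTrainConfig(valid_frac=0.2)}


def merge_train_kwargs(train_kwargs: dict) -> dict:
    # Same pattern as the quoted lines: defaults first, then caller overrides.
    full_train_kwargs = default_train_kwargs()
    full_train_kwargs.update(train_kwargs)
    return full_train_kwargs


# The caller passes a raw dict, as happens for the per-model configs.
merged = merge_train_kwargs({"train_config": {"valid_frac": 0.2}})
print(type(merged["train_config"]).__name__)  # prints "dict"
```

After the merge, `train_config` is whatever the caller passed, so any later attribute access such as `.valid_frac` depends entirely on the caller having supplied the right type.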
To Reproduce
The bug was identified while working through the tutorial on "Multivariate Time Series Anomaly Detection" for Merlion v2.0.2, section "Model Inference and Quantitative Evaluation" (see https://opensource.salesforce.com/Merlion/v2.0.2/tutorials/anomaly/2_AnomalyMultivariate.html#Model-Inference-and-Quantitative-Evaluation). When performing "Sliding Window Evaluation" with TSADEvaluator, the ensemble fails while re-training the DefaultDetector model because of the reported bug.
Expected behavior
Successful re-training of the DefaultDetector model as part of DetectorEnsemble models when using TSADEvaluator.
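One way the expected behavior might be reached is to coerce the incoming `train_config` before it is used. The sketch below is only an illustration under stated assumptions: `EnsembleTrainConfig` is a hypothetical stand-in for DetectorEnsembleTrainConfig, `coerce_train_config` is a hypothetical helper that does not exist in Merlion, and the real config class may need different construction logic.

```python
from dataclasses import dataclass, fields
from typing import Union


@dataclass
class EnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not the Merlion class)."""

    valid_frac: float = 0.0


def coerce_train_config(train_config: Union[None, dict, EnsembleTrainConfig]) -> EnsembleTrainConfig:
    """Hypothetical guard: accept None, a dict, or a config object and return a config object."""
    if train_config is None:
        return EnsembleTrainConfig()
    if isinstance(train_config, EnsembleTrainConfig):
        return train_config
    if isinstance(train_config, dict):
        # Reject unknown keys early instead of failing later on attribute access.
        known = {f.name for f in fields(EnsembleTrainConfig)}
        unknown = set(train_config) - known
        if unknown:
            raise ValueError(f"Unknown train_config keys: {sorted(unknown)}")
        return EnsembleTrainConfig(**train_config)
    raise TypeError(f"Unsupported train_config type: {type(train_config).__name__}")


cfg = coerce_train_config({"valid_frac": 0.2})
print(cfg.valid_frac)  # prints 0.2
```

Whether such a guard belongs in the evaluator or in the ensemble's training code is a design decision for the maintainers; the sketch only shows that a dict can be normalized before `.valid_frac` is read.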
Screenshots
A screenshot of the resulting error stack trace is attached.

Desktop
- OS: Ubuntu 24.04 LTS
- Merlion Version: 2.0.2
- Python Version: 3.9.18
- openjdk-11-jdk installed as per docs.
Additional context
At the re-train trigger
Merlion/merlion/evaluate/base.py
Line 230 in 085ef8a
| if t >= t_next and not cur_train.is_empty() and not cur_test.is_empty(): |
- merlion.evaluate.anomaly.TSADEvaluator's get_predict() invokes merlion.evaluate.base.EvaluatorBase's get_predict(). The latter contains the re-training logic.
- When re-training is initiated, self.model is an instance of merlion.models.ensemble.anomaly.DetectorEnsemble. Consequently, EvaluatorBase's _train_model() invokes DetectorEnsemble's train().
- merlion.models.ensemble.anomaly.DetectorEnsemble inherits from merlion.models.ensemble.base.EnsembleBase and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, DetectorEnsemble's train() actually calls DetectorBase's train().
- Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
- After executing
  Merlion/merlion/models/ensemble/anomaly.py
  Line 139 in 085ef8a
  | train_cfgs = train_config.per_model_train_configs |
  train_cfgs becomes List[dict].
- TSADEvaluator's get_predict() is invoked at the first iteration of
  Merlion/merlion/models/ensemble/anomaly.py
  Lines 159 to 164 in 085ef8a
  | for i, (model, cfg, pr_cfg) in enumerate(zip(self.models, train_cfgs, pr_cfgs)): |
  |     try: |
  |         train_kwargs = dict(train_config=cfg, anomaly_labels=anomaly_labels, post_rule_train_config=pr_cfg) |
  |         train_scores, valid_scores = TSADEvaluator(model=model, config=eval_cfg).get_predict( |
  |             train_vals=train, test_vals=valid, train_kwargs=train_kwargs, post_process=True |
  |         ) |
  which is responsible for re-training the first ensemble model, which is an instance of merlion.models.defaults.DefaultDetector. At this moment, train_kwargs['train_config'] is of type dict.
- Effectively, TSADEvaluator's get_predict() invokes EvaluatorBase's get_predict().
- EvaluatorBase's get_predict() invokes EvaluatorBase's _train_model(). The latter invokes merlion.models.defaults.DefaultDetector's train(). At this moment, train_config is of type dict.
- self.model is set to be a DetectorEnsemble of VAE and RRCF, and DefaultDetector's train() invokes LayeredDetector's train().
- merlion.models.layers.LayeredDetector inherits from merlion.models.layers.LayeredModel and merlion.models.anomaly.base.DetectorBase. Only the latter has a train() method. Therefore, LayeredDetector's train() invokes DetectorBase's train(). At this moment, train_config is of type dict.
- Using call_with_accepted_kwargs, DetectorBase's train() invokes DetectorEnsemble's _train().
- train_config is required to be an instance of DetectorEnsembleTrainConfig. We see that this is not the case. The error occurs.
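The call chain described above can be mimicked with a standalone sketch that does not import Merlion. All class names ending in `Sketch`, as well as `EnsembleTrainConfig`, are hypothetical stand-ins, and the evaluator and layered-model steps are collapsed into direct `train()` calls for brevity. It shows two things from the walkthrough: the ensemble inherits `train()` from the detector base class, which dispatches to the ensemble's `_train()`, and a nested ensemble that receives a per-model config as a dict fails on the first attribute access.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EnsembleTrainConfig:
    """Hypothetical stand-in for DetectorEnsembleTrainConfig (not the Merlion class)."""

    valid_frac: float = 0.0
    per_model_train_configs: List[dict] = field(default_factory=list)


class DetectorBaseSketch:
    def train(self, train_data, train_config=None):
        # The base class owns train() and dispatches to the subclass's _train().
        return self._train(train_data, train_config=train_config)


class EnsembleBaseSketch:
    """Mix-in without a train() method, like the ensemble base in the walkthrough."""

    def __init__(self, models):
        self.models = models


class LeafDetectorSketch(DetectorBaseSketch):
    def _train(self, train_data, train_config=None):
        return len(train_data)


class DetectorEnsembleSketch(EnsembleBaseSketch, DetectorBaseSketch):
    def _train(self, train_data, train_config=None):
        # Attribute access: raises AttributeError if train_config is a dict.
        n_valid = int(len(train_data) * train_config.valid_frac)
        train_cfgs = train_config.per_model_train_configs  # a list of plain dicts
        train = train_data[: len(train_data) - n_valid]
        # Each sub-model receives its config as a dict, with no type coercion.
        return [m.train(train, train_config=cfg) for m, cfg in zip(self.models, train_cfgs)]


# The inner ensemble plays the role of the ensemble wrapped by DefaultDetector.
inner = DetectorEnsembleSketch(models=[LeafDetectorSketch()])
outer = DetectorEnsembleSketch(models=[inner])
outer_cfg = EnsembleTrainConfig(per_model_train_configs=[{"valid_frac": 0.0}])

try:
    outer.train(list(range(10)), train_config=outer_cfg)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # prints: 'dict' object has no attribute 'valid_frac'
```

The outer call succeeds in reading its typed config, and the failure only appears one level down, which matches the observation that the error shows up when the ensemble re-trains the nested DefaultDetector rather than at the top-level call.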