Description
Hi, I have encountered a problem with the way NAB scores detectors. Below I compare how it scores two detectors, ARTime and Numenta, on the same dataset.
The first problem is with how the anomaly windows are defined: no algorithm can detect the anomaly in the first half of the window, because the data show no indication of it yet. ARTime flags the anomaly as soon as it happens. If the window starts earlier than that point, the NAB metric only serves to give the detector a lower score, even though it flagged the anomaly as soon as any human could have.
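To illustrate why the position of a detection inside the window matters, here is a minimal sketch of a NAB-style scaled sigmoid weight. It is written from my understanding of the scoring, not copied from the NAB code: the function name `scaled_sigmoid`, the steepness constant 5 and the cutoff 3.0 are assumptions, and the convention that -1.0 marks the window start and 0.0 the window end is also assumed.

```python
import math


def scaled_sigmoid(relative_position: float) -> float:
    """Weight of a detection as a function of its position relative to a window.

    Assumed convention: -1.0 is the start of the window, 0.0 is its end,
    and positive values lie after the window.
    """
    if relative_position > 3.0:
        # Far behind the window: treated as a full-weight false positive.
        return -1.0
    # Equivalent to 2 * sigmoid(-5 * x) - 1, which maps into (-1, 1).
    return 2.0 / (1.0 + math.exp(5.0 * relative_position)) - 1.0


# A detection at the window start earns nearly the full weight,
# a detection halfway through earns less, and one at the end earns nothing.
for pos in (-1.0, -0.5, 0.0):
    print(f"{pos:+.1f} -> {scaled_sigmoid(pos):.4f}")
```

Under these assumptions, a detector that fires at the first observable sign of the anomaly, but halfway through the labeled window, earns less than the maximum weight simply because the window opened before anything was visible in the data.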
For result file : ../results/ARTime/artificialWithAnomaly/ARTime_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169496
The second problem is how NAB scores ARTime outside the anomaly windows. It assigns a negative score even though nothing is wrong there: the detector has not flagged any anomalies, and yet the score is negative, as if false positives and false negatives existed.
For comparison, see below how the Numenta detector is scored. It performs worse than ARTime and yet achieves a higher NAB standard score. For this detector, unlike for ARTime, the scoring appears to be done correctly.
For result file : ../results/numenta/artificialWithAnomaly/numenta_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 1
True Negative (Detected non anomalies) : 4027
False Positive (False alarms) : 0
False Negative (Anomaly not detected) : 0
Total data points : 4032
S(t)_standard score : 0.4999963147227
Finally, compare the S(t)_standard scores of the two detectors: about -90.17 for ARTime versus about 0.50 for Numenta.
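To put the two summaries side by side, here is a small sketch that tabulates the counts reported above and checks whether the four confusion counts add up to the stated total. The numbers are copied from the two result summaries in this report; the dictionary layout and the helper name `unaccounted_points` are hypothetical and only serve the illustration.

```python
# Counts copied from the two result summaries above.
summaries = {
    "ARTime": {"tp": 403, "tn": 0, "fp": 2679, "fn": 0,
               "total": 3428, "score": -90.17164198169496},
    "numenta": {"tp": 1, "tn": 4027, "fp": 0, "fn": 0,
                "total": 4032, "score": 0.4999963147227},
}


def unaccounted_points(summary: dict) -> int:
    """Data points not covered by the TP, TN, FP and FN counts."""
    counted = summary["tp"] + summary["tn"] + summary["fp"] + summary["fn"]
    return summary["total"] - counted


for name, s in summaries.items():
    print(f"{name}: score={s['score']:.4f}, "
          f"total={s['total']}, unaccounted={unaccounted_points(s)}")
```

In both summaries the four counts do not sum to the reported total, and the totals themselves differ between the two detectors even though the dataset is the same. I do not know the reason for either discrepancy, so this is only a consistency check, not an explanation.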



