Problem with NAB metric

Hi, I have encountered a problem with the way NAB scores detectors. Below I compare how it scores two detectors, ARTime and Numenta, on the same dataset.

![Image](https://github.com/user-attachments/assets/f6787f7e-bf5b-47a1-934e-c5bd8ea3e0fd)

The first problem is with how the anomaly windows are defined: no algorithm can detect the anomaly in the first half of the window, because the data shows no indication of it there. ARTime correctly flags the anomaly as soon as it happens. Because the window starts earlier than that, the NAB metric gives the detector a lower score, even though it flagged the anomaly as soon as any human could have.
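To make the size of this effect concrete, here is a minimal sketch. It assumes NAB weights an in-window detection with the scaled sigmoid described in the NAB paper, where the relative position `y` is -1 at the start of the window and 0 at its end. The function name `scaled_sigmoid` is only illustrative (it is not meant to mirror NAB's API), and application profile weights are left out.

```python
import math


def scaled_sigmoid(y: float) -> float:
    """Assumed NAB-style weight for a detection at relative position y.

    y = -1.0 is the start of the anomaly window, y = 0.0 is its end,
    and y > 0 lies past the window (the weight turns negative there).
    """
    return 2.0 / (1.0 + math.exp(5.0 * y)) - 1.0


# Compare a flag at the window start, at mid-window, and at the window end.
for label, y in [("window start", -1.0), ("mid-window", -0.5), ("window end", 0.0)]:
    print(f"{label:>12}: {scaled_sigmoid(y):.3f}")
# window start: 0.987
#   mid-window: 0.848
#   window end: 0.000
```

Under these assumptions, a detector that flags the anomaly at the first point where it is visible (mid-window) can never reach the weight available at the window start.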

![Image](https://github.com/user-attachments/assets/0320206c-f48a-4412-82ec-8512a6325971)

For result file : ../results/ARTime/artificialWithAnomaly/ARTime_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 403
True Negative (Detected non anomalies) : 0
False Positive (False alarms) : 2679
False Negative (Anomaly not detected) : 0
Total data points : 3428
S(t)_standard score : -90.17164198169496

The second problem is how NAB has scored ARTime outside the anomaly windows. It has given it a negative score there even though there is no problem: ARTime has not flagged any anomalies in those regions, yet the score is negative, as if false positives or false negatives existed.

For comparison, see below how the Numenta detector is scored. It performs worse than ARTime and yet achieves a higher NAB standard score. For this detector, unlike for ARTime, the scoring appears to be done correctly.

![Image](https://github.com/user-attachments/assets/3f351702-1548-4461-ac90-41dea00a34f4)

![Image](https://github.com/user-attachments/assets/04534a31-ddea-4742-a7a8-2d9241518350)

For result file : ../results/numenta/artificialWithAnomaly/numenta_art_daily_flatmiddle.csv
True Positive (Detected anomalies) : 1
True Negative (Detected non anomalies) : 4027
False Positive (False alarms) : 0
False Negative (Anomaly not detected) : 0
Total data points : 4032
S(t)_standard score : 0.4999963147227


Finally, compare the S(t)_standard scores of the two detectors: about -90.17 for ARTime versus about 0.50 for Numenta.
