Signed-off-by: Rene Jokiel <Rene.Jokiel@fau.de>
Signed-off-by: Christoph Huy <christoph.huy@campus.tu-berlin.de>
Signed-off-by: Mehdi-kbz <141425685+Mehdi-kbz@users.noreply.github.com>
Added tests for IqrAnomalyDetection and IqrAnomalyDetectionRollingWindow using Pytest.

Signed-off-by: Mehdi-kbz <141425685+Mehdi-kbz@users.noreply.github.com>
Pull request overview
This PR introduces anomaly detection capabilities to the RTDIP SDK, implementing statistical methods for identifying outliers in time-series data using PySpark. The implementation includes both Median Absolute Deviation (MAD) and Interquartile Range (IQR) based detection methods.
- Adds MAD and IQR anomaly detection algorithms with rolling window variants
- Includes STL+MAD composite method for handling seasonal patterns
- Provides comprehensive test coverage with synthetic and real-world data scenarios
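To illustrate the MAD-based approach the PR describes, here is a minimal standalone sketch in pandas (simplified relative to the PR's actual classes; the `value` column name and the 3.0 threshold mirror the diff, and the zero-MAD short circuit is one reasonable convention, not necessarily the PR's):

```python
import pandas as pd


def mad_anomalies(values, threshold=3.0):
    """Flag points whose robust (MAD-based) z-score exceeds `threshold`."""
    s = pd.Series(values, dtype=float)
    median = s.median()
    # Median absolute deviation: robust spread estimate, unlike std dev
    # it is not dragged upward by the very outliers we want to find.
    mad = (s - median).abs().median()
    if mad == 0:
        # No spread at all: flag nothing rather than divide by zero.
        return pd.Series(False, index=s.index)
    # 0.6745 rescales MAD so the score is comparable to a normal z-score.
    z = 0.6745 * (s - median) / mad
    return z.abs() > threshold


flags = mad_anomalies([10, 11, 10, 12, 11, 100])
```

Only the final point (100 against a background near 11) crosses the threshold here; the inliers produce robust z-scores well below 1.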
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 23 comments.
| File | Description |
|---|---|
| `src/sdk/python/rtdip_sdk/pipelines/anomaly_detection/interfaces.py` | Defines the base interface for anomaly detection components |
| `src/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/mad_anomaly_detection.py` | Implements MAD-based anomaly detection with rolling window and STL variants |
| `src/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/iqr_anomaly_detection.py` | Implements IQR-based anomaly detection with rolling window support |
| `tests/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/test_mad.py` | Test suite for the MAD anomaly detection methods |
| `tests/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/test_iqr_anomaly_detection.py` | Test suite for the IQR anomaly detection methods |
| `amos_team_resources/anomaly_detection/mad/visualize_test.py` | Visualization script for MAD detection results |
| `amos_team_resources/anomaly_detection/iqr/visualizeIQR.py` | Visualization script for IQR detection results |
| `src/sdk/python/rtdip_sdk/pipelines/anomaly_detection/__init__.py` | Package initialization with copyright header |
| `src/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/__init__.py` | Spark submodule initialization |
| `tests/sdk/python/rtdip_sdk/pipelines/anomaly_detection/__init__.py` | Test package initialization |
| `tests/sdk/python/rtdip_sdk/pipelines/anomaly_detection/spark/__init__.py` | Test Spark submodule initialization |
Code context:

```python
pdf["is_anomaly"] = pdf["rolling_mad_z"].abs() > self.threshold

# keep only anomalies
anomalies_pdf = pdf[pdf["is_anomaly"] == True].copy()
```

**Copilot:** Avoid explicit comparison with `True`. Use `pdf["is_anomaly"]` instead of `pdf["is_anomaly"] == True` for more Pythonic code.

```suggestion
anomalies_pdf = pdf[pdf["is_anomaly"]].copy()
```
Code context:

```python
pdf["is_anomaly"] = pdf["mad_zscore"].abs() > self.threshold

# keep ONLY anomalies
anomalies_pdf = pdf[pdf["is_anomaly"] == True].copy()
```

**Copilot:** Avoid explicit comparison with `True`. Use `pdf["is_anomaly"]` instead of `pdf["is_anomaly"] == True` for more Pythonic code.

```suggestion
anomalies_pdf = pdf[pdf["is_anomaly"]].copy()
```
Code context:

```python
pdf["is_anomaly"] = (pdf["value"] < lower_bound) | (pdf["value"] > upper_bound)

# Keep only anomalies
anomalies_pdf = pdf[pdf["is_anomaly"] == True].copy()
```

**Copilot:** Avoid explicit comparison with `True`. Use `pdf["is_anomaly"]` instead of `pdf["is_anomaly"] == True` for more Pythonic code.

```suggestion
anomalies_pdf = pdf[pdf["is_anomaly"]].copy()
```
Code context:

```python
if mad == 0:
    pdf["mad_zscore"] = 0
else:
    pdf["mad_zscore"] = 0.6745 * (pdf["value"] - median) / mad
```

**Copilot:** The check `if mad == 0` on line 88 is unreachable because line 86 sets `mad = float(max(mad, 1.0))`, ensuring `mad >= 1.0`. Consider removing the dead code, or adjusting the clamping logic if zero MAD needs to be handled differently.

```suggestion
pdf["mad_zscore"] = 0.6745 * (pdf["value"] - median) / mad
```
Docstring context:

```
Returns only the detected anomalies.

:param df: Spark DataFrame containing a numeric "value" column.
```

**Copilot:** The docstring states that the DataFrame should contain a column named "value", but it doesn't document that a "timestamp" column is required for the rolling window variant (as seen on line 151, where it is sorted). Consider adding this requirement to the documentation or making it an explicit parameter.

```suggestion
:param df: Spark DataFrame containing a numeric "value" column and a "timestamp" column.
    The "timestamp" column is required for sorting and rolling window calculations.
```
Code context:

```python
pdf["is_anomaly"] = (pdf["value"] < lower_bound) | (pdf["value"] > upper_bound)

# Keep only anomalies
anomalies_pdf = pdf[pdf["is_anomaly"] == True].copy()
```

**Copilot:** Avoid explicit comparison with `True`. Use `pdf["is_anomaly"]` instead of `pdf["is_anomaly"] == True` for more Pythonic code.

```suggestion
anomalies_pdf = pdf[pdf["is_anomaly"]].copy()
```
Code context:

```python
# Spark → Pandas
pdf = df.toPandas()
```

**Copilot:** Converting the entire Spark DataFrame to Pandas with `toPandas()` defeats the purpose of using Spark and can cause memory issues with large datasets. Consider implementing the IQR calculation using native Spark operations (e.g., `approxQuantile` for percentiles) to maintain scalability.
Code context:

```python
rolling_mad = rolling_mad.apply(lambda x: max(x, 1.0))

# Robust rolling z-score
pdf["rolling_mad_z"] = 0.6745 * (pdf["value"] - rolling_median) / rolling_mad
```

**Copilot:** Magic number `0.6745` should be defined as a named constant with a comment explaining its significance. It is the scaling factor that relates MAD to an equivalent standard deviation (MAD / 0.6745 ≈ σ for normally distributed data).
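One way that refactor could look (a hypothetical standalone version; the constant name is invented and the rolling-MAD formula is simplified relative to the PR's):

```python
import pandas as pd

# 0.6745 is approximately the 75th percentile of the standard normal
# distribution: for normal data MAD ≈ 0.6745 · σ, so multiplying by it
# makes the robust z-score comparable to an ordinary z-score.
MAD_TO_SIGMA_SCALE = 0.6745


def rolling_mad_zscore(values, window):
    """Robust rolling z-score (simplified sketch, not the PR's class)."""
    s = pd.Series(values, dtype=float)
    med = s.rolling(window).median()
    # Rolling median of absolute deviations from the rolling median.
    mad = (s - med).abs().rolling(window).median()
    mad = mad.clip(lower=1.0)  # mirrors the PR's clamp against tiny MAD
    return MAD_TO_SIGMA_SCALE * (s - med) / mad
```

With a flat series ending in a spike, e.g. `rolling_mad_zscore([10, 10, 10, 10, 10, 100], 3)`, the final score is 0.6745 · 90 while the preceding fully-windowed point scores 0.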
Code context:

```python
iqr = q3 - q1

# Clamp IQR to prevent over-sensitive detection when data has no spread
iqr = max(iqr, 1.0)
```

**Copilot:** Magic number `1.0` is used as the minimum threshold for IQR clamping. Consider defining it as a named constant (e.g., `MIN_IQR_THRESHOLD`) with documentation explaining why this minimum is necessary to prevent over-sensitive anomaly detection when the data has no spread.
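A sketch of that refactor (the constant value and the 1.5 fence multiplier follow the diff; everything else is illustrative):

```python
import pandas as pd

# Named constant replacing the magic 1.0. Without a floor, near-constant
# data yields IQR ≈ 0, collapsing the fences onto the quartiles so that
# every tiny fluctuation would be flagged as an anomaly.
MIN_IQR_THRESHOLD = 1.0


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) IQR fences with the minimum-spread clamp."""
    s = pd.Series(values, dtype=float)
    q1, q3 = s.quantile(0.25), s.quantile(0.75)
    iqr = max(q3 - q1, MIN_IQR_THRESHOLD)
    return q1 - k * iqr, q3 + k * iqr
```

For a constant series like `[10, 10, 10, 10]` the clamp keeps the fences at (8.5, 11.5) instead of degenerating to (10, 10).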
Code context:

```python
rolling_q1 = pdf["value"].rolling(self.window_size).quantile(0.25)
rolling_q3 = pdf["value"].rolling(self.window_size).quantile(0.75)
rolling_iqr = rolling_q3 - rolling_q1

# Clamp IQR to prevent over-sensitivity
rolling_iqr = rolling_iqr.apply(lambda x: max(x, 1.0))

# Compute rolling bounds
lower_bound = rolling_q1 - self.threshold * rolling_iqr
upper_bound = rolling_q3 + self.threshold * rolling_iqr

# Flag anomalies outside the rolling bounds
pdf["is_anomaly"] = (pdf["value"] < lower_bound) | (pdf["value"] > upper_bound)
```

**Copilot:** The rolling window operations on lines 153-155 will produce NaN values for the first `window_size - 1` rows. These NaN values propagate into the anomaly detection logic, potentially causing unexpected behavior. Consider handling NaN values explicitly (e.g., by using `.dropna()` or by setting the `min_periods` parameter on the rolling window).
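A sketch of the `min_periods` option (standalone and simplified; the PR operates on a DataFrame column rather than a bare series):

```python
import pandas as pd


def rolling_iqr_flags(values, window, k=1.5):
    """Flag values outside rolling IQR fences, without NaN warm-up rows."""
    s = pd.Series(values, dtype=float)
    # min_periods=1 makes the quartiles defined from the first row on,
    # instead of NaN for the first window-1 rows.
    q1 = s.rolling(window, min_periods=1).quantile(0.25)
    q3 = s.rolling(window, min_periods=1).quantile(0.75)
    iqr = (q3 - q1).clip(lower=1.0)  # same minimum-spread clamp as the PR
    lower, upper = q1 - k * iqr, q3 + k * iqr
    # No NaNs remain, so these comparisons cannot silently evaluate to
    # False for rows that were never actually checked.
    return (s < lower) | (s > upper)
```

The trade-off is that early rows are judged against partial windows, which is usually preferable to leaving them unexamined; alternatively, `.dropna()` on the bounds simply excludes the warm-up rows from detection.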