Description
Describe the bug
While implementing k-shape clustering (#2661), I noticed that our implementation of sbd_distance differs slightly from how tslearn handles it.
This doesn't matter for univariate series, but for multivariate series the two give different results.
sbd_distance: computes the distance for each channel independently and then takes the average over channels.
normalized_cc: computes the cross-correlation for each channel, sums these across channels, and normalizes by the norm of the entire multivariate series; the maximum of the resulting sequence is then taken.
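To make the difference concrete, here is a minimal NumPy-only sketch of the two conventions as I understand them. The function names are hypothetical and this is not the actual aeon or tslearn code; it assumes inputs of shape (n_channels, n_timepoints), no standardization, and that the second convention sums the per-channel cross-correlations at each lag before taking the maximum.

```python
import numpy as np


def sbd_per_channel_mean(x, y):
    """Hypothetical helper: SBD for each channel separately, then averaged."""
    dists = []
    for xc, yc in zip(x, y):
        # Cross-correlation of one channel at every lag.
        cc = np.correlate(xc, yc, mode="full")
        # Normalize by the norms of this channel only.
        ncc = cc / (np.linalg.norm(xc) * np.linalg.norm(yc))
        dists.append(1.0 - ncc.max())
    return float(np.mean(dists))


def sbd_joint(x, y):
    """Hypothetical helper: sum cross-correlations over channels at each lag,
    normalize by the norms of the whole multivariate series, then take the max."""
    cc = sum(np.correlate(xc, yc, mode="full") for xc, yc in zip(x, y))
    ncc = cc / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(1.0 - ncc.max())


# Two channels whose best alignments occur at different lags.
x = np.array([[1.0, 2.0, 3.0], [1.0, 0.0, 0.0]])
y = np.array([[1.0, 2.0, 3.0], [0.0, 0.0, 1.0]])

print(sbd_per_channel_mean(x, y), sbd_joint(x, y))
# With a single channel the two conventions coincide.
print(sbd_per_channel_mean(x[:1], y[:1]), sbd_joint(x[:1], y[:1]))
```

In this toy example each channel can be aligned perfectly on its own, so the per-channel average is roughly 0, while the joint version has to pick a single lag for all channels and gives about 1/15. That is the kind of discrepancy that shows up in the multivariate case.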
I found another implementation of k-shape online, https://github.com/TheDatumOrg/kshape-python, one of whose contributors is an author of the original k-Shape paper (https://dl.acm.org/doi/pdf/10.1145/2723372.2737793). They handle this the same way as tslearn.
I am not sure whether this is an intentional choice, but it changes the final clustering output.
If we use aeon's SBD distance for k-shape clustering, we will have to change the test cases to match, but then we would have no reference implementation to compare against.
Steps/Code to reproduce the bug
import numpy as np
from aeon.distances import sbd_distance
from tslearn.metrics.cycc import normalized_cc
# Multivariate time series
np.random.seed(1)
x = np.random.rand(6, 100)
y = np.random.rand(6, 100)
a = sbd_distance(x, y, standardize=False)
print(x.shape, y.shape)
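# tslearn expects (n_timepoints, n_channels), so transpose aeon's (n_channels, n_timepoints) layout
z = np.transpose(x, (1, 0))
w = np.transpose(y, (1, 0))
b = 1.0 - normalized_cc(z, w).max()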
print(a, b)
# Univariate time series
np.random.seed(1)
x = np.random.rand(1, 200)
y = np.random.rand(1, 200)
print(x.shape, y.shape)
a = sbd_distance(x, y, standardize=False)
# Transpose to tslearn's (n_timepoints, n_channels) layout
x = np.transpose(x, (1, 0))
y = np.transpose(y, (1, 0))
b = 1.0 - normalized_cc(x, y).max()
print(a, b)
Expected results
The SBD distance computed for both univariate and multivariate time series should be consistent with tslearn and https://github.com/TheDatumOrg/kshape-python
Actual results
The first example, a multivariate time series, gives a result that is inconsistent with tslearn and https://github.com/TheDatumOrg/kshape-python
Versions
System:
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)]
executable: D:\Open Source\aeon\aeon-venv\Scripts\python.exe
machine: Windows-11-10.0.22631-SP0
Python dependencies:
aeon: 1.0.0
pip: 24.3.1
setuptools: 75.8.0
scikit-learn: 1.5.2
numpy: 1.26.4
numba: 0.60.0
scipy: 1.14.1
pandas: 2.2.3