[BUG] Inconsistent Sbd distance with tslearn and other implementations #2674

@tanishy7777

Describe the bug

While implementing k-Shape clustering (#2661), I noticed that our implementation of sbd_distance differs slightly from how tslearn handles it.

This doesn't matter for univariate series, but for multivariate series the results differ.

sbd_distance (aeon): computes the distance for each channel independently and then averages the per-channel distances.
normalized_cc (tslearn): computes the cross-correlation for each channel, sums the correlations across channels, and normalizes by the norm of the entire multivariate series; the maximum over shifts is taken afterwards.
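
To make the difference concrete, here is a minimal NumPy sketch of the two aggregation strategies. It is not the actual aeon or tslearn code: the function names `sbd_channel_average` and `sbd_joint` are hypothetical, inputs are assumed to have shape (n_channels, n_timepoints), no standardization is applied, and zero-norm series are not handled.

```python
import numpy as np


def sbd_channel_average(x, y):
    """Hypothetical sketch: SBD per channel, then the mean over channels."""
    dists = []
    for xc, yc in zip(x, y):
        cc = np.correlate(xc, yc, mode="full")  # cross-correlation at every shift
        denom = np.linalg.norm(xc) * np.linalg.norm(yc)
        dists.append(1.0 - cc.max() / denom)  # each channel picks its own best shift
    return float(np.mean(dists))


def sbd_joint(x, y):
    """Hypothetical sketch: sum cross-correlations over channels, normalize jointly."""
    cc = sum(np.correlate(xc, yc, mode="full") for xc, yc in zip(x, y))
    denom = np.linalg.norm(x) * np.linalg.norm(y)  # norm of the whole series
    return float(1.0 - cc.max() / denom)  # one shared shift for all channels


# Two channels whose best alignments sit at different shifts.
x = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
y = np.array([[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
print(sbd_channel_average(x, y))  # approximately 0.0
print(sbd_joint(x, y))            # approximately 0.5
```

In this toy case the per-channel variant lets every channel align at its own shift, while the joint variant forces a single shift for all channels, so the two only coincide when there is one channel.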

I found another k-Shape implementation online; one of its contributors is an author of the original k-Shape paper (https://dl.acm.org/doi/pdf/10.1145/2723372.2737793). It handles multivariate series the same way as tslearn: https://github.com/TheDatumOrg/kshape-python

I am not sure whether this is an intentional choice, but it changes the final clustering output.
If we use aeon's SBD distance for k-Shape clustering, we will have to change the test cases to match, but then we would have no reference implementation to compare against.

Steps/Code to reproduce the bug

import numpy as np
from aeon.distances import sbd_distance
from tslearn.metrics.cycc import normalized_cc

# Multivariate time series: aeon expects (n_channels, n_timepoints)
np.random.seed(1)
x = np.random.rand(6, 100)
y = np.random.rand(6, 100)
print(x.shape, y.shape)
a = sbd_distance(x, y, standardize=False)

# tslearn expects (n_timepoints, n_channels), so transpose first
z = np.transpose(x, (1, 0))
w = np.transpose(y, (1, 0))
b = 1.0 - normalized_cc(z, w).max()
print(a, b)  # aeon vs. tslearn: these differ


# Univariate time series: a single channel
np.random.seed(1)
x = np.random.rand(1, 200)
y = np.random.rand(1, 200)
print(x.shape, y.shape)

a = sbd_distance(x, y, standardize=False)

# Same transposition to the tslearn layout
x = np.transpose(x, (1, 0))
y = np.transpose(y, (1, 0))

b = 1.0 - normalized_cc(x, y).max()
print(a, b)  # aeon vs. tslearn: these agree

Expected results

The SBD distance computed for both univariate and multivariate time series should be consistent with tslearn and https://github.com/TheDatumOrg/kshape-python

Actual results

The first example (multivariate time series) gives a result that is inconsistent with tslearn and https://github.com/TheDatumOrg/kshape-python

Versions

System:
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)]
executable: D:\Open Source\aeon\aeon-venv\Scripts\python.exe
machine: Windows-11-10.0.22631-SP0
Python dependencies:
aeon: 1.0.0
pip: 24.3.1
setuptools: 75.8.0
scikit-learn: 1.5.2
numpy: 1.26.4
numba: 0.60.0
scipy: 1.14.1
pandas: 2.2.3

Labels: bug, clustering, distances