-
Notifications
You must be signed in to change notification settings - Fork 208
[BUG] Updated sbd_distance() to handle multivariate series (#2674) #2715
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Thank you for contributing to
|
I think the tests have to be modified to be consistent with the new sbd_distance calculation. The same approach is used in the original k-shapes paper at https://dl.acm.org/doi/pdf/10.1145/2723372.2737793. ![]() |
x = np.transpose(x, (1, 0)) | ||
y = np.transpose(y, (1, 0)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You transpose the output in L270 again. Is this really necessary?
if x.shape[0] == 1 and y.shape[0] == 1: | ||
_x = x.ravel() | ||
_y = y.ravel() | ||
return _univariate_sbd_distance(_x, _y, standardize) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please keep this branch for now.
x = (x - np.mean(x)) / np.std(x) | ||
y = (y - np.mean(y)) / np.std(y) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After transposing, this is standardizing along the wrong axis, isn't it?
with objmode(cc="float64[:, :]"): | ||
cc = np.array( | ||
[ | ||
correlate(x[:, i], y[:, i], mode="full", method="fft") | ||
for i in range(x.shape[1]) | ||
] | ||
).T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Python loops are expensive. This is the reason, we use Numba a lot in the distance module. Is this loop really necessary? Maybe the correlate function can be used in a vectorized way?
sz = x.shape[0] | ||
cc = np.vstack((cc[-(sz - 1) :], cc[:sz])) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Slicing with [:n]
should be equal to slicing with [:-1]
, which is simpler to understand, or are you working around a limitation in Numba? Then, please add a comment.
We should benchmark the new implementation against the existing one for univariate series. If it is faster, we could get rid of the existing code. Please also verify that the |
Reference Issues/PRs
Fixes #2674
What does this implement/fix? Explain your changes.
The earlier implementation found the average of the distance after calculating normalized_cc for each channel independenly.
The is replaced by a new _multivariate_sbd_distance() method which normalizes using the norm of the multivariate series, as is the case with tslearn and kshape-python.
Does your contribution introduce a new dependency? If yes, which one?
Nil
Any other comments?
Tests need to be modified. Now the values are consistent for tslearn and kshape-python
PR checklist
For all contributions
For new estimators and functions
__maintainer__
at the top of relevant files and want to be contacted regarding its maintenance. Unmaintained files may be removed. This is for the full file, and you should not add yourself if you are just making minor changes or do not want to help maintain its contents.For developers with write access