[ENH] add a difference transformer to series transformations #2729

Open: wants to merge 6 commits into base: main
Changes from 4 commits
10 changes: 10 additions & 0 deletions .all-contributorsrc
@@ -2683,6 +2683,16 @@
      "contributions": [
        "code"
      ]
    },
    {
      "login": "TinaJin0228",
      "name": "Tina Jin",
      "avatar_url": "https://avatars.githubusercontent.com/TinaJin0228",
      "profile": "https://github.com/TinaJin0228",
      "contributions": [
        "code",
        "doc"
      ]
    }
  ],
  "commitType": "docs"
2 changes: 2 additions & 0 deletions aeon/transformations/series/__init__.py
@@ -21,6 +21,7 @@
    "SIVSeriesTransformer",
    "PCASeriesTransformer",
    "WarpingSeriesTransformer",
    "DifferenceTransformer",
]

from aeon.transformations.series._acf import (
@@ -32,6 +33,7 @@
from aeon.transformations.series._boxcox import BoxCoxTransformer
from aeon.transformations.series._clasp import ClaSPTransformer
from aeon.transformations.series._dft import DFTSeriesTransformer
from aeon.transformations.series._diff import DifferenceTransformer
from aeon.transformations.series._dobin import Dobin
from aeon.transformations.series._exp_smoothing import ExpSmoothingSeriesTransformer
from aeon.transformations.series._gauss import GaussSeriesTransformer
115 changes: 115 additions & 0 deletions aeon/transformations/series/_diff.py
@@ -0,0 +1,115 @@
import numpy as np

from aeon.transformations.series.base import BaseSeriesTransformer

__maintainer__ = ["Tina Jin"]
__all__ = ["DifferenceTransformer"]


class DifferenceTransformer(BaseSeriesTransformer):
    """
    Calculates the n-th order difference of a time series.

    Transforms a time series X into a series Y representing the difference
    calculated `order` times.

    - Order 1: Y[t] = X[t] - X[t-1]
    - Order 2: Y[t] = (X[t] - X[t-1]) - (X[t-1] - X[t-2]) = X[t] - 2*X[t-1] + X[t-2]
    - ... and so on.

    The first `order` element(s) of the transformed series along the time axis
    will be NaN, so that the output series will have the same shape as the input series.

    Parameters
    ----------
    order : int, default=1
        The order of differencing. Must be a positive integer.

    axis : int, default=1
        The axis along which the difference is computed. Assumed to be the
        time axis.
        If `axis == 0`, assumes shape `(n_timepoints, n_channels)`.
        If `axis == 1`, assumes shape `(n_channels, n_timepoints)`.

    Notes
    -----
    This transformer assumes the input series does not contain NaN values where
    the difference needs to be computed.

    Examples
    --------
    >>> import numpy as np
    >>> from aeon.transformations.series._diff import DifferenceTransformer
    >>> X1 = np.array([[1, 3, 2, 5, 4, 7, 6, 9, 8, 10]])
    >>> dt = DifferenceTransformer()
    >>> Xt1 = dt.fit_transform(X1)
    >>> print(Xt1)
    [[nan  2. -1.  3. -1.  3. -1.  3. -1.  2.]]

    >>> X2 = np.array([[1, 3, 2, 5, 4, 7, 6, 9, 8, 10]])
    >>> dt2 = DifferenceTransformer(order=2)
    >>> Xt2 = dt2.fit_transform(X2)
    >>> print(Xt2)
    [[nan nan -3.  4. -4.  4. -4.  4. -4.  3.]]

    >>> X3 = np.array([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]])
    >>> dt = DifferenceTransformer()
    >>> Xt3 = dt.fit_transform(X3)
    >>> print(Xt3)
    [[nan  1.  1.  1.  1.]
     [nan -1. -1. -1. -1.]]

    >>> X4 = np.array([[1, 5], [2, 4], [3, 3], [4, 2], [5, 1]])
    >>> dt_axis0 = DifferenceTransformer(axis=0)
    >>> Xt4 = dt_axis0.fit_transform(X4, axis=0)
    >>> print(Xt4)
    [[nan nan]
     [ 1. -1.]
     [ 1. -1.]
     [ 1. -1.]
     [ 1. -1.]]
    """

    _tags = {
        "capability:multivariate": True,
        "X_inner_type": "np.ndarray",
        "fit_is_empty": True,
    }

    def __init__(self, order=1, axis=1):
        if not isinstance(order, int) or order < 1:
            raise ValueError(f"`order` must be a positive integer, but got {order}")

Review comment (Member):
I would do this in fit.

        self.order = order
        super().__init__(axis=axis)

Review comment (Member):
Remove the axis parameter. This is used to determine the shape of the time series internally. We only want to apply this to series, not between channels.

Reply (Contributor Author):
The "axis" is inherited from BaseSeriesTransformer. Should "axis = 1" be used to indicate that the time series are all in rows, with shape (n_channels, n_timepoints)?

Reply (Member):
Yes, that's telling the base class to convert the series to (n_channels, n_timepoints) before passing it to _fit and other functions.

Reply (Contributor Author):
Got it.

    def _transform(self, X, y=None):
        """
        Perform the n-th order differencing transformation.

        Parameters
        ----------
        X : np.ndarray
            Input series. The base class converts it to shape
            (n_channels, n_timepoints) when `axis == 1`, or to
            (n_timepoints, n_channels) when `axis == 0`.
        y : None
            Ignored, present for interface compatibility.

        Returns
        -------
        Xt : np.ndarray
            Transformed version of X with the same shape, containing the
            n-th order difference.
            The first `order` elements along the time axis are NaN.
        """
        diff_X = np.diff(X, n=self.order, axis=self.axis)

        # Cast non-floating results (e.g. from integer input) to float64,
        # since np.nan cannot be stored in an integer array.
        if not np.issubdtype(diff_X.dtype, np.floating):
            diff_X = diff_X.astype(np.float64)

        # Prepend `order` NaNs along the time axis so Xt keeps the input shape.
        nan_shape = list(X.shape)
        nan_shape[self.axis] = self.order
        nans_to_prepend = np.full(nan_shape, np.nan, dtype=np.float64)

        Xt = np.concatenate([nans_to_prepend, diff_X], axis=self.axis)

        return Xt
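Here is a minimal NumPy-only sketch for checking the `_transform` logic and the axis convention from the review thread without installing aeon. `difference_with_nan_padding` is a hypothetical helper, not part of the PR or of aeon. The sketch bypasses `BaseSeriesTransformer`, so it assumes the caller already passes the array in the orientation that `axis` describes. In the PR, the base class performs that conversion. The sketch also always casts to float64, which simplifies the PR's dtype check. It validates `order` at call time rather than in a constructor, in the spirit of the review comment asking to move that check out of `__init__`.

```python
import numpy as np


def difference_with_nan_padding(X, order=1, axis=1):
    """Hypothetical stand-alone mirror of DifferenceTransformer._transform.

    Computes the order-th difference along ``axis`` and prepends ``order``
    NaNs so the output keeps the input's shape.
    """
    # Validate at call time; bool is a subclass of int, so reject it explicitly.
    if isinstance(order, bool) or not isinstance(order, (int, np.integer)) or order < 1:
        raise ValueError(f"`order` must be a positive integer, but got {order}")
    X = np.asarray(X)
    # np.diff shortens `axis` by `order`; cast to float64 so NaN can be stored.
    diff_X = np.diff(X, n=order, axis=axis).astype(np.float64)
    nan_shape = list(X.shape)
    nan_shape[axis] = order
    nans = np.full(nan_shape, np.nan)
    return np.concatenate([nans, diff_X], axis=axis)


# (n_channels, n_timepoints) layout, i.e. the axis=1 convention.
X = np.array([[1, 3, 2, 5, 4, 7, 6, 9, 8, 10]])
Xt = difference_with_nan_padding(X, order=2, axis=1)

# The order-2 values match the closed form X[t] - 2*X[t-1] + X[t-2].
closed_form = X[:, 2:] - 2 * X[:, 1:-1] + X[:, :-2]
assert np.array_equal(Xt[:, 2:], closed_form)

# The same data in (n_timepoints, n_channels) layout, differenced along
# axis 0, gives the transpose of the axis=1 result (NaNs in matching places).
Xt_axis0 = difference_with_nan_padding(X.T, order=2, axis=0)
np.testing.assert_array_equal(Xt_axis0, Xt.T)
```

The transpose check shows why `axis` only affects orientation: each channel is still differenced along its own time axis, never across channels.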
28 changes: 28 additions & 0 deletions aeon/transformations/series/tests/test_diff.py
@@ -0,0 +1,28 @@
"""Tests for Difference transformation."""

import numpy as np

from aeon.transformations.series._diff import DifferenceTransformer


def test_diff():
    """Tests basic first and second order differencing."""
    X = np.array([[1.0, 4.0, 9.0, 16.0, 25.0, 36.0]])

Review comment (Member):
Can you also test multivariate series, perhaps in another test?


    dt1 = DifferenceTransformer(order=1)
    Xt1 = dt1.fit_transform(X)
    expected1 = np.array([[np.nan, 3.0, 5.0, 7.0, 9.0, 11.0]])

Review comment (Member):
IMO it is better to return a smaller series than to include NaNs. Possibly add a parameter if you think it is worth it, but change the shape by default.


    assert Xt1.shape == X.shape, "Shape mismatch for order 1"
    np.testing.assert_allclose(
        Xt1, expected1, equal_nan=True, err_msg="Value mismatch for order 1"
    )

    dt2 = DifferenceTransformer(order=2)
    Xt2 = dt2.fit_transform(X)
    expected2 = np.array([[np.nan, np.nan, 2.0, 2.0, 2.0, 2.0]])

    assert Xt2.shape == X.shape, "Shape mismatch for order 2"
    np.testing.assert_allclose(
        Xt2, expected2, equal_nan=True, err_msg="Value mismatch for order 2"
    )
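The two review comments on this test file ask for a multivariate test and suggest returning a shorter series by default instead of padding with NaNs, possibly behind a parameter. Below is a minimal sketch of both ideas. It assumes a hypothetical `difference` helper with a `keep_shape` flag (both names are illustrative, not from the PR) and input already shaped `(n_channels, n_timepoints)`.

```python
import numpy as np


def difference(X, order=1, keep_shape=False):
    """Hypothetical differencing helper for (n_channels, n_timepoints) input.

    keep_shape=False returns the shorter np.diff output (no NaNs), the default
    the review suggests; keep_shape=True pads with leading NaNs, matching the
    behaviour currently in the PR.
    """
    diff_X = np.diff(np.asarray(X, dtype=np.float64), n=order, axis=1)
    if not keep_shape:
        return diff_X
    nans = np.full((diff_X.shape[0], order), np.nan)
    return np.concatenate([nans, diff_X], axis=1)


def test_diff_multivariate():
    """Each channel (row) is differenced independently."""
    X = np.array([[1.0, 4.0, 9.0, 16.0, 25.0], [5.0, 4.0, 3.0, 2.0, 1.0]])

    # Default: trimmed output, n_timepoints shrinks by `order`.
    Xt = difference(X, order=1)
    assert Xt.shape == (2, 4)
    np.testing.assert_allclose(Xt, [[3.0, 5.0, 7.0, 9.0], [-1.0, -1.0, -1.0, -1.0]])

    # Optional shape-preserving output with leading NaNs.
    Xt_padded = difference(X, order=2, keep_shape=True)
    assert Xt_padded.shape == X.shape
    np.testing.assert_allclose(
        Xt_padded,
        [[np.nan, np.nan, 2.0, 2.0, 2.0], [np.nan, np.nan, 0.0, 0.0, 0.0]],
        equal_nan=True,
    )


# Called directly so the sketch runs standalone; pytest would also collect it.
test_diff_multivariate()
```

With `keep_shape=False`, the output has `n_timepoints - order` columns and no NaNs. Downstream code then does not have to handle missing values, but it must accept the shape change. `keep_shape=True` reproduces the NaN-padded output the PR currently returns.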