[ENH] Support unequal length time series with TSFresh #3179

@jsquaredosquared

Description

Describe the feature or idea you want to propose

The TSFresh transformer, and the estimators built on it, do not support unequal-length time series, even though (per the first question in the tsfresh FAQ) tsfresh itself is capable of handling them.
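
For reference, plain tsfresh already handles unequal lengths when given long-format input; a minimal sketch (the column names here are just illustrative):

import pandas as pd
from tsfresh import extract_features

# Two cases of different lengths stacked in long format; tsfresh groups
# rows by the id column, so nothing needs to be padded.
df_long = pd.DataFrame({
    "id": [0, 0, 0, 1, 1],        # case 0 has 3 time points, case 1 has 2
    "time": [0, 1, 2, 0, 1],
    "kind": ["dim_0"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0],
})

features = extract_features(
    df_long,
    column_id="id",
    column_sort="time",
    column_kind="kind",
    column_value="value",
)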

Describe your proposed solution

Modify the function below so that it accepts lists of 2D numpy arrays in addition to 3D numpy arrays:

import numpy as np
import pandas as pd


def _from_3d_numpy_to_long(arr):
    # Convert the 3D numpy array to a long-format DataFrame
    n_cases, n_channels, n_timepoints = arr.shape
    # One row per (case, channel) pair, one column per time point
    df = pd.DataFrame(arr.reshape(n_cases * n_channels, n_timepoints))
    df["case_index"] = np.repeat(np.arange(n_cases), n_channels)
    df["dimension"] = np.tile(np.arange(n_channels), n_cases)
    df = df.melt(
        id_vars=["case_index", "dimension"], var_name="time_index", value_name="value"
    )
    # Reorder and rename columns to the layout tsfresh expects
    df = df[["case_index", "time_index", "dimension", "value"]]
    df = df.rename(columns={"case_index": "index", "dimension": "column"})
    df["column"] = "dim_" + df["column"].astype(str)
    return df
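
For concreteness, the layout this produces is one row per observation:

X = np.arange(12, dtype=float).reshape(2, 2, 3)  # 2 cases, 2 channels, 3 time points
long_df = _from_3d_numpy_to_long(X)
print(long_df.columns.tolist())  # ['index', 'time_index', 'column', 'value']
print(len(long_df))              # 12 rows, one per observation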

My attempt was incredibly slow on 3D numpy arrays:

from joblib import Parallel, delayed


def _from_3d_list_to_long(list_):
    def _convert_case_to_long_df(case, index):
        df = (
            pd.DataFrame(case)
            .transpose()
            .melt(var_name="column", ignore_index=False)
            .reset_index()
            .rename(columns={"index": "time_index"})
        )

        df["index"] = np.repeat(index, len(df))

        return df

    long_dfs = Parallel()(
        delayed(_convert_case_to_long_df)(case, index)
        for index, case in enumerate(list_)
    )

    combined_dfs = pd.concat(
        long_dfs,
        ignore_index=True,
    )
    combined_dfs = combined_dfs[["index", "time_index", "column", "value"]]
    combined_dfs["column"] = "dim_" + combined_dfs["column"].astype(str)

    return combined_dfs

I asked Gemini to speed it up, and this was the result:

def _from_3d_list_to_long_optimized(list_):
    def _convert_case_to_long_df(case, index):
        # NOTE: This is the optimized version using .T and .stack()
        df = pd.DataFrame(case).T
        df.columns = df.columns.map(lambda i: "dim_" + str(i))

        df_long = df.stack().reset_index()
        df_long.columns = ["time_index", "column", "value"]
        df_long["index"] = index
        return df_long

    # Keeping Parallel() as it is necessary for variable length lists
    long_dfs = Parallel()(  # Use n_jobs=-1 to use all cores
        delayed(_convert_case_to_long_df)(case, index)
        for index, case in enumerate(list_)
    )

    # pd.concat is still required to combine the results
    combined_dfs = pd.concat(
        long_dfs,
        ignore_index=True,
    )

    # Final column ordering (column renaming is now done inside the loop)
    combined_dfs = combined_dfs[["index", "time_index", "column", "value"]]

    return combined_dfs

Perhaps there is an even better way of implementing it. Or, if the list-based version cannot match the speed of the original function, a check could be added to pick the conversion function based on the input type (a rough sketch follows).
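
As an untested sketch (the names _from_2d_list_to_long and _to_long are hypothetical): the per-case melt and concat could be skipped entirely by building the long-format columns with NumPy in a single pass, with a type check dispatching between the equal-length and unequal-length paths:

import numpy as np
import pandas as pd


def _from_2d_list_to_long(list_):
    # Build the long format directly: for each (n_channels, n_timepoints_i)
    # case, reshape(-1) yields channel 0's series, then channel 1's, etc.
    # (row-major order), so the id columns can be generated to match.
    n_channels = list_[0].shape[0]
    lengths = np.array([case.shape[1] for case in list_])

    values = np.concatenate([case.reshape(-1) for case in list_])
    case_index = np.repeat(np.arange(len(list_)), lengths * n_channels)
    dimension = np.concatenate([np.repeat(np.arange(n_channels), n) for n in lengths])
    time_index = np.concatenate([np.tile(np.arange(n), n_channels) for n in lengths])

    df = pd.DataFrame(
        {
            "index": case_index,
            "time_index": time_index,
            "column": dimension,
            "value": values,
        }
    )
    df["column"] = "dim_" + df["column"].astype(str)
    return df


def _to_long(X):
    # Dispatch on input type: keep the fast 3D path for equal-length data,
    # fall back to the list path for unequal-length data.
    if isinstance(X, np.ndarray) and X.ndim == 3:
        return _from_3d_numpy_to_long(X)
    return _from_2d_list_to_long(X)

The row order differs from the melt-based version (case-major rather than time-major), but as far as I can tell tsfresh sorts by column_sort internally, so the order should not matter.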

Describe alternatives you've considered, if relevant

No response

Additional context

No response

Metadata

Labels

enhancement (New feature, improvement request or other non-bug code enhancement), transformations (Transformations package)
