Open
Labels: enhancement (New feature, improvement request or other non-bug code enhancement), transformations (Transformations package)
Description
Describe the feature or idea you want to propose
The TSFresh transformer and the time series methods that use it do not support unequal-length time series, although, according to the first question in the tsfresh FAQ, tsfresh itself is capable of handling them.
Describe your proposed solution
Modify the function below to accept lists of 2D numpy arrays in addition to 3D numpy arrays:
aeon/aeon/transformations/collection/feature_based/_tsfresh.py, lines 13 to 29 in fc614aa:

def _from_3d_numpy_to_long(arr):
    # Converting the numpy array to a long format DataFrame
    n_cases, n_channels, n_timepoints = arr.shape
    # Creating a DataFrame from the numpy array with multi-level index
    df = pd.DataFrame(arr.reshape(n_cases * n_channels, n_timepoints))
    df["case_index"] = np.repeat(np.arange(n_cases), n_channels)
    df["dimension"] = np.tile(np.arange(n_channels), n_cases)
    df = df.melt(
        id_vars=["case_index", "dimension"], var_name="time_index", value_name="value"
    )
    # Adjusting the column order and renaming columns
    df = df[["case_index", "time_index", "dimension", "value"]]
    df = df.rename(columns={"case_index": "index", "dimension": "column"})
    df["column"] = "dim_" + df["column"].astype(str)
    return df
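For reference, this is what the target long format looks like on a tiny input. The function body is copied from the snippet above, with the module-level pandas/numpy imports made explicit so it runs standalone:

```python
import numpy as np
import pandas as pd


def _from_3d_numpy_to_long(arr):
    # Converting the numpy array to a long format DataFrame
    n_cases, n_channels, n_timepoints = arr.shape
    df = pd.DataFrame(arr.reshape(n_cases * n_channels, n_timepoints))
    df["case_index"] = np.repeat(np.arange(n_cases), n_channels)
    df["dimension"] = np.tile(np.arange(n_channels), n_cases)
    df = df.melt(
        id_vars=["case_index", "dimension"], var_name="time_index", value_name="value"
    )
    df = df[["case_index", "time_index", "dimension", "value"]]
    df = df.rename(columns={"case_index": "index", "dimension": "column"})
    df["column"] = "dim_" + df["column"].astype(str)
    return df


X = np.arange(12).reshape(2, 2, 3)  # 2 cases, 2 channels, 3 time points
long_df = _from_3d_numpy_to_long(X)
print(long_df.shape)  # one row per (case, channel, timepoint) -> (12, 4)
```

Each of the 12 observations becomes one row with an `index`, `time_index`, `column` ("dim_0"/"dim_1"), and `value`, which is the layout tsfresh consumes.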
My attempt was incredibly slow on 3D numpy arrays:
import numpy as np
import pandas as pd
from joblib import Parallel, delayed


def _from_3d_list_to_long(list_):
    def _convert_case_to_long_df(case, index):
        df = (
            pd.DataFrame(case)
            .transpose()
            .melt(var_name="column", ignore_index=False)
            .reset_index()
            .rename(columns={"index": "time_index"})
        )
        df["index"] = np.repeat(index, len(df))
        return df

    long_dfs = Parallel()(
        delayed(_convert_case_to_long_df)(case, index)
        for index, case in enumerate(list_)
    )
    combined_dfs = pd.concat(
        long_dfs,
        ignore_index=True,
    )
    combined_dfs = combined_dfs[["index", "time_index", "column", "value"]]
    combined_dfs["column"] = "dim_" + combined_dfs["column"].astype(str)
    return combined_dfs

I asked Gemini to speed it up, and this was the result:
def _from_3d_list_to_long_optimized(list_):
    def _convert_case_to_long_df(case, index):
        # NOTE: This is the optimized version using .T and .stack()
        df = pd.DataFrame(case).T
        df.columns = df.columns.map(lambda i: "dim_" + str(i))
        df_long = df.stack().reset_index()
        df_long.columns = ["time_index", "column", "value"]
        df_long["index"] = index
        return df_long

    # Keeping Parallel() as it is necessary for variable-length lists
    long_dfs = Parallel()(  # use n_jobs=-1 to use all cores
        delayed(_convert_case_to_long_df)(case, index)
        for index, case in enumerate(list_)
    )
    # pd.concat is still required to combine the results
    combined_dfs = pd.concat(
        long_dfs,
        ignore_index=True,
    )
    # Final column ordering (column renaming is now done inside the loop)
    combined_dfs = combined_dfs[["index", "time_index", "column", "value"]]
    return combined_dfs

Perhaps there is an even better way of implementing it.
Or, if the list version cannot match the speed of the original function, a check could be included to determine which function to use based on the input type.
Describe alternatives you've considered, if relevant
No response
Additional context
No response
Assignees: SebastianSchmidl