Is your feature request related to a problem? Please describe
By default, imblearn only handles 2D data of shape (samples, features). I often work with time series and try to classify them, and class imbalance occurs there as well. However, I cannot use the imblearn package for this because time series data is 3-dimensional, e.g. (samples, features, sequence_length).
Describe the solution you'd like
I would like to have the option to pass 3D time series data to the many samplers imblearn offers. For now I have written my own oversampler, which I present in the "alternatives" section below. The authors of imblearn are of course welcome to reuse this code for the described enhancement.
Describe alternatives you've considered
import numpy as np
import pandas as pd


def oversample(x_train, y_train):
    # x_train: array of shape (samples, features, sequence_length)
    # y_train: single-column DataFrame with integer labels 0..4
    # Assumes every class has at least one sample.
    labels = y_train.to_numpy().flatten()
    slope_types = [x_train[labels == c] for c in range(5)]
    majority_class_length = max(len(i) for i in slope_types)
    # Placeholder first rows so np.concatenate has something to extend; dropped below.
    oversampled_x_data = np.empty([1, x_train.shape[1], x_train.shape[2]])
    oversampled_y_data = np.empty([1])
    for slope_number, slope_data in enumerate(slope_types):
        slope_data_length = len(slope_data)
        # Draw random samples (with replacement) until this class matches the majority class.
        while slope_data_length < majority_class_length:
            idx = np.random.choice(np.arange(slope_data.shape[0]))
            drawn_sample = slope_data[idx].reshape(1, slope_data.shape[1], slope_data.shape[2])
            oversampled_x_data = np.concatenate((oversampled_x_data, drawn_sample), axis=0)
            oversampled_y_data = np.concatenate((oversampled_y_data, np.array([slope_number])), axis=0)
            slope_data_length += 1
    # Drop the placeholder rows.
    oversampled_x_data = oversampled_x_data[1:]
    oversampled_y_data = oversampled_y_data[1:]
    # Append the drawn samples to the original training data.
    x_train = np.concatenate((x_train, oversampled_x_data), axis=0)
    y_train = pd.DataFrame(np.concatenate((y_train, oversampled_y_data.reshape(len(oversampled_y_data), 1)), axis=0), columns=['label'])
    return x_train, y_train
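The same idea can probably be written more compactly by resampling indices instead of concatenating arrays one sample at a time. The following is only a sketch using NumPy, not something imblearn provides: the function name random_oversample_3d and the seed parameter are made up for illustration, and it assumes labels are given as a 1D array (or anything that flattens to one). Unlike the version above, it does not hard-code five classes.

```python
import numpy as np


def random_oversample_3d(x, y, seed=None):
    """Randomly duplicate minority-class samples until every class
    has as many samples as the largest class.

    x: array of shape (samples, features, sequence_length)
    y: array-like of class labels, one per sample
    """
    x = np.asarray(x)
    y = np.asarray(y).ravel()
    rng = np.random.default_rng(seed)

    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()

    # Keep every original sample, then add random draws per minority class.
    indices = [np.arange(len(y))]
    for cls, count in zip(classes, counts):
        if count < target:
            cls_idx = np.flatnonzero(y == cls)
            extra = rng.choice(cls_idx, size=target - count, replace=True)
            indices.append(extra)
    indices = np.concatenate(indices)

    # Indexing along axis 0 leaves the (features, sequence_length) axes untouched.
    return x[indices], y[indices]


# Toy data: 6 samples of class 0 and 2 samples of class 1.
x_demo = np.arange(8 * 3 * 4, dtype=float).reshape(8, 3, 4)
y_demo = np.array([0, 0, 0, 0, 0, 0, 1, 1])
x_res, y_res = random_oversample_3d(x_demo, y_demo, seed=0)
```

Because only indices along the first axis are resampled, the approach works for any number of trailing dimensions, and the original samples stay at the front of the returned arrays.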