[DOC] User warning over sampling methods

#### Describe the issue linked to the documentation

There is an ongoing discussion about the usefulness of some (if not all) of the over- and under-sampling methods implemented in the imbalanced-learn package.

Typically there is some doubt about the usefulness of SMOTE:
- from researchers ([To SMOTE or not to SMOTE ?](https://arxiv.org/abs/2201.08528))
- from practitioners (see the weekly discussions on Kaggle, Data Science Stack Exchange, etc.)
- and even from one of the authors of the package ([Learning from Imbalanced data: I was wrong but I was not the only one](https://www.youtube.com/watch?v=Po7PRIBjRoQ))

In short, it seems that these methods:
- do not improve ranking (think AUC)
- do break probability calibration (ECE / calibration curve)
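For readers less familiar with the calibration metric mentioned above, here is a minimal sketch of one common binned variant of ECE for binary problems. The function name `expected_calibration_error` and the `n_bins` parameter are my own choices for illustration, not part of scikit-learn or imbalanced-learn. (Ranking can be measured directly with `sklearn.metrics.roc_auc_score`.)

```python
import numpy as np


def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned ECE for binary labels and positive-class probabilities.

    Hypothetical helper written for this issue: predictions are grouped into
    equal-width probability bins, and the absolute gap between the observed
    positive rate and the mean predicted probability is averaged over bins,
    weighted by the fraction of samples in each bin.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    # Using only the interior edges yields bin indices in 0..n_bins-1.
    bin_ids = np.digitize(y_prob, edges[1:-1])
    ece = 0.0
    for b in range(n_bins):
        mask = bin_ids == b
        if mask.any():
            gap = abs(y_true[mask].mean() - y_prob[mask].mean())
            ece += mask.mean() * gap
    return ece
```

A well-calibrated model gives a value close to 0; a model that systematically over-predicts the minority class gives a larger value.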

I think it is a problem that those discussions are not more visible to newcomers (and that more experienced people have to deal with this on a weekly basis).

#### Suggest a potential alternative/fix

It would be nice to have:

1) a clearer demonstration in the doc, because for the moment only the usage is described:

```
from collections import Counter
from sklearn.datasets import make_classification
from imblearn.over_sampling import SMOTE
X, y = make_classification(n_classes=2, class_sep=2,
    weights=[0.1, 0.9], n_informative=3, n_redundant=1, flip_y=0,
    n_features=20, n_clusters_per_class=1, n_samples=1000, random_state=10)
print('Original dataset shape %s' % Counter(y))
sm = SMOTE(random_state=42)
X_res, y_res = sm.fit_resample(X, y)
print('Resampled dataset shape %s' % Counter(y_res))
```
It shows that the data was oversampled, but not that the method works, either in terms of ranking (AUC) or probability calibration (ECE / calibration curve).

Could the doc be upgraded with a better example?

2) a visible user warning regarding the discussions on the usefulness of these methods.

While (one of) the authors has changed their mind about the usefulness of these methods, it seems that a younger crowd is still very eager to jump on these shiny methods. I think it would be helpful for the DS community if the package took a clearer stance.

I would suggest at least a very visible warning in the doc, like a red banner ('There is some discussion about the usefulness of these methods. See: XXX. Use with caution').

This could be expanded with a UserWarning. That may be a bit brutal, but it could prevent a lot of trouble.

Edit: not sure why the "good first issue" label was added automatically... but I'll take it.
