Is there a parameter of fit that simulates the 'pos_weight' argument in scikit-learn? #544
Replies: 4 comments 6 replies
-
Where is the pos_weight argument in scikit-learn? I couldn't find it in its documentation.
-
Sorry, my description was not precise. Here is the reply I copied from Gitter: this kind of argument simply assigns a weight of sum(negative instances) / sum(positive instances) to the positive samples, so that misclassifying a positive sample (the minority class) incurs a greater penalty.
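To make the ratio described above concrete, here is a minimal sketch in plain Python. The helper name `pos_weight_ratio` and the toy labels are made up for illustration, and labels are assumed to be 0 (negative) and 1 (positive).

```python
from collections import Counter


def pos_weight_ratio(labels):
    """Return n_negative / n_positive for binary 0/1 labels (hypothetical helper)."""
    counts = Counter(labels)
    n_pos, n_neg = counts[1], counts[0]
    if n_pos == 0:
        raise ValueError("no positive samples, the ratio is undefined")
    return n_neg / n_pos


# Toy imbalanced data set: 90 negatives and 10 positives.
y = [0] * 90 + [1] * 10
ratio = pos_weight_ratio(y)
print(ratio)  # 9.0
```

With 90 negatives and 10 positives, each positive sample counts nine times as much as a negative one, so both classes contribute the same total weight.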
-
@HuangChiEn you can set 'scale_pos_weight' (for xgboost) or 'class_weight' (for rf) to your own value through the "custom_hp" argument, by passing a dictionary like the one below. Let me know if this works for you:

    from flaml import AutoML
    from sklearn.datasets import load_iris

    X_train, y_train = load_iris(return_X_y=True, as_frame=True)
    automl = AutoML()
    automl_settings = {
        "time_budget": 2,  # seconds
        "task": "classification",
        "log_file_name": "test/iris.log",
        "estimator_list": ["xgboost", "rf"],
        "max_iter": 2,
    }
    # A constant "domain" fixes the hyperparameter to that value.
    automl_settings["custom_hp"] = {
        "xgboost": {
            "scale_pos_weight": {
                "domain": 0.5,
            }
        },
        "rf": {
            "class_weight": {
                "domain": "balanced",
            }
        },
    }
    automl.fit(X_train, y_train, **automl_settings)
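For a binary problem, the constant could be derived from the labels instead of being hard-coded. The sketch below only builds the dictionary in the shape used in this reply and does not call FLAML. The helper name `build_custom_hp` and the toy labels are hypothetical, and labels are assumed to be 0/1.

```python
def build_custom_hp(labels):
    """Build a custom_hp-style dict from binary 0/1 labels (hypothetical helper)."""
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == 0)
    if n_pos == 0:
        raise ValueError("no positive samples, the ratio is undefined")
    return {
        "xgboost": {
            # Constant domain: negatives / positives.
            "scale_pos_weight": {"domain": n_neg / n_pos},
        },
        "rf": {
            # Let the estimator reweight classes by inverse frequency.
            "class_weight": {"domain": "balanced"},
        },
    }


# Toy labels: 6 negatives and 2 positives.
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
custom_hp = build_custom_hp(y_train)
print(custom_hp["xgboost"]["scale_pos_weight"]["domain"])  # 3.0
```

The result would then be assigned to `automl_settings["custom_hp"]` in the same way as the dictionary in the reply.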
-
@HuangChiEn @liususan091219 could one of you update the FAQ about how to handle imbalanced data?
-
I have seen the related issue on this topic (#27), but I wonder whether there are other parameters that simulate the 'pos_weight' argument in scikit-learn.
It only requires the 0/1 labels to tell which class has more samples, instead of a vector as long as the data set that specifies the weight of every sample one by one. (We can still get the same effect with such a vector, but pos_weight would be helpful for handling data imbalance when we only work on binary classification.)
Any suggestions will be appreciated!
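The equivalence mentioned in the question can be sketched as follows: a single positive-class weight expands into a per-sample weight vector of the kind accepted by the `sample_weight` argument of many scikit-learn estimators' `fit` methods. The helper name `sample_weights_from_labels` and the toy labels are made up for illustration, and labels are assumed to be 0/1.

```python
def sample_weights_from_labels(labels, pos_weight):
    """Expand one positive-class weight into a per-sample weight list (hypothetical helper)."""
    return [pos_weight if y == 1 else 1.0 for y in labels]


# Toy labels: three negatives and one positive.
y = [0, 0, 0, 1]
pos_weight = y.count(0) / y.count(1)  # negatives / positives
weights = sample_weights_from_labels(y, pos_weight)
print(weights)  # [1.0, 1.0, 1.0, 3.0]
```

The vector carries no more information than the single ratio, which is why a scalar argument is more convenient for binary classification.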