Is there a fit parameter that simulates the 'pos_weight' argument in scikit-learn? #544

HuangChiEn · 2022-05-10T01:06:38Z


I have seen the related issue on this topic (#27), but I wonder whether there are other parameters that simulate the 'pos_weight' argument in scikit-learn.

Such an argument only needs the 0/1 labels to determine which class has more samples, instead of a weight vector as long as the dataset that assigns a weight to every sample one by one. (Although the same effect can still be achieved with per-sample weights, pos_weight would be helpful for handling data imbalance when working only on binary classification.)

Any suggestions would be appreciated!
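
For comparison, the per-sample approach mentioned above could look roughly like this. It is only a minimal sketch assuming binary 0/1 labels; `per_sample_weights` is a hypothetical helper written for illustration, not part of FLAML or scikit-learn.

```python
from collections import Counter


def per_sample_weights(labels, positive=1):
    """Return one weight per sample: 1.0 for negatives, n_neg / n_pos for positives."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    pos_weight = n_neg / n_pos
    return [pos_weight if y == positive else 1.0 for y in labels]


# Toy labels (made up for illustration): 6 negatives, 2 positives.
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
weights = per_sample_weights(y_train)
```

The resulting list has one entry per training sample, which is exactly the overhead that a single pos_weight-style scalar avoids.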

sonichi · 2022-05-11T14:15:06Z


Where is the pos_weight argument in scikit-learn? I couldn't find it in its documentation.


HuangChiEn · 2022-05-20T01:12:07Z


Sorry, my description was not precise. Here is my reply copied from Gitter:

To be specific, I mean the 'scale_pos_weight' argument in xgboost (https://xgboost.readthedocs.io/en/stable/parameter.html) and 'class_weight' in scikit-learn (https://scikit-learn.org/stable/modules/generated/sklearn.utils.class_weight.compute_class_weight.html).

Such arguments typically weight the positive samples by sum(negative instances) / sum(positive instances), so that misclassifying a positive (minority-class) sample incurs a larger penalty.
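
As a rough illustration of the two weighting schemes, the sketch below computes the negative-to-positive ratio described above and the per-class weights that scikit-learn documents for class_weight="balanced", namely n_samples / (n_classes * count). The helper names and the toy labels are made up for this example.

```python
from collections import Counter


def neg_pos_ratio(labels):
    """Ratio sum(negative) / sum(positive) for 0/1 labels."""
    counts = Counter(labels)
    return counts[0] / counts[1]


def balanced_class_weights(labels):
    """Per-class weights following the 'balanced' heuristic:
    n_samples / (n_classes * count_of_class)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Toy imbalanced labels: 90 negatives, 10 positives.
y = [0] * 90 + [1] * 10
ratio = neg_pos_ratio(y)
class_weights = balanced_class_weights(y)
```

Note that the two schemes scale differently: the ratio leaves the negative class at weight 1, while the balanced heuristic reweights both classes.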


liususan091219 · 2022-05-20T16:14:02Z


@HuangChiEn you can set 'scale_pos_weight' (or 'class_weight') to your own value through the "custom_hp" argument by passing a dictionary like the one below. Let me know if this works for you:

from flaml import AutoML
from sklearn.datasets import load_iris

# Note: iris has three classes and serves only as a placeholder here;
# scale_pos_weight is intended for binary classification.
X_train, y_train = load_iris(return_X_y=True, as_frame=True)
automl = AutoML()
automl_settings = {
    "time_budget": 2,
    "task": "classification",
    "log_file_name": "test/iris.log",
    "estimator_list": ["xgboost", "rf"],
    "max_iter": 2,
}

# A constant "domain" sets the hyperparameter to that value instead of searching over it.
automl_settings["custom_hp"] = {
    "xgboost": {
        "scale_pos_weight": {
            "domain": 0.5,
        }
    },
    "rf": {
        "class_weight": {
            "domain": "balanced",
        }
    },
}
automl.fit(X_train, y_train, **automl_settings)
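
Instead of hard-coding 0.5, the constant could be derived from the training labels. The following is a minimal sketch assuming binary 0/1 labels; `build_custom_hp` is a hypothetical helper, and the dictionary layout simply mirrors the "custom_hp" example above.

```python
from collections import Counter


def build_custom_hp(labels, positive=1):
    """Build a custom_hp-style dict whose scale_pos_weight is n_neg / n_pos."""
    counts = Counter(labels)
    n_pos = counts[positive]
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive samples in labels")
    return {
        "xgboost": {"scale_pos_weight": {"domain": n_neg / n_pos}},
        "rf": {"class_weight": {"domain": "balanced"}},
    }


# Toy labels (made up for illustration): 95 negatives, 5 positives.
y_train = [0] * 95 + [1] * 5
custom_hp = build_custom_hp(y_train)
```

The result could then be assigned to automl_settings["custom_hp"] before calling automl.fit.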

4 replies

HuangChiEn May 23, 2022
Author

Thanks for your reply. It's a simple way to apply weighting to the dataset ~

HuangChiEn May 24, 2022
Author

"scale_pos_weight": {
            "domain": 0.5,

The update may not have been released yet; it seems this new feature has not been added to the package.

"custom_hp"

This morning I ran into an error when I passed "custom_hp" (the original post showed it in a screenshot).

Can I pass arguments to the initializer of the xgboost estimator wrapped in FLAML?

HuangChiEn May 24, 2022
Author

Hello~ I wonder whether the maintainers have any plan to add a new argument that lets the user pass arguments to the initializer (__init__) of the estimator.

Arguably it is not a good design in the original XGBoost library to configure "scale_pos_weight" in the initializer instead of the fit method, since scale_pos_weight depends on the data.
So the limitation of hp_param is reasonable; customizing the fit arguments is enough for the general case.

I have solved this issue in a hacky way (which I think is not really suitable) by overriding fit in a customized estimator:

from flaml.model import XGBoostSklearnEstimator  # import path assumed for FLAML 1.0.x


class Custom_xgboost(XGBoostSklearnEstimator):
    '''XGBoostSklearnEstimator with a customized search space'''
    # add gpu support..
    def __init__(self, **config):
        other_config = { ... }
        super().__init__(**config, **other_config)

    @classmethod
    def search_space(cls, data_size, **params):
        ...

    # Override fit so that scale_pos_weight can be passed as a fit argument.
    def fit(self, X_train, y_train, budget=None, **kwargs):
        # Check for the key itself so the custom kwarg is always popped
        # and never forwarded to the underlying estimator.
        if "ds_wei" in kwargs:
            self.params["scale_pos_weight"] = kwargs.pop("ds_wei")
        # I traced the code to BaseEstimator: the self.params dictionary is used to
        # initialize the estimator (e.g. xgboost), so we just add the key-value pair
        # to it. It is still recommended to replace this with the official way.
        # Forward the caller's budget rather than hard-coding None.
        return super().fit(X_train, y_train, budget=budget, **kwargs)

Although it solves my problem, it is still worth waiting for an official method that supports this functionality.
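
The override pattern itself can be checked without FLAML installed. The sketch below uses a hypothetical stand-in base class (`BaseEstimatorStub`, invented for this example and not FLAML's real BaseEstimator) that merely records what it receives; it shows the custom "ds_wei" keyword being moved into self.params and removed from the forwarded kwargs, with the budget passed through unchanged.

```python
class BaseEstimatorStub:
    """Hypothetical stand-in for a wrapper that builds its model from self.params at fit time."""

    def __init__(self, **config):
        self.params = dict(config)
        self.fit_calls = []

    def fit(self, X_train, y_train, budget=None, **kwargs):
        # Record what the underlying model would have been built and fitted with.
        self.fit_calls.append(
            {"params": dict(self.params), "budget": budget, "kwargs": dict(kwargs)}
        )
        return 0.0  # placeholder for a training-time return value


class WeightedEstimator(BaseEstimatorStub):
    def fit(self, X_train, y_train, budget=None, **kwargs):
        # Move the custom keyword into the constructor parameters
        # so it is not forwarded to the wrapped fit call.
        if "ds_wei" in kwargs:
            self.params["scale_pos_weight"] = kwargs.pop("ds_wei")
        return super().fit(X_train, y_train, budget=budget, **kwargs)


est = WeightedEstimator(n_estimators=10)
est.fit([[0.0], [1.0]], [0, 1], budget=5, ds_wei=9.0)
call = est.fit_calls[-1]
```

Here `call` holds the parameters, budget, and remaining keyword arguments seen by the stub's fit.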

sonichi May 24, 2022

"scale_pos_weight": {
            "domain": 0.5,
The update may not have been released yet; it seems this new feature has not been added to the package.
"custom_hp"
This morning I ran into an error when I passed "custom_hp" (the original post showed it in a screenshot).

Can I pass arguments to the initializer of the xgboost estimator wrapped in FLAML?

This feature is new in version 1.0.2. Could you please try that version?

sonichi · 2022-05-23T14:16:42Z


@HuangChiEn @liususan091219 could one of you update the FAQ about how to handle imbalanced data?

2 replies

HuangChiEn May 24, 2022
Author

Yes, we can update the page (via fork and PR) once this issue is solved.

sonichi Jun 12, 2022

@HuangChiEn Has your issue been resolved? Any further questions?
