See :ref:`this section <model_calibration>` for more information on model calibration.
<div style="height: 50px;"></div>

F1 Beta Threshold Tuning
------------------------

In binary classification, selecting an optimal classification threshold is crucial
for achieving the best balance between precision and recall. The default threshold
of 0.5 may not always yield optimal performance, especially when dealing with
imbalanced datasets. F1 Beta threshold tuning helps adjust this threshold to
maximize the F-beta score, which balances precision and recall according to the
importance assigned to each through the beta parameter.
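
For reference, the F-beta score combines precision :math:`P` and recall :math:`R` as:

.. math::

   F_\beta = (1 + \beta^2) \cdot \frac{P \cdot R}{\beta^2 \cdot P + R}

so larger values of :math:`\beta` weight recall more heavily, and values below 1
weight precision more heavily.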

Understanding F1 Beta Score
^^^^^^^^^^^^^^^^^^^^^^^^^^^

The F-beta score is a generalization of the F1-score that allows you to weigh recall
more heavily than precision (or vice versa) based on the beta parameter:

- F1-score (beta = 1): equal importance to precision and recall.

- F-beta > 1: more emphasis on recall. Useful when false negatives are more critical (e.g., disease detection).

- F-beta < 1: more emphasis on precision. Suitable when false positives are costlier (e.g., spam detection).
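
To make the effect of beta concrete, here is a small illustration using
scikit-learn's ``fbeta_score`` (scikit-learn is assumed to be installed; the toy
labels below are invented for illustration):

.. code-block:: python

    from sklearn.metrics import fbeta_score

    # A toy prediction with perfect precision (no false positives) but low
    # recall: two of the three true positives are missed.
    y_true = [1, 1, 1, 0, 0, 0]
    y_pred = [1, 0, 0, 0, 0, 0]

    f1  = fbeta_score(y_true, y_pred, beta=1.0)  # balanced: 0.50
    f2  = fbeta_score(y_true, y_pred, beta=2.0)  # punishes the missed positives
    f05 = fbeta_score(y_true, y_pred, beta=0.5)  # rewards the clean precision

Because this prediction is precise but misses positives, it scores well under
beta = 0.5 and poorly under beta = 2.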

Example usage: default (beta = 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

First, set up the ``Model`` object ready for tuning:

.. code-block:: python

    from xgboost import XGBClassifier

    xgb_name = "xgb"
    xgb = XGBClassifier(
        objective="binary:logistic",
        random_state=222,
    )

    tuned_parameters_xgb = {
        f"{xgb_name}__max_depth": [3, 10, 20, 200, 500],
        f"{xgb_name}__learning_rate": [1e-4],
        f"{xgb_name}__n_estimators": [1000],
        f"{xgb_name}__early_stopping_rounds": [100],
        f"{xgb_name}__verbose": [0],
        f"{xgb_name}__eval_metric": ["logloss"],
    }

    xgb_model = Model(
        name="Threshold Example Model",
        estimator_name=xgb_name,
        calibrate=False,
        model_type="classification",
        estimator=xgb,
        kfold=False,
        stratify_y=True,
        stratify_cols=False,
        grid=tuned_parameters_xgb,
        randomized_grid=False,
        boost_early=False,
        scoring=["roc_auc"],
        random_state=222,
        n_jobs=2,
    )

To enable threshold tuning, set ``f1_beta_tune=True`` when calling
``grid_search_param_tuning()``:

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True)

This will first find the best hyperparameters, then find the threshold that best
balances precision and recall for them. The threshold is stored in the ``Model``
object and can be accessed with:

.. code-block:: python

    xgb_model.threshold

This returns the best threshold found for each scorer specified in the ``Model``
object.
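
As a sketch of how a tuned threshold changes predictions (the probabilities and
threshold value below are invented for illustration; in practice the threshold
comes from the fitted ``Model`` object):

.. code-block:: python

    import numpy as np

    # Hypothetical predicted probabilities and a hypothetical tuned threshold.
    proba = np.array([0.12, 0.52, 0.55, 0.91])
    threshold = 0.55

    # Classify as positive only when the probability meets the tuned threshold.
    y_pred_tuned = (proba >= threshold).astype(int)

    # The default 0.5 cut-off would also flag the borderline 0.52 case.
    y_pred_default = (proba >= 0.5).astype(int)
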

When calling methods that return or report metrics after an optimal threshold was
tuned, remember to specify that the threshold should be used, for example:

.. code-block:: python

    xgb_model.return_metrics(
        X_valid,
        y_valid,
        optimal_threshold=True,
        print_threshold=True,
        model_metrics=True,
    )


Example usage: custom betas (higher recall)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we want a higher recall score and care less about precision, we increase the
beta value. This looks very similar to the previous example, except that alongside
``f1_beta_tune`` we also pass a beta value:

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True, betas=[2])

Setting the beta value to 2 will prioritize increasing recall over precision.


Example usage: custom betas (higher precision)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we want a higher precision score and care less about recall, we decrease the
beta value. This looks very similar to the previous example, except that we set
the beta value to less than 1:

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True, betas=[0.5])

Setting the beta value to 0.5 will prioritize increasing precision over recall.
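
Conceptually, the tuning step picks the threshold that maximizes the F-beta score
on the model's predicted probabilities, so a higher beta tends to select a lower
threshold (more predicted positives, higher recall) and a lower beta a higher one.
A self-contained sketch of that search, using synthetic data (this illustrates the
idea only and is not the library's internal implementation):

.. code-block:: python

    import numpy as np
    from sklearn.metrics import fbeta_score

    def best_threshold(y_true, proba, beta):
        """Grid-search the threshold that maximizes the F-beta score."""
        thresholds = np.linspace(0.01, 0.99, 99)
        scores = [
            fbeta_score(y_true, (proba >= t).astype(int), beta=beta, zero_division=0)
            for t in thresholds
        ]
        return thresholds[int(np.argmax(scores))]

    # Synthetic labels and probabilities with one borderline positive (0.4)
    # and one confident-looking negative (0.6).
    y_true = np.array([0, 0, 0, 1, 0, 1, 1, 1])
    proba = np.array([0.1, 0.2, 0.35, 0.4, 0.6, 0.7, 0.8, 0.9])

    t_recall = best_threshold(y_true, proba, beta=2.0)     # favors recall
    t_precision = best_threshold(y_true, proba, beta=0.5)  # favors precision

On this data the beta = 2 search settles below the borderline positive, while the
beta = 0.5 search settles above the confident-looking negative.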

Imbalanced Learning
------------------------
