
Commit 771396a

Adding f1 beta tuning examples to the docs
1 parent 03f4633 commit 771396a

File tree

1 file changed: +122 -0 lines changed

source/usage_guide.rst

Lines changed: 122 additions & 0 deletions
@@ -1960,6 +1960,128 @@ See :ref:`this section <model_calibration>` for more information on model calibr
<div style="height: 50px;"></div>

F1 Beta Threshold Tuning
------------------------

In binary classification, selecting an optimal classification threshold is crucial for
achieving the best balance between precision and recall. The default threshold of 0.5
may not always yield optimal performance, especially when dealing with imbalanced
datasets. F1 Beta threshold tuning helps adjust this threshold to maximize the F-beta
score, which balances precision and recall according to the importance assigned to each
through the beta parameter.

Understanding F1 Beta Score
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The F-beta score is a generalization of the F1-score that allows you to weigh recall
more heavily than precision (or vice versa) based on the beta parameter:

- F1-Score (beta = 1): Equal importance to precision and recall.

- F-beta > 1: More emphasis on recall. Useful when false negatives are more critical (e.g., disease detection).

- F-beta < 1: More emphasis on precision. Suitable when false positives are costlier (e.g., spam detection).

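For reference, the F-beta score is the standard weighted harmonic mean of precision and
recall (this is the general definition, not anything specific to this library):

.. math::

   F_\beta = (1 + \beta^2) \cdot
   \frac{\text{precision} \cdot \text{recall}}
   {\beta^2 \cdot \text{precision} + \text{recall}}

As a quick standalone illustration of how beta shifts the balance, scikit-learn's
``fbeta_score`` can be used directly (the labels below are made up for the example):

.. code-block:: python

    from sklearn.metrics import fbeta_score

    # Hypothetical ground truth and predictions, purely illustrative
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

    fbeta_score(y_true, y_pred, beta=1)    # balances precision and recall (F1)
    fbeta_score(y_true, y_pred, beta=2)    # weights recall more heavily
    fbeta_score(y_true, y_pred, beta=0.5)  # weights precision more heavily
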
Example usage: default (beta = 1)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Setting up the ``Model`` object ready for tuning:

.. code-block:: python

    from xgboost import XGBClassifier

    xgb_name = "xgb"
    xgb = XGBClassifier(
        objective="binary:logistic",
        random_state=222,
    )

    # Hyperparameter grid, keyed by the estimator name prefix
    tuned_parameters_xgb = {
        f"{xgb_name}__max_depth": [3, 10, 20, 200, 500],
        f"{xgb_name}__learning_rate": [1e-4],
        f"{xgb_name}__n_estimators": [1000],
        f"{xgb_name}__early_stopping_rounds": [100],
        f"{xgb_name}__verbose": [0],
        f"{xgb_name}__eval_metric": ["logloss"],
    }

    xgb_model = Model(
        name="Threshold Example Model",
        estimator_name=xgb_name,
        calibrate=False,
        model_type="classification",
        estimator=xgb,
        kfold=False,
        stratify_y=True,
        stratify_cols=False,
        grid=tuned_parameters_xgb,
        randomized_grid=False,
        boost_early=False,
        scoring=["roc_auc"],
        random_state=222,
        n_jobs=2,
    )

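The example assumes ``X`` and ``y`` (and the ``X_valid`` / ``y_valid`` split used later)
already exist. If you are following along in isolation, a toy setup might look like the
following; it is purely illustrative and not part of the library:

.. code-block:: python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Imbalanced synthetic data, just so the example is runnable end to end
    X, y = make_classification(
        n_samples=2000, n_features=20, weights=[0.9, 0.1], random_state=222
    )
    X, X_valid, y, y_valid = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=222
    )
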
To enable threshold tuning, pass ``f1_beta_tune=True`` when calling
``grid_search_param_tuning()``:

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True)

This will find the best hyperparameters and then find the best threshold for them,
balancing precision and recall. The threshold is stored on the Model object and can be
accessed as follows:

.. code-block:: python

    xgb_model.threshold

This returns the best threshold found for each score specified in the Model object.

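Conceptually, this kind of tuning scans candidate thresholds and keeps the one that
maximizes the F-beta score on held-out predictions. The sketch below is not the
library's implementation, just a minimal standalone version of the same idea using
NumPy and scikit-learn:

.. code-block:: python

    import numpy as np
    from sklearn.metrics import fbeta_score

    def best_fbeta_threshold(y_true, y_proba, beta=1.0):
        """Return the threshold in (0, 1) that maximizes the F-beta score."""
        y_proba = np.asarray(y_proba)
        thresholds = np.linspace(0.01, 0.99, 99)
        scores = [
            fbeta_score(y_true, (y_proba >= t).astype(int), beta=beta)
            for t in thresholds
        ]
        return thresholds[int(np.argmax(scores))]
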
2043+
When using methods that return or report metrics after tuning, remember to specify that
the optimal threshold should be used, for example:

.. code-block:: python

    xgb_model.return_metrics(
        X_valid,
        y_valid,
        optimal_threshold=True,
        print_threshold=True,
        model_metrics=True,
    )

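If you want to apply the tuned threshold yourself (for example, to build a custom
report), a sketch along the following lines works. Here ``proba_valid`` stands for the
positive-class probabilities you obtained for ``X_valid`` from the fitted estimator, and
indexing ``xgb_model.threshold`` by scorer name is an assumption based on the
description above, not a documented guarantee:

.. code-block:: python

    import numpy as np
    from sklearn.metrics import classification_report

    # Assumed structure: one tuned threshold per scoring metric
    threshold = xgb_model.threshold["roc_auc"]

    y_pred_valid = (np.asarray(proba_valid) >= threshold).astype(int)
    print(classification_report(y_valid, y_pred_valid))
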
Example usage: custom betas (higher recall)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we want a higher recall score and care less about precision, then we increase the
beta value. This looks very similar to the previous example, except that when we use
f1_beta_tune we also set a beta value, like so:

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True, betas=[2])

Setting the beta value to 2 will prioritise increasing the recall over the precision.

Example usage: custom betas (higher precision)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

If we want a higher precision score and care less about recall, then we decrease the
beta value. This looks very similar to the previous example, except that we set the
beta value to less than 1.

.. code-block:: python

    xgb_model.grid_search_param_tuning(X, y, f1_beta_tune=True, betas=[0.5])

Setting the beta value to 0.5 will prioritise increasing the precision over the recall.

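To see how the choice of beta shifts the selected operating point, the hypothetical
``best_fbeta_threshold`` helper sketched earlier can be run with both settings on the
same validation probabilities (``y_valid`` and ``proba_valid`` are placeholders for your
own data):

.. code-block:: python

    # beta > 1 typically pushes the threshold down (more positives, higher recall);
    # beta < 1 typically pushes it up (fewer positives, higher precision).
    recall_weighted = best_fbeta_threshold(y_valid, proba_valid, beta=2)
    precision_weighted = best_fbeta_threshold(y_valid, proba_valid, beta=0.5)
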
Imbalanced Learning
------------------------
