Commit 4131eaa

docs for custom eval metric (#470)
1 parent 8d2e976 commit 4131eaa

5 files changed

Lines changed: 147 additions & 1 deletion

File tree

README.md

Lines changed: 2 additions & 0 deletions
@@ -214,6 +214,8 @@ All models are automatically saved to be able to restore the training after inte
214214
- for multiclass classification: `logloss`, `f1`, `accuracy` - default is `logloss`
215215
- for regression: `rmse`, `mse`, `mae`, `r2`, `mape`, `spearman`, `pearson` - default is `rmse`
216216

217+
You can also pass a custom Python function directly as `eval_metric`. See the docs for [Custom eval metric](https://supervised.mljar.com/features/custom-eval-metric/).
218+
217219
If you don't find the `eval_metric` that you need, please add a new issue. We will add it.
218220

219221

docs/docs/api.md

Lines changed: 1 addition & 0 deletions
@@ -7,6 +7,7 @@ social:
77
# API documentation
88

99
If you are looking for how trained models are stored and reloaded, see [Save and Load models](features/save-and-load-models.md).
10+
If you need a user-defined evaluation function, see [Custom eval metric](features/custom-eval-metric.md).
1011

1112
## `AutoML` class
1213

docs/docs/features/custom-eval-metric.md

Lines changed: 133 additions & 0 deletions
@@ -0,0 +1,133 @@
1+
---
2+
description: How to use a custom evaluation metric in MLJAR AutoML by passing a Python function directly as eval_metric.
3+
social:
4+
cards_layout: default/variant
5+
---
6+
7+
# Custom eval metric
8+
9+
`mljar-supervised` supports custom evaluation metrics.
10+
11+
You can pass your own Python function directly as the `eval_metric` argument in `AutoML`.
12+
13+
## Basic usage
14+
15+
The function should have this interface:
16+
17+
```python
def my_custom_metric(y_true, y_predicted, sample_weight=None):
    # compute score
    return score
```
22+
23+
Then use it directly:
24+
25+
```python
from supervised import AutoML

automl = AutoML(
    results_path="AutoML_custom_metric",
    eval_metric=my_custom_metric,
)
automl.fit(X, y)
```
34+
35+
## Important rule: the metric must be minimized
36+
37+
Custom metrics in `mljar-supervised` are always treated as metrics to minimize.
38+
39+
This means:
40+
41+
- if lower is better, return the value directly
42+
- if higher is better, return its negative value
43+
44+
For example:
45+
46+
- MSE can be returned directly
47+
- precision, F1, and AUC are higher-is-better, so the function should usually return `-value`
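
The sign flip described above can be sketched with a small wrapper. This is only an illustration: `accuracy`, `minimized`, and `neg_accuracy` are hypothetical names and not part of `mljar-supervised`. Whether `AutoML` handles a wrapped closure as well as a plain module-level function is not confirmed here, so defining the negated metric as an ordinary function may be the safer choice.

```python
import numpy as np


def accuracy(y_true, y_predicted, sample_weight=None):
    """Plain accuracy on class labels: higher is better."""
    y_true = np.asarray(y_true)
    y_predicted = np.asarray(y_predicted)
    matches = (y_true == y_predicted).astype(float)
    # np.average falls back to an unweighted mean when weights is None
    return float(np.average(matches, weights=sample_weight))


def minimized(metric_fn):
    """Hypothetical helper: negate a higher-is-better metric."""

    def wrapper(y_true, y_predicted, sample_weight=None):
        return -metric_fn(y_true, y_predicted, sample_weight=sample_weight)

    wrapper.__name__ = f"neg_{metric_fn.__name__}"
    return wrapper


# Lower is now better, which matches the minimization rule.
neg_accuracy = minimized(accuracy)
```

A perfect model scores `-1.0` with this metric and a model that is always wrong scores `0.0`, so a smaller value still means a better model.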
48+
49+
## Regression example
50+
51+
```python
import numpy as np
from supervised import AutoML

def custom_mse(y_true, y_predicted, sample_weight=None):
    # sample_weight is accepted but ignored in this simple example
    y_true = np.asarray(y_true)
    y_predicted = np.asarray(y_predicted)
    # lower MSE is better, so the value is returned directly
    return np.mean((y_true - y_predicted) ** 2)

automl = AutoML(
    results_path="AutoML_regression_custom_metric",
    eval_metric=custom_mse,
)
automl.fit(X, y)
```
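
If sample weights matter for your problem, the same metric can take them into account. The following is a minimal sketch, assuming `sample_weight` arrives as a 1-D array aligned with `y_true` whenever it is not `None`; `weighted_mse` is a hypothetical name used only for illustration.

```python
import numpy as np


def weighted_mse(y_true, y_predicted, sample_weight=None):
    """MSE that uses sample_weight when given, plain mean otherwise."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_predicted = np.asarray(y_predicted, dtype=float).ravel()
    squared_error = (y_true - y_predicted) ** 2
    # weights=None makes np.average behave like np.mean
    return float(np.average(squared_error, weights=sample_weight))
```

With `sample_weight=None` this returns the same value as the unweighted `custom_mse` example.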
66+
67+
## Classification example
68+
69+
For classification, `y_predicted` can contain probabilities, so you may need to apply thresholding or `argmax` inside your metric.
70+
71+
```python
import numpy as np
from sklearn.metrics import precision_score
from supervised import AutoML

def positive_class_precision(y_true, y_predicted, sample_weight=None):
    y_true = np.asarray(y_true)
    y_predicted = np.asarray(y_predicted)

    # flatten a single-column array of probabilities to 1-D
    if y_predicted.ndim == 2 and y_predicted.shape[1] == 1:
        y_predicted = y_predicted.ravel()

    if y_predicted.ndim == 1:
        # 1-D probabilities of the positive class: threshold at 0.5
        y_predicted = (y_predicted > 0.5).astype(int)
    else:
        # one column per class: pick the most probable class
        y_predicted = np.argmax(y_predicted, axis=1)

    value = precision_score(y_true, y_predicted, sample_weight=sample_weight)

    # higher precision is better, so return negative value
    return -value

automl = AutoML(
    results_path="AutoML_classification_custom_metric",
    eval_metric=positive_class_precision,
)
automl.fit(X, y)
```
99+
100+
## Notes
101+
102+
- the metric function must return a single numeric value
103+
- the metric should handle `sample_weight=None`
104+
- the metric will be used for early stopping and model selection
105+
- the metric should be deterministic and reasonably fast
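
Before starting a long training run, it can help to check a metric against the notes above. The sketch below is a hypothetical pre-flight check, not something `mljar-supervised` provides: `check_custom_metric` and `mae` are illustrative names, and the checks cover only the return type, `sample_weight` handling, and determinism.

```python
import math

import numpy as np


def check_custom_metric(metric_fn, y_true, y_predicted):
    """Hypothetical sanity check for a custom metric function."""
    first = metric_fn(y_true, y_predicted, sample_weight=None)
    second = metric_fn(y_true, y_predicted, sample_weight=None)

    # must be a single number, not an array or list
    if np.ndim(first) != 0:
        raise ValueError("metric must return a single numeric value")
    if not math.isfinite(float(first)):
        raise ValueError("metric returned NaN or infinity")
    # same inputs must give the same score
    if float(first) != float(second):
        raise ValueError("metric is not deterministic")

    # must also accept explicit weights without raising
    metric_fn(y_true, y_predicted, sample_weight=np.ones(len(y_true)))
    return float(first)


def mae(y_true, y_predicted, sample_weight=None):
    """Mean absolute error: lower is better."""
    error = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_predicted, dtype=float))
    return float(np.average(error, weights=sample_weight))


score = check_custom_metric(mae, [1, 2, 3], [1, 2, 4])
```

The check raises `ValueError` on the first rule it finds broken and otherwise returns the unweighted score.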
106+
107+
## FAQ
108+
109+
### Can I pass a function directly?
110+
111+
Yes. This is the supported public interface:
112+
113+
```python
automl = AutoML(eval_metric=my_custom_metric)
```
116+
117+
### Should I pass `eval_metric="user_defined_metric"`?
118+
119+
No. That name is used internally. In user code, pass the function itself.
120+
121+
### Can I maximize my metric directly?
122+
123+
No. Convert it to a minimization target, usually by returning `-value`.
124+
125+
### Why do I need thresholding for some classification metrics?
126+
127+
Because many classification metrics such as precision or F1 expect class labels, while model predictions during evaluation can be probabilities.
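
As a rough illustration of that conversion, the sketch below turns probabilities into labels for the array shapes a metric might receive. The exact shape `AutoML` passes is not specified on this page, so the three cases handled here are an assumption, and `to_labels` is a hypothetical helper name.

```python
import numpy as np


def to_labels(y_predicted, threshold=0.5):
    """Convert predicted probabilities to integer class labels."""
    y_predicted = np.asarray(y_predicted)
    # a single column of probabilities is treated like a 1-D array
    if y_predicted.ndim == 2 and y_predicted.shape[1] == 1:
        y_predicted = y_predicted.ravel()
    if y_predicted.ndim == 1:
        # positive-class probabilities: apply the threshold
        return (y_predicted > threshold).astype(int)
    # one column per class: take the most probable class
    return np.argmax(y_predicted, axis=1)


print(to_labels([0.2, 0.7, 0.5]))           # [0 1 0]
print(to_labels([[0.1, 0.9], [0.8, 0.2]]))  # [1 0]
```

A probability exactly equal to the threshold maps to class 0 because the comparison is strict.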
128+
129+
## Related pages
130+
131+
- [AutoML API](../api.md)
132+
- [Save and Load models](save-and-load-models.md)
133+
- [Preprocessing](preprocessing.md)

docs/mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -72,6 +72,7 @@ nav:
7272
- Get started: index.md
7373
- Features:
7474
- Apps: features/apps.md
75+
- Custom eval metric: features/custom-eval-metric.md
7576
- Preprocessing: features/preprocessing.md
7677
- Save and Load models: features/save-and-load-models.md
7778
- Steps of AutoML: features/automl.md

supervised/automl.py

Lines changed: 10 additions & 1 deletion
@@ -150,12 +150,21 @@ def __init__(
150150
151151
stack_models (boolean): Whether a models stack gets created at the end of the training. Stack level is 1.
152152
153-
eval_metric (str): The metric to be used in early stopping and to compare models.
153+
eval_metric (str or function): The metric to be used in early stopping and to compare models.
154154
155155
- for binary classification: `logloss`, `auc`, `f1`, `average_precision`, `accuracy` - default is logloss (if left "auto")
156156
- for multiclass classification: `logloss`, `f1`, `accuracy` - default is `logloss` (if left "auto")
157157
- for regression: `rmse`, `mse`, `mae`, `r2`, `mape`, `spearman`, `pearson` - default is `rmse` (if left "auto")
158158
159+
You can also pass a custom Python function directly. The expected interface is:
160+
161+
`def my_metric(y_true, y_predicted, sample_weight=None): return score`
162+
163+
The returned value is always minimized. If you want to maximize a metric,
164+
for example precision or F1, return its negative value. For classification
165+
tasks, `y_predicted` can contain probabilities, so thresholding or `argmax`
166+
might be needed inside the custom metric.
167+
159168
validation_strategy (dict): Dictionary with validation type. Right now train/test split and cross-validation are supported.
160169
161170
Example:
