
Commit bbf1d4e

[ci skip] MAINT Replace R^2 with MAE in exercise M3.02 (#830) e62b23c
1 parent 3b9b5e5 commit bbf1d4e

File tree

7 files changed (+96 -56 lines)


.buildinfo (+1 -1)

@@ -1,4 +1,4 @@
 # Sphinx build info version 1
 # This file hashes the configuration used when building these files. When it is not found, a full rebuild will be done.
-config: 804f24a7d6da2bb21a214583fed15567
+config: 08c157dd73b0a8da763c180621a73e08
 tags: 645f666f9bcd5a90fca523b33c5a78b7

_sources/python_scripts/parameter_tuning_ex_03.py (+8 -2)

@@ -40,8 +40,9 @@
 # Write your code here.

 # %% [markdown]
-# Use `RandomizedSearchCV` with `n_iter=20` to find the best set of
-# hyperparameters by tuning the following parameters of the `model`:
+# Use `RandomizedSearchCV` with `n_iter=20` and
+# `scoring="neg_mean_absolute_error"` to tune the following hyperparameters
+# of the `model`:
 #
 # - the parameter `n_neighbors` of the `KNeighborsRegressor` with values
 #   `np.logspace(0, 3, num=10).astype(np.int32)`;
@@ -50,6 +51,11 @@
 # - the parameter `with_std` of the `StandardScaler` with possible values `True`
 #   or `False`.
 #
+# The `scoring` function is expected to return higher values for better models,
+# since grid/random search objects **maximize** it. Because of that, error
+# metrics like `mean_absolute_error` must be negated (using the `neg_` prefix)
+# to work correctly (remember lower errors represent better models).
+#
 # Notice that in the notebook "Hyperparameter tuning by randomized-search" we
 # pass distributions to be sampled by the `RandomizedSearchCV`. In this case we
 # define a fixed grid of hyperparameters to be explored. Using a `GridSearchCV`
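The sign convention described in the new comment can be shown with a tiny standalone sketch (plain Python, no scikit-learn; the candidate names and error values are invented for illustration):

```python
# Candidate models and their mean absolute errors (lower is better).
# Names and values are made up for this sketch.
candidate_mae = {"model_a": 52.0, "model_b": 47.0, "model_c": 60.0}

# A search that maximizes the raw error would pick the *worst* model:
picked_raw = max(candidate_mae, key=candidate_mae.get)

# Negating the errors (the role of the "neg_" prefix) makes the same
# maximization select the model with the lowest error:
neg_scores = {name: -mae for name, mae in candidate_mae.items()}
picked_neg = max(neg_scores, key=neg_scores.get)

print(picked_raw, picked_neg)  # model_c model_b
```

This is exactly why scikit-learn exposes error metrics to the search objects under negated names such as `neg_mean_absolute_error`.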

_sources/python_scripts/parameter_tuning_sol_03.py (+18 -4)

@@ -40,8 +40,9 @@
 model = make_pipeline(scaler, KNeighborsRegressor())

 # %% [markdown]
-# Use `RandomizedSearchCV` with `n_iter=20` to find the best set of
-# hyperparameters by tuning the following parameters of the `model`:
+# Use `RandomizedSearchCV` with `n_iter=20` and
+# `scoring="neg_mean_absolute_error"` to tune the following hyperparameters
+# of the `model`:
 #
 # - the parameter `n_neighbors` of the `KNeighborsRegressor` with values
 #   `np.logspace(0, 3, num=10).astype(np.int32)`;
@@ -50,6 +51,11 @@
 # - the parameter `with_std` of the `StandardScaler` with possible values `True`
 #   or `False`.
 #
+# The `scoring` function is expected to return higher values for better models,
+# since grid/random search objects **maximize** it. Because of that, error
+# metrics like `mean_absolute_error` must be negated (using the `neg_` prefix)
+# to work correctly (remember lower errors represent better models).
+#
 # Notice that in the notebook "Hyperparameter tuning by randomized-search" we
 # pass distributions to be sampled by the `RandomizedSearchCV`. In this case we
 # define a fixed grid of hyperparameters to be explored. Using a `GridSearchCV`
@@ -79,6 +85,7 @@
 model_random_search = RandomizedSearchCV(
     model,
     param_distributions=param_distributions,
+    scoring="neg_mean_absolute_error",
     n_iter=20,
     n_jobs=2,
     verbose=1,
@@ -107,6 +114,13 @@

 cv_results = pd.DataFrame(model_random_search.cv_results_)

+# %% [markdown] tags=["solution"]
+# Since we used `neg_mean_absolute_error` as the scoring metric, we multiply
+# the scores by -1 to recover mean absolute error values:
+
+# %% tags=["solution"]
+cv_results["mean_test_score"] *= -1
+
 # %% [markdown] tags=["solution"]
 # To simplify the axis of the plot, we rename the column of the dataframe and
 # only select the mean test score and the value of the hyperparameters.
@@ -121,7 +135,7 @@

 cv_results = cv_results.rename(columns=column_name_mapping)
 cv_results = cv_results[column_name_mapping.values()].sort_values(
-    "mean test score", ascending=False
+    "mean test score"
 )

 # %% [markdown] tags=["solution"]
@@ -153,7 +167,7 @@
 # holding on any axis of the parallel coordinate plot. You can then slide (move)
 # the range selection and cross two selections to see the intersections.
 #
-# Selecting the best performing models (i.e. above R2 score of ~0.68), we
+# Selecting the best performing models (i.e. below a MAE of ~47 k$), we
 # observe that **in this case**:
 #
 # - scaling the data is important. All the best performing models use scaled
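Putting the diff's pieces together, a minimal end-to-end sketch of the tuned pipeline might look as follows. The synthetic data, the reduced grid, and `n_iter=5` are assumptions to keep the sketch fast and self-contained; the exercise itself uses a housing dataset, `np.logspace(0, 3, num=10)`, and `n_iter=20`:

```python
import numpy as np
from sklearn.model_selection import RandomizedSearchCV
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic regression data (an assumption; the exercise uses real data).
rng = np.random.RandomState(0)
X = rng.uniform(size=(300, 3))
y = X @ np.array([1.0, 2.0, 3.0]) + 0.1 * rng.normal(size=300)

model = make_pipeline(StandardScaler(), KNeighborsRegressor())

# Reduced grid for speed; same structure as the exercise's grid.
param_distributions = {
    "kneighborsregressor__n_neighbors": np.logspace(0, 2, num=5).astype(np.int32),
    "standardscaler__with_std": [True, False],
}

model_random_search = RandomizedSearchCV(
    model,
    param_distributions=param_distributions,
    scoring="neg_mean_absolute_error",  # higher (less negative) is better
    n_iter=5,
    random_state=0,
)
model_random_search.fit(X, y)

# The reported score is a negated MAE; flip the sign to read it as an error.
best_mae = -model_random_search.best_score_
print(f"best MAE: {best_mae:.3f}")
```

Note that `best_score_` stays on the negated scale the search maximizes, which is why the solution multiplies `mean_test_score` by -1 before plotting.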

appendix/notebook_timings.html (+2 -2)

@@ -1175,9 +1175,9 @@ <h1>Notebook timings<a class="headerlink" href="#notebook-timings" title="Link t
 <td><p></p></td>
 </tr>
 <tr class="row-odd"><td><p><a class="xref doc reference internal" href="../python_scripts/parameter_tuning_sol_03.html"><span class="doc">python_scripts/parameter_tuning_sol_03</span></a></p></td>
-<td><p>2025-04-01 09:09</p></td>
+<td><p>2025-04-22 15:13</p></td>
 <td><p>cache</p></td>
-<td><p>19.48</p></td>
+<td><p>21.24</p></td>
 <td><p></p></td>
 </tr>
 <tr class="row-even"><td><p><a class="xref doc reference internal" href="../python_scripts/trees_classification.html"><span class="doc">python_scripts/trees_classification</span></a></p></td>

python_scripts/parameter_tuning_ex_03.html (+7 -2)

@@ -728,8 +728,9 @@ <h1>📝 Exercise M3.02<a class="headerlink" href="#exercise-m3-02" title="Link
 </div>
 </div>
 </div>
-<p>Use <code class="docutils literal notranslate"><span class="pre">RandomizedSearchCV</span></code> with <code class="docutils literal notranslate"><span class="pre">n_iter=20</span></code> to find the best set of
-hyperparameters by tuning the following parameters of the <code class="docutils literal notranslate"><span class="pre">model</span></code>:</p>
+<p>Use <code class="docutils literal notranslate"><span class="pre">RandomizedSearchCV</span></code> with <code class="docutils literal notranslate"><span class="pre">n_iter=20</span></code> and
+<code class="docutils literal notranslate"><span class="pre">scoring=&quot;neg_mean_absolute_error&quot;</span></code> to tune the following hyperparameters
+of the <code class="docutils literal notranslate"><span class="pre">model</span></code>:</p>
 <ul class="simple">
 <li><p>the parameter <code class="docutils literal notranslate"><span class="pre">n_neighbors</span></code> of the <code class="docutils literal notranslate"><span class="pre">KNeighborsRegressor</span></code> with values
 <code class="docutils literal notranslate"><span class="pre">np.logspace(0,</span> <span class="pre">3,</span> <span class="pre">num=10).astype(np.int32)</span></code>;</p></li>
@@ -738,6 +739,10 @@ <h1>📝 Exercise M3.02<a class="headerlink" href="#exercise-m3-02" title="Link
 <li><p>the parameter <code class="docutils literal notranslate"><span class="pre">with_std</span></code> of the <code class="docutils literal notranslate"><span class="pre">StandardScaler</span></code> with possible values <code class="docutils literal notranslate"><span class="pre">True</span></code>
 or <code class="docutils literal notranslate"><span class="pre">False</span></code>.</p></li>
 </ul>
+<p>The <code class="docutils literal notranslate"><span class="pre">scoring</span></code> function is expected to return higher values for better models,
+since grid/random search objects <strong>maximize</strong> it. Because of that, error
+metrics like <code class="docutils literal notranslate"><span class="pre">mean_absolute_error</span></code> must be negated (using the <code class="docutils literal notranslate"><span class="pre">neg_</span></code> prefix)
+to work correctly (remember lower errors represent better models).</p>
 <p>Notice that in the notebook “Hyperparameter tuning by randomized-search” we
 pass distributions to be sampled by the <code class="docutils literal notranslate"><span class="pre">RandomizedSearchCV</span></code>. In this case we
 define a fixed grid of hyperparameters to be explored. Using a <code class="docutils literal notranslate"><span class="pre">GridSearchCV</span></code>

python_scripts/parameter_tuning_sol_03.html (+59 -44)

Large diffs are not rendered by default.

searchindex.js (+1 -1)

Some generated files are not rendered by default.
