`vignettes/_hyperparameter-tuning.Rmd` (28 additions, 28 deletions)

This is a discretized function from latitude--longitude pairs ($\mathbb{S}^2$, a …
The {ripserr} package ports the Cubical Ripser algorithm to R and serves as the engine for this pre-processing step.
In the interest of identifying peaks and ridges rather than troughs and valleys, here is the _superlevel_ set persistence diagram for this image based on negated elevations:
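
The negation trick can be seen in a much simpler setting. Because $\{x : f(x) \ge t\} = \{x : -f(x) \le -t\}$, the sublevel-set persistence of $-f$ is the superlevel-set persistence of $f$ with births and deaths negated. The sketch below is a minimal illustration under strong simplifying assumptions: a one-dimensional signal and degree-0 features only. It is not Cubical Ripser and not the {ripserr} API. The helper `sublevel_persistence_0d()` is hypothetical. It merges components with union-find and applies the elder rule.

```r
# 0-dimensional sublevel-set persistence of a 1-D signal (hypothetical helper).
# Vertices enter in increasing order of value; when two components meet,
# the younger one (larger birth value) dies at the current value (elder rule).
sublevel_persistence_0d <- function(f) {
  n <- length(f)
  parent <- rep(NA_integer_, n)  # NA means "not yet in the filtration"
  birth <- rep(NA_real_, n)      # birth value, read at each component's root
  find_root <- function(i) {
    while (parent[i] != i) i <- parent[i]
    i
  }
  pairs <- data.frame(birth = numeric(0), death = numeric(0))
  for (i in order(f)) {
    parent[i] <- i
    birth[i] <- f[i]
    for (j in c(i - 1L, i + 1L)) {
      if (j < 1L || j > n || is.na(parent[j])) next  # neighbor absent or not yet added
      ri <- find_root(i)
      rj <- find_root(j)
      if (ri == rj) next
      if (birth[ri] < birth[rj]) {
        old <- ri; young <- rj
      } else {
        old <- rj; young <- ri
      }
      # record only pairs with positive persistence
      if (birth[young] < f[i]) {
        pairs <- rbind(pairs, data.frame(birth = birth[young], death = f[i]))
      }
      parent[young] <- old
    }
  }
  # the oldest component never dies
  rbind(pairs, data.frame(birth = min(f), death = Inf))
}

elevation <- c(0, 3, 1, 2, 0)
# Superlevel-set persistence of `elevation` via sublevel-set persistence of its negation
pd_neg <- sublevel_persistence_0d(-elevation)
pd_neg
-as.matrix(pd_neg)  # back on the original scale: births are peaks, deaths are merge heights
```

In this toy signal the peaks at heights 3 and 2 meet at the saddle height 1. Negating the output therefore gives the superlevel pairs (2, 1) and (3, -Inf).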

Following a standard ML approach, we prepare the training set for 6-fold cross-validation, which will be used to choose hyperparameter settings that maximize the accuracy of a classifier.
The optimized settings will be used to classify the digits in the testing set, and a comparison with the true labels will provide an estimate of the accuracy of the resulting model.

```{r data-partition}
(mnist_folds <- vfold_cv(mnist_train, v = 6))
```
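
A base-R sketch makes the partition that `vfold_cv()` creates concrete. Each row is assigned to exactly one of `v` folds. Each fold is then held out once for assessment while the model is fit to the remaining rows. The helper `vfold_indices()` is hypothetical and is not part of {rsample}. It ignores stratification and repeats.

```r
# Minimal sketch of v-fold partitioning (hypothetical helper, not rsample's code):
# shuffle balanced fold labels, then hold out each fold once for assessment.
vfold_indices <- function(n, v = 6) {
  fold <- sample(rep_len(seq_len(v), n))  # balanced fold labels in random order
  lapply(seq_len(v), function(k) list(
    analysis   = which(fold != k),  # rows used to fit the model
    assessment = which(fold == k)   # held-out rows used to score it
  ))
}

set.seed(2024)
folds <- vfold_indices(n = 60, v = 6)
sapply(folds, function(f) length(f$assessment))  # 10 held-out rows per fold
```

Every row is held out exactly once, so each of the 6 resamples fits on five sixths of the training set.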

For reference, we check the range of the values that populate the arrays:

```{r value-range}
print(range(unlist(mnist_train$digit)))
```

For this reason, the PH step by default assigns the placeholder role `"persisten…`
In contrast, the blur step modifies its column `digit` in place, and [the modified column inherits the role(s) of the original](https://recipes.tidymodels.org/articles/Roles.html#role-inheritance).
This requires us to update the role of `digit`, which can be done either before or after applying the blur.

```{r persistent-homological-recipe}
recipe(mnist_train) |>
  update_role(label, new_role = "outcome") |>
  update_role(digit, new_role = "image") |>
  # ...
```

We can check directly whether our roles were assigned as expected:

```{r recipe-summary}
mnist_rec |>
  prep() |>
  print() -> mnist_prep
summary(mnist_prep)
```

And we can inspect the result of applying the prepared recipe to the training data:

```{r trained-recipe}
mnist_rec |>
  prep() |>
  bake(new_data = mnist_train)
```

The {randomForest} engine is strict in its requirements: If all predictors have …
This is a serious possibility for our setting, for instance if no degree-1 features are detected so that the persistence landscapes are uniformly zero.
To prevent this from derailing our workflow, we instead use the {ranger} engine, which tolerates this situation.
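
One way to see whether this situation has arisen is to look for predictor columns that take a single value across the training set, such as uniformly zero landscape columns. The minimal base-R sketch below uses a hypothetical data frame and hypothetical column names. It does not reproduce the exact condition that {randomForest} tests.

```r
# Hypothetical vectorized features: the degree-1 landscape column is uniformly zero,
# as happens when no degree-1 features are detected in any image
features <- data.frame(
  landscape_deg0 = c(0.20, 0.50, 0.10, 0.40),
  landscape_deg1 = c(0, 0, 0, 0)
)

# A column is constant if it takes at most one distinct non-missing value
is_constant <- vapply(
  features,
  function(col) length(unique(col[!is.na(col)])) <= 1L,
  logical(1)
)
names(features)[is_constant]
```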

```{r random-forest-specification}
rand_forest(
  trees = 300,
  min_n = 6,
  # ...
```

We fit the model to the pre-processed training data:

```{r model-fit}
fit(
  mnist_spec,
  mnist_rec |> prep() |> formula(),
  # ...
```

Note that the model, while not optimized for accuracy on the training set, is informed by it through the preparation process, in particular the default choice of blur parameter:

```{r blur-value}
mnist_prep$steps[[1]]$blur_sigmas
```
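
A minimal Gaussian blur in base R shows what the blur parameter controls. It assumes that the blur step convolves each image with a Gaussian kernel of standard deviation `sigma`, as the name `blur_sigmas` suggests. The helpers `gaussian_kernel()`, `smooth_rows()` and `gaussian_blur()` are hypothetical and are not the step's implementation. A larger `sigma` spreads each pixel's intensity over a wider neighborhood, which tends to wash out small-scale topological features.

```r
# Minimal separable Gaussian blur (hypothetical helpers; not the blur step's code).
gaussian_kernel <- function(sigma) {
  radius <- max(1, ceiling(3 * sigma))  # truncate at about 3 standard deviations
  x <- seq(-radius, radius)
  k <- exp(-x^2 / (2 * sigma^2))
  k / sum(k)                            # weights sum to 1, preserving total intensity away from edges
}

# Smooth each row of `m` with kernel `k`, treating pixels beyond the border as 0
smooth_rows <- function(m, k) {
  r <- (length(k) - 1) / 2
  out <- matrix(0, nrow(m), ncol(m))
  for (shift in -r:r) {
    src <- seq_len(ncol(m)) + shift     # source column for each output column
    ok <- src >= 1 & src <= ncol(m)
    out[, ok] <- out[, ok] + k[shift + r + 1] * m[, src[ok], drop = FALSE]
  }
  out
}

gaussian_blur <- function(m, sigma) {
  k <- gaussian_kernel(sigma)
  t(smooth_rows(t(smooth_rows(m, k)), k))  # rows first, then columns (via transposes)
}

# A single bright pixel spreads into a symmetric bump whose width grows with sigma
spike <- matrix(0, 9, 9)
spike[5, 5] <- 1
blurred <- gaussian_blur(spike, sigma = 1)
round(blurred[5, 3:7], 3)
```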

To evaluate the model, we generate predictions for the testing set and compare them to the true labels.
Because accuracy is a coarse metric for a 10-class classification task, we examine the confusion matrix to see which errors are most common.
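
The base-R sketch below shows what a confusion matrix adds beyond accuracy. The labels are made-up illustrative data, not results from this vignette. Accuracy is the share of counts on the diagonal, and the off-diagonal cells show which classes are confused with which.

```r
# Hypothetical true and predicted digit labels (illustrative only)
truth <- factor(c(1, 7, 7, 4, 9, 9, 4, 1), levels = 0:9)
pred  <- factor(c(1, 1, 7, 9, 9, 4, 4, 1), levels = 0:9)

# Confusion matrix: rows are predicted classes, columns are true classes
cm <- table(Prediction = pred, Truth = truth)

# Accuracy is the share of counts on the diagonal
accuracy <- sum(diag(cm)) / sum(cm)

# List the off-diagonal (misclassified) cells, most frequent first
errors <- as.data.frame(cm)
errors <- errors[errors$Prediction != errors$Truth & errors$Freq > 0, ]
errors[order(-errors$Freq), ]
```

Here the accuracy is 0.625. The error list also shows that 4 and 9 are mistaken for each other in both directions, which a single accuracy figure cannot reveal.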

We rewrite the recipe to prepare the hyperparameters for tuning rather than to a …
Each parameter is given a character ID that identifies it in the various system messages and outputs.
Beware that this section of the vignette overwrites all `mnist_*` objects created in the previous section!
This keeps the code readable but can cause problems if chunks are executed out of order.

```{r tunable-persistent-homological-recipe}
recipe(mnist_train) |>
  update_role(label, new_role = "outcome") |>
  update_role(digit, new_role = "image") |>
  # ...
```

This recipe has three tunable parameters, as we can verify by extracting their dials:

Note that all three require finalization; like the number of predictors randomly sampled as split candidates at each node of a random forest, their ranges should not be guessed but must be determined from the content of the data.
In fact, the persistence landscape hyperparameters must be determined from columns derived by the first two steps from the input columns, not from the input columns themselves.
For this reason only, we train the recipe with some intuitive values in order to obtain these derived columns for tuning purposes:

We use the input `digit` column to finalize the range of blurs and the derived …
We expect each digit image to have at most 2 important topological features, so we manually prescribe a narrow range for `num_levels`, though it could also be learned from the training set.

Note that the homological degree ranges only from $0$ to $1$ because the image h …

As noted earlier, the three hyperparameters of the RF specification will be treated in different ways: `trees` fixed at $300$, `min_n` tuned over a default grid, and `mtry` tuned over a grid learned from the training set.
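
Learning the `mtry` range from the training set amounts to reading a bound off the pre-processed columns. The number of predictors sampled at each split cannot exceed the number of available predictors, so the upper bound is the number of feature columns. This is the kind of update that {dials}' `finalize()` performs for `mtry()`. The sketch below assumes hypothetical column names with a shared `pl_` prefix. The vignette's actual naming convention is not reproduced here.

```r
# Hypothetical column names of the baked training data; the shared prefix
# is an assumption standing in for the vignette's actual naming convention
baked_names <- c("label", "pl_deg0_lev1", "pl_deg0_lev2", "pl_deg1_lev1", "pl_deg1_lev2")

# Select the vectorized features by their shared prefix
feature_cols <- grep("^pl_", baked_names, value = TRUE)

# mtry counts predictors sampled at each split, so it can range from 1
# up to the number of available feature columns
mtry_range <- c(lower = 1L, upper = length(feature_cols))
mtry_range
```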

```{r tunable-random-forest-specification}
rand_forest(
  trees = 300,
  min_n = tune("rf_node"),
  # ...
```

We can again check for the two tunable hyperparameters by extracting their dials:

As seen in the printed dials, one hyperparameter range must be finalized.
The process for doing so is the same as for the recipe, but in this case the variables of interest are the vectorized features.
We see from the pre-processed data that these features use a consistent naming convention, and we use this convention to select them for learning the range:

At last we arrive at the crux of this vignette!
In preparation for tuning, we wrap the pre-processing recipe and the model specification in a workflow:

```{r workflow-of-recipe-and-model}
workflow() |>
  add_recipe(mnist_rec) |>
  add_model(mnist_spec) |>
  # ...
```

One way to systematically optimize the recipe and model hyperparameters---to tun …
This can be important when the parameters have known trade-offs or the objective function is expected to have multiple optima, so the investigator wants a coarse-grained "map" of how performance varies across the whole parameter space.
For illustration, we construct a workflow grid by crossing a recipe grid with a model grid:

This approach is not as inefficient as it might seem:
As implemented in Tidymodels, [grid tuning recognizes the order of the pre-processing steps](https://tune.tidymodels.org/articles/extras/optimizations.html), and sometimes [of the model construction](https://www.tidymodels.org/learn/work/tune-text/#grid-search), and only performs each step once for every combination of subsequent steps, using the temporarily stored results rather than recomputing them from scratch.
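
The toy base-R sketch below illustrates two ideas together: crossing the grids, and reusing pre-processing results. Crossing forms the Cartesian product of the recipe grid and the model grid. A cache keyed on the recipe setting means that each distinct recipe setting is pre-processed only once, however many model settings it is paired with. The parameter names and the `preprocess()` stand-in are hypothetical. The cache only models the behavior described above and is not how {tune} implements it.

```r
# Hypothetical recipe and model grids (parameter names are illustrative only)
recipe_grid <- data.frame(blur_sigma = c(0.5, 1, 2))
model_grid  <- expand.grid(mtry = c(2, 4), min_n = c(5, 10))

# Crossing = Cartesian product of rows: every recipe setting meets every model setting
workflow_grid <- merge(recipe_grid, model_grid, by = NULL)

# Toy cache: the (expensive) pre-processing runs once per distinct recipe setting
cache <- new.env()
cache$calls <- 0L
preprocess <- function(sigma) {
  key <- paste0("sigma_", sigma)
  if (!exists(key, envir = cache, inherits = FALSE)) {
    cache$calls <- cache$calls + 1L
    assign(key, paste("features blurred with sigma", sigma), envir = cache)  # stand-in result
  }
  get(key, envir = cache, inherits = FALSE)
}

for (i in seq_len(nrow(workflow_grid))) {
  features <- preprocess(workflow_grid$blur_sigma[i])
  # ... here a model with workflow_grid$mtry[i] and workflow_grid$min_n[i]
  # would be fit to `features` and assessed
}
c(grid_rows = nrow(workflow_grid), preprocess_calls = cache$calls)
```

With 3 recipe settings and 4 model settings, the crossed grid has 12 rows, but the expensive step runs only 3 times.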
The unexecuted code chunk below shows the syntax for conducting this grid search:

```{r tune-workflow-over-a-grid, eval=FALSE}
mnist_res <- tune_grid(
  mnist_wflow,
  resamples = mnist_folds,
  # ...
```

Still, however, due to the exceptionally costly computations involved, a more ta …
For this reason, [Motta, Tralie, et al. (2019)](https://ieeexplore.ieee.org/abstract/document/8999182) recommend Bayesian optimization for ML using topological features. This procedure is also implemented in Tidymodels and executed below.
First, we combine the extracted dials from the pre-processing recipe and the model specification and update them with the finalized ranges:

The tuning syntax is similar to that of the grid search, but in place of the grid we only provide the updated dials.
We specify 6 initial parameter settings, followed by 12 iterations of Bayesian search.
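
The idea behind Bayesian optimization is easiest to see in one dimension. This sketch fits a Gaussian-process surrogate to the scores observed so far and then picks the next setting by maximizing expected improvement. It starts from 6 initial settings and runs 12 search iterations. The objective, the kernel and its fixed length scale are assumptions chosen for illustration. `tune_bayes()` fits its own surrogate, offers its own acquisition options and searches over several parameters at once.

```r
# Minimal 1-D Bayesian optimization: Gaussian-process (GP) surrogate plus
# expected improvement (EI). Illustrative only; not tune_bayes()'s internals.
set.seed(1)

# Hypothetical objective to maximize on [0, 1], standing in for resampled accuracy
objective <- function(x) -(x - 0.3)^2

# Squared-exponential covariance with a fixed, hand-picked length scale
sq_exp <- function(a, b, ell = 0.15) exp(-outer(a, b, "-")^2 / (2 * ell^2))

# GP posterior mean and standard deviation at new points (zero mean, unit prior variance)
gp_posterior <- function(x_obs, y_obs, x_new, jitter = 1e-6) {
  K   <- sq_exp(x_obs, x_obs) + diag(jitter, length(x_obs))
  K_s <- sq_exp(x_obs, x_new)
  mu  <- drop(crossprod(K_s, solve(K, y_obs)))
  v   <- pmax(1 - colSums(K_s * solve(K, K_s)), 1e-12)
  list(mean = mu, sd = sqrt(v))
}

# Expected improvement over the best value observed so far (maximization)
expected_improvement <- function(mu, sigma, best) {
  z <- (mu - best) / sigma
  (mu - best) * pnorm(z) + sigma * dnorm(z)
}

candidates <- seq(0, 1, length.out = 201)
x_obs <- runif(6)                  # 6 initial settings
y_obs <- objective(x_obs)

for (iter in seq_len(12)) {        # 12 search iterations
  y_std <- (y_obs - mean(y_obs)) / sd(y_obs)  # standardize scores for the unit-variance GP
  post  <- gp_posterior(x_obs, y_std, candidates)
  ei    <- expected_improvement(post$mean, post$sd, max(y_std))
  x_new <- candidates[which.max(ei)]
  x_obs <- c(x_obs, x_new)
  y_obs <- c(y_obs, objective(x_new))
}

x_best <- x_obs[which.max(y_obs)]
x_best
```

Expected improvement is large either where the surrogate predicts a high score or where it is very uncertain. This balances refining the current best region against exploring unvisited ones.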

```{r tune-workflow-via-bayesian-optimization}
mnist_res <- tune_bayes(
  mnist_wflow,
  resamples = mnist_folds,
  # ...
```

Whereas in the previous section we fit only a single model, in this section we have obtained results for numerous hyperparameter settings and must choose among them for the final model.
We first plot the parameter paths, for a sense of how each was tuned:

```{r parameter-plot-of-tuning-results}
autoplot(mnist_res, type = "parameters")
```

In this experiment, the pre-processing parameters quickly converged, whereas the model parameters were more difficult to optimize.
Better results were obtained with less blur, lower-degree features, and more landscape levels.
We also visualize the distribution of performance metrics at each iteration, to see how performance changed from one iteration to the next:

```{r performance-plot-of-tuning-results}
autoplot(mnist_res, type = "performance")
```

We can also see that the dips at some iterations may have been associated with d …
We can now select settings from the top-performing models for the final model.
We computed both accuracy and area under the ROC curve, but we use only the former to inform this choice.

```{r select-hyperparameter-settings}
collect_metrics(mnist_res) |>
  filter(.metric == "accuracy") |>
  arrange(desc(mean))
```

Now, finally, we fit the tuned model to the training set: