
Commit 26769aa

rp spaces with hyphens in code chunk names to make image files portable
1 parent 1f60a20 commit 26769aa

8 files changed

Lines changed: 107 additions & 109 deletions
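The change in each hunk is mechanical. The chunk label is the text between `{r ` and the first comma or closing brace, and every space in it becomes a hyphen. The sketch below shows how that substitution could be scripted in R. It assumes each chunk header sits on a single line and starts with three backticks followed by `{r `. `hyphenate_chunk_labels()` is a hypothetical helper written for illustration. It is not a function from knitr or from this repository, and the commit may just as well have been edited by hand.

```r
# Hypothetical helper (not part of knitr or this repository): rewrite
# single-line chunk headers of the form ```{r some label, options}
# so that whitespace in the label becomes hyphens. Options after the
# first comma, and the closing brace, are left untouched.
hyphenate_chunk_labels <- function(lines) {
  pattern <- "^(```\\{r )([^,}]+)(.*)$"
  is_header <- grepl(pattern, lines)
  prefix <- sub(pattern, "\\1", lines[is_header])
  label  <- sub(pattern, "\\2", lines[is_header])
  suffix <- sub(pattern, "\\3", lines[is_header])
  # Collapse each run of whitespace inside the trimmed label to one hyphen
  label  <- gsub("[[:space:]]+", "-", trimws(label))
  lines[is_header] <- paste0(prefix, label, suffix)
  lines
}

# Non-header lines and unlabeled chunks pass through unchanged
hyphenate_chunk_labels(c(
  "```{r volcano persistence}",
  "```{r tune workflow over a grid, eval=FALSE}",
  "volcano_pd <- ripserr::cubical(-volcano)"
))
```

To apply it to a file, read the file's lines with `readLines()`, pass them through the helper, and write the result back with `writeLines()`. The regular expression does not cover labels given through a `label =` option or chunks using engines other than `r`.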

vignettes/_hyperparameter-tuning.Rmd

Lines changed: 28 additions & 28 deletions
````diff
@@ -45,7 +45,7 @@ This is a discretized function from latitude--longitude pairs ($\mathbb{S}^2$, a
 The {ripserr} package ports the Cubical Ripser algorithm to R and serves as the engine for this pre-processing step.
 In the interest of identifying peaks and ridges rather than troughs and valleys, here is the _superlevel_ set persistence diagram for this image based on negated elevations:
 
-```{r volcano persistence}
+```{r volcano-persistence}
 volcano_pd <- ripserr::cubical(-volcano)
 TDA::plot.diagram(volcano_pd, asp = 1)
 ```
````
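The chunk label matters for portability because knitr names a plot's image file after the label of the chunk that produced it. The usual pattern is `<fig.path><label>-<n>.<ext>`, so a label with a space in it can produce a filename with a space in it, depending on the output format. The sketch below only mimics that naming pattern. `figure_filename()` is a hypothetical helper, and its default `fig_path` of `"figure/"` is an assumption: R Markdown vignettes usually set their own figure directory.

```r
# Hypothetical helper mimicking the "<fig.path><label>-<n>.<ext>" pattern
# that knitr typically uses for figure files; this is not a knitr function.
figure_filename <- function(label, n = 1, ext = "png", fig_path = "figure/") {
  paste0(fig_path, label, "-", n, ".", ext)
}

old <- figure_filename("volcano persistence")  # "figure/volcano persistence-1.png"
new <- figure_filename("volcano-persistence")  # "figure/volcano-persistence-1.png"

# Only the old label produces whitespace in the file name
grepl("[[:space:]]", c(old, new))  # TRUE FALSE
```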
````diff
@@ -83,13 +83,13 @@ mnist_test$label <- factor(mnist_test$label, sort(unique(mnist_test$label)))
 Following a standard ML approach, we prepare the training set for 6-fold cross-validation, which will be used to choose hyperparameter settings that maximize the accuracy of a classifier.
 The optimized settings will be used to classify the digits in the testing set, and a comparison with the true labels will provide an estimate of the accuracy of the resulting model.
 
-```{r data partition}
+```{r data-partition}
 (mnist_folds <- vfold_cv(mnist_train, v = 6))
 ```
 
 For reference, we check the range of the values that populate the arrays:
 
-```{r value range}
+```{r value-range}
 print(range(unlist(mnist_train$digit)))
 ```
 
````
````diff
@@ -116,7 +116,7 @@ For this reason, the PH step by default assigns the placeholder role `"persisten
 In contrast, the blur step modifies its column `digit` in place, and [the modified column inherits the role(s) of the original](https://recipes.tidymodels.org/articles/Roles.html#role-inheritance).
 This requires us to update the role of `digit`, but this can be done before or after applying the blur.
 
-```{r persistent homological recipe}
+```{r persistent-homological-recipe}
 recipe(mnist_train) |>
 update_role(label, new_role = "outcome") |>
 update_role(digit, new_role = "image") |>
````
````diff
@@ -133,7 +133,7 @@ recipe(mnist_train) |>
 
 We can check directly whether our role assignments obtained as expected:
 
-```{r recipe summary}
+```{r recipe-summary}
 mnist_rec |>
 prep() |>
 print() -> mnist_prep
````
````diff
@@ -142,7 +142,7 @@ summary(mnist_prep)
 
 And we can inspect the result of applying the prepared recipe to the training data:
 
-```{r trained recipe}
+```{r trained-recipe}
 mnist_rec |>
 prep() |>
 bake(new_data = mnist_train)
````
````diff
@@ -162,7 +162,7 @@ The {randomForest} engine is strict in its requirements: If all predictors have
 This is a serious possibility for our setting, for instance if no degree-1 features are detected so that the persistence landscapes are uniformly zero.
 To prevent this from derailing our workflow, we use the {ranger} engine, which is tolerant of this situation, instead.
 
-```{r random forest specification}
+```{r random-forest-specification}
 rand_forest(
 trees = 300,
 min_n = 6,
````
````diff
@@ -177,7 +177,7 @@ rand_forest(
 
 We fit the model to the pre-processed training data:
 
-```{r model fit}
+```{r model-fit}
 fit(
 mnist_spec,
 mnist_rec |> prep() |> formula(),
````
````diff
@@ -188,14 +188,14 @@ fit(
 
 Note that the model, while not optimized for accuracy on the training set, is informed by it through the preparation process, in particular the default choice of blur parameter:
 
-```{r blur value}
+```{r blur-value}
 mnist_prep$steps[[1]]$blur_sigmas
 ```
 
 To evaluate the model, we generate predictions for the testing set and compare them to the true labels.
 Because accuracy is a coarse metric for a 10-value classification task, we examine the confusion matrix to get a sense of what errors are most common.
 
-```{r evaluate fitted model}
+```{r evaluate-fitted-model}
 mnist_fit |>
 predict(new_data = bake(prep(mnist_rec), new_data = mnist_test)) |>
 bind_cols(select(mnist_test, label)) |>
````
````diff
@@ -219,7 +219,7 @@ We rewrite the recipe to prepare the hyperparameters for tuning rather than to a
 Each parameter is given a character ID that will refer to it in the various system messages and outputs.
 Beware that this section of the vignette overwrites all `mnist_*` variable names used in the previous section! This is for readability but can cause problems if parts are executed out of order.
 
-```{r tunable persistent homological recipe}
+```{r tunable-persistent-homological-recipe}
 recipe(mnist_train) |>
 update_role(label, new_role = "outcome") |>
 update_role(digit, new_role = "image") |>
````
````diff
@@ -236,15 +236,15 @@ recipe(mnist_train) |>
 
 This recipe has three tunable parameters, as we can verify by extracting their dials:
 
-```{r tunable recipe hyperparameters}
+```{r tunable-recipe-hyperparameters}
 ( rec_dials <- extract_parameter_set_dials(mnist_rec) )
 ```
 
 Note that all three require finalization; like the number of randomly sampled predictors for each tree in a random forest, their ranges should not be guessed but must be determined from the content of the data.
 In fact, the persistence landscape hyperparameters must be determined from columns derived by the first two steps from the input columns, not from the input columns themselves.
 For this reason only, we train the recipe with some intuitive values in order to obtain these derived columns for tuning purposes:
 
-```{r trained tunable recipe}
+```{r trained-tunable-recipe}
 mnist_rec |>
 finalize_recipe(parameters = list(blur_sd = 8, pl_deg = 0, pl_lev = 3)) |>
 prep() |>
````
````diff
@@ -257,7 +257,7 @@ We use the input `digit` column to finalize the range of blurs and the derived `
 We expect the important topological features of the digits number at most 2 per image, so we manually prescribe a narrow range for `num_levels`, though it could also be learned from the training set.
 Each finalized range is printed for reference:
 
-```{r finalize recipe tuners}
+```{r finalize-recipe-tuners}
 ( blur_sd_fin <- finalize(blur_sigmas(), mnist_train |> select(digit)) )
 ( pl_deg_fin <- finalize(hom_degree(), mnist_bake |> select(digit)) )
 ( pl_lev_fin <- num_levels(range = c(1, 12)) )
````
````diff
@@ -269,7 +269,7 @@ Note that the homological degree ranges only from $0$ to $1$ because the image h
 
 As noted earlier, the three hyperparameters of the RF specification will be treated in different ways: `trees` fixed at $300$, `min_n` tuned over a default grid, and `mtry` tuned over a grid learned from the training set.
 
-```{r tunable random forest specification}
+```{r tunable-random-forest-specification}
 rand_forest(
 trees = 300,
 min_n = tune("rf_node"),
````
````diff
@@ -282,15 +282,15 @@ rand_forest(
 
 We can again check for the two tunable hyperparameters by extracting their dials:
 
-```{r tunable model parameters}
+```{r tunable-model-parameters}
 ( spec_dials <- extract_parameter_set_dials(mnist_spec) )
 ```
 
 As seen in the printed dials, one hyperparameter range must be finalized.
 The process for doing so is the same as for the recipe, but in this case the variables of interest are the vectorized features.
 We see from the pre-processed data that these features use a consistent naming convention, and we use this convention to select them for learning the range:
 
-```{r finalize model tuners}
+```{r finalize-model-tuners}
 ( mtry_fin <- finalize(mtry(), mnist_bake |> select(contains("_pl_"))) )
 ```
 
````
````diff
@@ -299,7 +299,7 @@ We see from the pre-processed data that these features use a consistent naming c
 At last we arrive at the crux of this vignette!
 In preparation for tuning, we wrap the pre-processing recipe and the model specification in a workflow:
 
-```{r workflow of recipe and model}
+```{r workflow-of-recipe-and-model}
 workflow() |>
 add_recipe(mnist_rec) |>
 add_model(mnist_spec) |>
````
````diff
@@ -310,7 +310,7 @@ One way to systematically optimize the recipe and model hyperparameters---to tun
 This can be important when the parameters have known trade-offs or the objective function is expected to have multiple optima, so the investigator wants a course-grained "map" of how performance varies across the whole parameter space.
 For illustration, we construct a workflow grid by crossing a recipe grid with a model grid:
 
-```{r prepare a tuning grid}
+```{r prepare-a-tuning-grid}
 mnist_rec_grid <-
 grid_regular(blur_sd_fin, pl_deg_fin, pl_lev_fin, levels = 3) |>
 set_names(c("blur_sd", "pl_deg", "pl_lev"))
````
````diff
@@ -325,7 +325,7 @@ This approach is not as inefficient as it might seem:
 As implemented in Tidymodels, [grid tuning recognizes the order of the pre-processing steps](https://tune.tidymodels.org/articles/extras/optimizations.html), and sometimes [of the model construction](https://www.tidymodels.org/learn/work/tune-text/#grid-search), and only performs each step once for every combination of subsequent steps, using the temporarily stored results rather than recomputing them from scratch.
 The unexecuted code chunk below shows the syntax for conducting this grid search:
 
-```{r tune workflow over a grid, eval=FALSE}
+```{r tune-workflow-over-a-grid, eval=FALSE}
 mnist_res <- tune_grid(
 mnist_wflow,
 resamples = mnist_folds,
````
````diff
@@ -339,7 +339,7 @@ Still, however, due to the exceptionally costly computations involved, a more ta
 For this reason, [Motta, Tralie, &al (2019)](https://ieeexplore.ieee.org/abstract/document/8999182) recommend Bayesian optimization for ML using topological features. This procedure is also implemented in Tidymodels and executed below.
 First, we combine the extracted dials from the pre-processing recipe and the model specification and update them with the finalized ranges:
 
-```{r update parameter dials}
+```{r update-parameter-dials}
 wflow_dials <- bind_rows(rec_dials, spec_dials) |>
 update(blur_sd = blur_sd_fin, pl_deg = pl_deg_fin, pl_lev = pl_lev_fin) |>
 update(rf_pred = mtry_fin)
````
````diff
@@ -348,7 +348,7 @@ wflow_dials <- bind_rows(rec_dials, spec_dials) |>
 The tuning syntax is similar to that of the grid search, but in place of the grid we only provide the updated dials.
 We specify 6 initial seeds with 12 tuning iterations each.
 
-```{r tune workflow via bayesian optimization}
+```{r tune-workflow-via-bayesian-optimization}
 mnist_res <- tune_bayes(
 mnist_wflow,
 resamples = mnist_folds,
````
````diff
@@ -361,15 +361,15 @@ mnist_res <- tune_bayes(
 Whereas in the previous section we fit only a single model, in this section we have obtained results for numerous hyperparameter settings and must choose among them for the final model.
 We first plot the parameter paths, for a sense of how each was tuned:
 
-```{r parameter plot of tuning results}
+```{r parameter-plot-of-tuning-results}
 autoplot(mnist_res, type = "parameters")
 ```
 
 In this experiment, the pre-processing parameters quickly converged, whereas the model parameters were more difficult to optimize.
 Better results were obtained with less blur, lower-degree features, and more landscape levels.
 We also visualize the distribution of performance metrics at each iteration, to see what improvement each step incurred:
 
-```{r performance plot of tuning results}
+```{r performance-plot-of-tuning-results}
 autoplot(mnist_res, type = "performance")
 ```
 
````
````diff
@@ -379,7 +379,7 @@ We can also see that the dips at some iterations may have been associated with d
 We can now select settings from the top-performing models for the final model.
 We computed both accuracy and area under the ROC curve, but we use only the former to inform this choice.
 
-```{r select hyperparameter settings}
+```{r select-hyperparameter-settings}
 collect_metrics(mnist_res) |>
 filter(.metric == "accuracy") |>
 arrange(desc(mean))
````
````diff
@@ -390,7 +390,7 @@ collect_metrics(mnist_res) |>
 
 Now, finally, we fit the tuned model to the training set:
 
-```{r final model fit}
+```{r final-model-fit}
 mnist_fin <- prep(finalize_recipe(mnist_rec, mnist_best))
 fit(
 finalize_model(mnist_spec, mnist_best),
````
````diff
@@ -404,7 +404,7 @@ The resulting RF correctly classifies roughly half of the images "out of bag"---
 (Note also that the `blur` standard deviation is lower than originally defaulted to.)
 As before, we can look to the confusion matrix for insights into the sources of error:
 
-```{r evaluate on training set}
+```{r evaluate-on-training-set}
 mnist_fit |>
 predict(new_data = bake(mnist_fin, new_data = mnist_train)) |>
 bind_cols(select(mnist_train, label)) |>
````
````diff
@@ -419,7 +419,7 @@ Among the most-confused digits in previous runs of these experiments are $1$ and
 
 Finally, we evaluate the final model on the testing set---the holdout data from the original partition.
 
-```{r evaluate on testing set}
+```{r evaluate-on-testing-set}
 mnist_fit |>
 predict(new_data = bake(mnist_fin, new_data = mnist_test)) |>
 bind_cols(select(mnist_test, label)) |>
````
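A small check along the following lines could confirm that no labels containing whitespace remain in a vignette. `labels_with_spaces()` is a hypothetical helper, not part of knitr or this repository. It assumes single-line chunk headers that begin with three backticks and `{r `, and it ignores labels supplied through a `label =` option.

```r
# Hypothetical check (not part of knitr or this repository): return the
# labels of single-line `r` chunk headers that still contain whitespace.
labels_with_spaces <- function(lines) {
  pattern <- "^```\\{r ([^,}]+)"
  headers <- grep(pattern, lines, value = TRUE)
  # Extract the label (text up to the first comma or closing brace)
  labels <- trimws(sub(paste0(pattern, ".*$"), "\\1", headers))
  labels[grepl("[[:space:]]", labels)]
}

labels_with_spaces(c(
  "```{r model fit}",
  "```{r model-fit}",
  "```{r tune-workflow-over-a-grid, eval=FALSE}"
))
```

Run on the lines of `vignettes/_hyperparameter-tuning.Rmd` (read with `readLines()`), this check should return an empty character vector if every labeled `r` chunk in the file now uses hyphens, as the hunks above suggest.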
