kuenm2/vignettes/model_predictions.Rmd at 4dd5971b391847059b8a64831c2aab5f36a23e55 · wevertonbio/kuenm2 · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
---
title: "5. Project Models to a Single Scenario"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{5. Project Models to a Single Scenario}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.width = 7,
  fig.height = 6
)
```

# Summary

- [Description](#description)
- [Getting ready](#getting-ready)
- [Fitted models](#fitted-models)
- [Model predictions](#model-predictions)
  - [Predict to SpatRaster](#predict-to-spatraster)
  - [Predict to data.frame](#predict-to-data.frame)
- [Options for predictions](#options-for-predictions)
  - [Output type](#output-type)
  - [Clamping variables](#clamping-variables)
  - [No extrapolation](#no-extrapolation)
- [Binarize predictions](#binarize-predictions)
- [Saving Predictions](#saving-predictions)
- [Detecting changes between single scenarios](#detecting-changes-between-single-scenarios)

<hr>

# Description

Once selected models have been fit and explored, projections to single or multiple scenarios can be performed. The `predict_selected()` function is designed for projections to single scenarios (i.e., a single set of new data). This vignette contains examples of how to use many of the options available for model predictions.

<br>

# Getting ready

At this point it is assumed that `kuenm2` is installed (if not, see the [Main guide](index.html)). Load `kuenm2` and any other required packages, and define a working directory (if needed).

Note: functions from other packages (i.e., not from base R or `kuenm2`) used in this guide will be displayed as `package::function()`.

```{r, results='hide', message=FALSE, warning=FALSE}
# Load packages
library(kuenm2)
library(terra)

# Current directory
getwd()

# Define new directory
#setwd("YOUR/DIRECTORY")  # uncomment and modify if setting a new directory

# Saving original plotting parameters
original_par <- par(no.readonly = TRUE)
```

<br>

# Fitted models

To predict using the selected models, a `fitted_models` object is required. For detailed information on model fitting, check the vignette [Fit and Explore Selected Models](model_exploration.html). The `fitted_models` object generated in that vignette is included as an example dataset within the package. Let's load it.

```{r import maxnet calib, warning=FALSE}
# Import fitted_model_maxnet
data("fitted_model_maxnet", package = "kuenm2")

# Print fitted models
fitted_model_maxnet
```

<br>

To compare the results, let's import a `fitted_models` object generated using the GLM algorithm:

```{r import glm calib, warning=FALSE}
# Import fitted_model_glm
data("fitted_model_glm", package = "kuenm2")

# Print fitted models
fitted_model_glm
```

<br>

# Model predictions

To predict selected models for a single scenario, you need a `fitted_models` object and the corresponding predictor variables. These predictor variables can be provided as either a `SpatRaster` or a `data.frame`. The names of the variables (or columns in the `data.frame`) must precisely match those used for model calibration or those used when running PCA (if `do_pca = TRUE` was set in the `prepare_data()` function; see [Prepare Data for Model Calibration](prepare_data.html) for more details).

<br>

## Predict to SpatRaster

Let's use the same raster variables that were used to prepare the data and calibrate the models. These are included as example data within the package:

```{r Import raster layers}
# Import raster layers
var <- rast(system.file("extdata", "Current_variables.tif", package = "kuenm2"))

# Plot raster layers
terra::plot(var)
```

<br>

Let's check which variables were used to calibrate our models. They are available in the `calibration_data` element of the object:

```{r check variables}
# Variables used to calibrate maxnet models
colnames(fitted_model_maxnet$calibration_data)

# Variables used to calibrate glms
colnames(fitted_model_glm$calibration_data)
```

<br>

The first column, "pr_bg", indicates the presence (1) and background (0) records, while the other columns represent the environmental variables. In this case, the variables are `bio_1`, `bio_7`, `bio_12`, `bio_15`, and `SoilType`. All these variables are present in the `SpatRaster` (`var`) imported, so, we can predict our models to this raster. Let's begin by predicting the maxnet model:

```{r predict maxnet}
p_maxnet <- predict_selected(models = fitted_model_maxnet, new_variables = var,
                             progress_bar = FALSE)
```

<br>

By default, the function computes consensus metrics (mean, median, range, and standard deviation) for each model across its replicates (if they were produced), as well as a general consensus across all models (if multiple were selected). In this case, the output is a `list` containing `SpatRasters` for predictions, the consensus for each model, and the general consensus:

```{r check maxnet predictions}
# See objects in the output of predict_selected
names(p_maxnet)
```

<br>

Let's plot the general consensus:

```{r plot maxnet general}
terra::plot(p_maxnet$General_consensus)
```

<br>

We can also plot the results for each replicate and the consensus for each model:

```{r plot models maxnet}
# Predictions for each replicate from model 192
terra::plot(p_maxnet$Model_192$Replicates)

# Consensus across each replicate from model 192
terra::plot(p_maxnet$Model_192$Model_consensus)
```

<br>

For comparison, let's predict the GLM:

```{r compare, fig.width = 7, fig.height = 4}
# Predict glm
p_glm <- predict_selected(models = fitted_model_glm,
                          new_variables = var,
                          progress_bar = FALSE)
# See selected models that were predicted
names(p_glm)

# Compare general consensus (mean) between maxnet and glm
par(mfrow = c(1, 2))  # Set grid to plot
terra::plot(p_maxnet$General_consensus$mean, main = "Maxnet")
terra::plot(p_glm$General_consensus, main = "GLM")

```

<br>

## Predict to data.frame

Instead of a `SpatRaster`, we can also predict the models to a `data.frame` with the variable values. As an example, let's convert the raster variables `var` to a `data.frame`:

```{r var to df}
var_df <- as.data.frame(var)
head(var_df)
```

<br>

Note that each column stores the values for each variable. Let's predict our Maxnet models to this `data.frame`:

```{r predict to df}
p_df <- predict_selected(models = fitted_model_maxnet,
                         new_variables = var_df,  # Now, a data.frame
                         progress_bar = FALSE)
```

<br>

Now, instead of `SpatRaster` objects, the function returns `data.frame` objects with the predictions:

```{r predict df results}
# Results by replicate of the model 192
head(p_df$Model_192$Replicates)

# Consensus across replicates of the model 192
head(p_df$Model_192$Model_consensus)

# General consensus across all models
head(p_df$General_consensus)
```

<br>

# Options for predictions

## Output type

Maxnet models produce four different types of output for their predictions: raw, cumulative, logistic, and cloglog. These are described in [Merow et al. 2013](https://nsojournals.onlinelibrary.wiley.com/doi/10.1111/j.1600-0587.2013.07872.x) and [Phillips et al. 2017](https://nsojournals.onlinelibrary.wiley.com/doi/10.1111/ecog.03049).

All four output types are monotonically related; thus, rank-based metrics for model fit (e.g., omission rate and partial ROC) will be identical. However, the output types have different scaling, which leads to distinct interpretations and visually different prediction maps.

-   **Raw (or exponential) output** is interpreted as a Relative Occurrence Rate (ROR). The ROR sums to 1 when predicted to the data used to train the model.
-   **Cumulative output** assigns to a location the sum of all raw values less than or equal to the raw value for that location, and then rescales this to range between 0 and 100. Cumulative output can be interpreted in terms of an omission rate because thresholding at a value of *c* to predict a suitable/unsuitable cell will omit approximately *c*% of presences.
-   **Cloglog output (Default)** transforms the raw values into a scale of relative suitability ranging between 0 and 1, using a logistic transformation based on a user-specified parameter '$\tau$', which represents the probability of presence at 'average' presence locations. In this context, the tau value defaults to $\tau \approx 0.632$.
-   **Logistic output** is similar to Cloglog, but it assumes that $\tau = 0.5$.

Let's examine the differences between these four output types for Maxnet models:

```{r output types}
p_cloglog <- predict_selected(models = fitted_model_maxnet, new_variables = var,
                              type = "cloglog", progress_bar = FALSE)
p_logistic <- predict_selected(models = fitted_model_maxnet, new_variables = var,
                              type = "logistic", progress_bar = FALSE)
p_cumulative <- predict_selected(models = fitted_model_maxnet, new_variables = var,
                                 type = "cumulative", progress_bar = FALSE)
p_raw <- predict_selected(models = fitted_model_maxnet, new_variables = var,
                          type = "raw", progress_bar = FALSE)

# Plot the differences
par(mfrow = c(2, 2))
terra::plot(p_cloglog$General_consensus$mean, main = "Cloglog (Default)",
            zlim = c(0, 1))
terra::plot(p_logistic$General_consensus$mean, main = "Logistic",
            zlim = c(0, 1))
terra::plot(p_cumulative$General_consensus$mean, main = "Cumulative",
            zlim = c(0, 1))
terra::plot(p_raw$General_consensus$mean, main = "Raw", zlim = c(0, 1))
```

<br>

## Clamping variables

By default, predictions are performed with free extrapolation (`extrapolation_type = "E"`). This can be problematic when the peak of suitability occurs at the extremes of a predictor's range. For example, let's examine the response curve of the Maxnet model for `bio_7` (Temperature Annual Range):

```{r response bio_7}
response_curve(models = fitted_model_maxnet, variable = "bio_7",
               extrapolation_factor = 1)
```

<br>

Note that higher suitability occurs at low values of the temperature range. However, the lower limit of the calibration data used to fit the models (dashed line) is at 15.7ºC. The premise that suitability will increase and stabilize at lower values of `bio_7` is an extrapolation of the model (the area to the left of the dashed line). It's possible that suitability decreases at extremely low values, but training data is insufficient for the model to predict this.

One way to address this is by clamping the variables. This means that all prediction values outside the training range (both below the lower value and above the upper value) are set to the prediction values found at the limits of the range. For example, in the calibration data for the Maxnet models, the lower and upper limits for `bio_7` are 15.7ºC and 23.3ºC, respectively:

```{r check lower and upper limits}
range(fitted_model_maxnet$calibration_data$bio_7)
```

<br>

To observe the effect of clamping this variable, let's create a hypothetical scenario where `bio_7` has very low values:

```{r create scenario, fig.width = 7, fig.height = 4}
# From bio_7, reduce values
new_bio7 <- var$bio_7 - 3

# Create new scenario
new_var <- var

# Replace bio_7 with new_bio7 in this scenario
new_var$bio_7 <- new_bio7

# Plot the differences
par(mfrow = c(1, 2))
terra::plot(var$bio_7, main = "Original bio_7", range = c(5, 25))
terra::plot(new_var$bio_7, main = "New bio_7", range = c(5, 25))
```

<br>

Let's predict the Maxnet models for this new scenario with both free extrapolation (`extrapolation_type = "E"`) and with clamped variables (`extrapolation_type = "EC"`):

```{r}
# Predict to hypothetical scenario with free extrapolation
p_free_extrapolation <- predict_selected(models = fitted_model_maxnet,
                                         new_variables = new_var,  # New scenario
                                         consensus = "mean",
                                         extrapolation_type = "E",  # Free extrapolation (Default)
                                         progress_bar = FALSE)

# Predict to hypothetical scenario with clamping
p_clamping <- predict_selected(models = fitted_model_maxnet,
                               new_variables = new_var,  # New scenario
                               consensus = "mean",
                               extrapolation_type = "EC",  # Extrapolation with clamping
                               progress_bar = FALSE)

# Get and see differences
p_difference <- p_free_extrapolation$General_consensus$mean - p_clamping$General_consensus$mean

# Plot the differences
par(mfrow = c(2, 2))
terra::plot(p_free_extrapolation$General_consensus$mean,
            main = "Free extrapolation", zlim = c(0, 1))
terra::plot(p_clamping$General_consensus$mean, main = "Clamping",
             zlim = c(0, 1))
terra::plot(p_difference, main = "Difference")
terra::plot(new_bio7, main = "Hypothetical bio_7", type = "interval")
```

<br>

Note that when we clamp the variables, regions with extremely low values of (the hypothetical) `bio_7` exhibit lower predicted suitability values compared to when free extrapolation is allowed.

By default, when `extrapolation_type = "EC"` is set, all predictor variables are clamped. You can specify which variables to clamp using the `var_to_restrict` argument.

<br>

## No extrapolation

A more rigorous approach is to *predict with no extrapolation*. Here regions outside the limits of the training data are assigned a suitability value of 0. Let's proceed to observe the differences:

```{r no extrapolation}
# Predict to hypothetical scenario with no extrapolation
p_no_extrapolation <- predict_selected(models = fitted_model_maxnet,
                                       new_variables = new_var,  # New scenario
                                       consensus = "mean",
                                       extrapolation_type = "NE",  # No extrapolation
                                       progress_bar = FALSE)
# Plot the differences
par(mfrow = c(2, 2))
terra::plot(p_free_extrapolation$General_consensus$mean, main = "Free extrapolation",
            zlim = c(0, 1))
terra::plot(p_clamping$General_consensus$mean, main = "Clamping",
            zlim = c(0, 1))
terra::plot(p_no_extrapolation$General_consensus$mean, main = "No extrapolation",
            zlim = c(0, 1))
terra::plot(new_bio7, main = "Hypothetical bio_7", type = "interval")
```

<br>

In this example, a large portion of the predicted area shows zero suitability. This is because, in this hypothetical scenario, much of the region has `bio_7` values lower than those in the training data, which has a minimum of 15ºC. Suitability values greater than zero are only in areas where `bio_7` falls within the training range.

By default, when `extrapolation_type = "NE"` is set, all predictor variables are considered for this process. You can specify a subset of variables to be considered for extrapolation using the `var_to_restrict` argument.

<br>

# Binarize predictions

The `fitted_models` object stores the thresholds that can be used to classify model predictions into suitable and unsuitable areas. These thresholds correspond to the omission error rate used during model selection (e.g., 5% or 10%).

You can access the omission error rate used to calculate the thresholds directly from the object:

```{r omission error}
# Get omission error used to select models and calculate the thesholds
## For maxnet model
fitted_model_maxnet$omission_rate

## For glm
fitted_model_glm$omission_rate
```

<br>

In both models, a 10% omission error rate was used to calculate the thresholds. This means that when predictions are binarized, approximately 10% of the presence records used to train models will fall into areas classified as unsuitable.

The thresholds are summarized in two ways: the mean and median across replicates for each model, and the consensus mean and median across all selected models (when more than one model is selected). Let's check the thresholds for the general consensus:

```{r thresholds}
# For maxnet
fitted_model_maxnet$thresholds$consensus

# For glm
fitted_model_glm$thresholds$consensus
```

<br>

Let's use these threshold values to binarize models predictions:

```{r binarize models, fig.width = 7, fig.height = 4}
# Get the threshold values for models (general consensus)
thr_mean_maxnet <- fitted_model_maxnet$thresholds$consensus$mean  # Maxnet
thr_mean_glm <- fitted_model_glm$thresholds$consensus$mean  # glm

# Binarize models
mean_maxnet_bin <- (p_maxnet$General_consensus$mean >= thr_mean_maxnet) * 1
mean_glm_bin <- (p_glm$General_consensus >= thr_mean_glm) * 1

# Compare results
par(mfrow = c(1, 2))  # Set grid to plot
terra::plot(mean_maxnet_bin, main = "Maxnet")
terra::plot(mean_glm_bin, main = "GLM")
```

<br>

```{r par_reset}
# Reset plotting parameters
par(original_par)
```

<br>

# Saving predictions

We can save the predictions to the disk by setting `write_files = TRUE`. When this option is enabled, you must provide a directory path in the `out_dir` argument.

If `new_variables` is a `SpatRaster`, the function will save files as GeoTIFF (.tif) files. If `new_variables` is a `data.frame`, the function will save the output files as Comma Separated Value (.csv) files.

```{r save, eval=FALSE}
p_save <- predict_selected(models = fitted_model_maxnet,
                           new_variables = var,
                           write_files = TRUE,  # To save to the disk
                           write_replicates = TRUE,  # To save predictions for each replicate
                           out_dir = tempdir(),  # Directory to save the results (temporary directory)
                           progress_bar = FALSE)
```

<br>

Alternatively, we can use `writeRaster()` to save specific output predictions manually. For example, to save only the mean layer from the general consensus results:

```{r writeRaster, eval = FALSE}
writeRaster(p_maxnet$General_consensus$mean,
            filename = file.path(tempdir(), "Mean_consensus.tif"))
```

# Detecting changes between single scenarios

To compare predictions from two single scenarios representing different time periods (e.g., present vs. future or present vs. past), the function `prediction_changes()` can be used to identify areas of **loss (contraction)**, **gain (expansion)**, and **stability (no change)** in suitable conditions.

As an example, we will project the fitted model to a **single GCM** representing future climatic conditions:

```{r import var future}
future_var <- terra::rast(system.file("extdata",
                                      "wc2.1_10m_bioc_ACCESS-CM2_ssp585_2081-2100.tif",
                                      package = "kuenm2"))
plot(future_var)
```

Next, we need to rename the variables so that they match the variable names used when fitting the models:

```{r}
names(future_var) <- sub("bio0", "bio", names(future_var))
names(future_var) <- sub("bio", "bio_", names(future_var))
names(var)
names(future_var)
```

We also need to append the static soil variable to the set of future predictors:

```{r}
future_var <- c(future_var, var$SoilType)
```

Now we can generate predictions under future environmental conditions:

```{r}
# Predict
p_future <- predict_selected(models = fitted_model_maxnet,
                             new_variables = future_var,
                             progress_bar = FALSE)

# Plot consensus (mean)
plot(c(p_maxnet$General_consensus$mean,
       p_future$General_consensus$mean),
     main = c("Present", "Future (SSP 585, 2081-2100)"))

```

To identify how suitable areas change between scenarios, we can use `prediction_changes()`. This function binarizes the predictions using the threshold stored in the fitted models and then classifies each cell as gain, loss, or stable suitability.

```{r}
# Compute changes between scenarios
p_changes <- prediction_changes(current_predictions = p_maxnet$General_consensus$mean,
                                new_predictions = p_future$General_consensus$mean,
                                fitted_models = fitted_model_maxnet,
                                predicted_to = "future")
# Plot result
terra::plot(p_changes)
```

In this example, we are comparing current predictions with future predictions, so the argument `predicted_to = "future"` is used. If the comparison were with past predictions, this argument should be set accordingly to ensure that gains and losses are classified correctly.

The `prediction_changes()` function is designed to compute changes between single scenarios, meaning that the new scenario is represented by one GCM. If projections include multiple GCMs, the function `projection_changes()` should be used instead.

For more details on projecting models to multiple scenarios, see the vignette [6. Project Models to Multiple Scenarios](model_projections.html).