Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
35 commits
Select commit Hold shift + click to select a range
6198c9b
introduce new plotting method for mahalanobis, plot_mahalanobis
rempsyc Jul 31, 2025
345b779
gemini/copilot suggestions, spellcheck, lints, version
rempsyc Aug 1, 2025
c846b19
replace @examplesIf insight::check_if_installed with @examplesIf requ…
rempsyc Aug 1, 2025
2836d8e
improve elbow logic
rempsyc Aug 4, 2025
4438d98
lints, styler
rempsyc Aug 4, 2025
7879b92
add support of scree plot type for all outlier methods
rempsyc Aug 7, 2025
f17966e
styler, examplesIf (skipping lint checks)
rempsyc Aug 7, 2025
c2d5f3a
document
rempsyc Aug 7, 2025
c894333
Merge branch 'main' into plot_outlieres
rempsyc Aug 14, 2025
9023f69
update remotes
strengejacke Aug 30, 2025
c8d6067
styler
strengejacke Aug 30, 2025
80eb1dd
styler
strengejacke Aug 30, 2025
b30f630
styler
strengejacke Aug 30, 2025
7129f92
document "count" option
strengejacke Aug 30, 2025
bb09f98
style
strengejacke Aug 30, 2025
78e4d4a
add tests
strengejacke Aug 30, 2025
0c855c8
test
strengejacke Aug 30, 2025
1afaaac
add snaps
strengejacke Aug 30, 2025
1357e02
news
strengejacke Aug 30, 2025
004ec5d
fix tests
strengejacke Aug 30, 2025
936a49c
fix tests
strengejacke Aug 30, 2025
31b5521
styler
strengejacke Aug 30, 2025
e21e6d6
Merge branch 'main' into plot_outlieres
strengejacke Aug 30, 2025
e3f7687
Merge branch 'main' into plot_outlieres
strengejacke Aug 30, 2025
0d9a525
Merge branch 'main' into plot_outlieres
strengejacke Aug 30, 2025
687c1e9
styler
strengejacke Aug 30, 2025
82f1346
fix
strengejacke Aug 30, 2025
f6fff4e
fix
strengejacke Aug 30, 2025
502babb
skip test
strengejacke Aug 30, 2025
5cea4c3
update tests
strengejacke Aug 31, 2025
aa7faf0
use namesoace
strengejacke Aug 31, 2025
442151d
use namespace
strengejacke Aug 31, 2025
779aab1
address comments
strengejacke Aug 31, 2025
24ab477
update snap
strengejacke Aug 31, 2025
03c0464
update tests
strengejacke Aug 31, 2025
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 1 addition & 2 deletions DESCRIPTION
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
Type: Package
Package: see
Title: Model Visualisation Toolbox for 'easystats' and 'ggplot2'
Version: 0.11.0.7
Version: 0.11.0.8
Authors@R:
c(person(given = "Daniel",
family = "Lüdecke",
Expand Down Expand Up @@ -122,4 +122,3 @@ Config/testthat/edition: 3
Config/testthat/parallel: true
Config/Needs/website: easystats/easystatstemplate
Config/rcmdcheck/ignore-inconsequential-notes: true
Remotes: easystats/insight, easystats/parameters, easystats/performance
3 changes: 3 additions & 0 deletions NEWS.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,9 @@
* `plot()` for `performance::check_normality()` now also supports objects from
`psych::fa()`, `psych::principal()` and `parameters::factor_analysis()`.

* `plot()` for `performance::check_outliers()` gets a new `"scree"` type option,
to create a scree plot of outlier statistics.

* Minor re-labelling of axis titles and subtitles in `plot()` for
`performance::check_residuals()`.

Expand Down
2 changes: 1 addition & 1 deletion R/plot.check_model.R
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@
#'
#' @return A ggplot2-object.
#'
#' @seealso See also the vignette about [`check_model()`](https://easystats.github.io/performance/articles/check_model.html).

Check warning on line 12 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/plot.check_model.R,line=12,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 125 characters.

Check warning on line 12 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/plot.check_model.R,line=12,col=121,[line_length_linter] Lines should not be more than 120 characters. This line is 125 characters.
#'
#' @examplesIf require("patchwork")
#' library(performance)
Expand All @@ -18,7 +18,7 @@
#' plot(check_model(model))
#'
#' @export
plot.see_check_model <- function(

Check warning on line 21 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/plot.check_model.R,line=21,col=1,[cyclocomp_linter] Reduce the cyclomatic complexity of this expression from 82 to at most 40.

Check warning on line 21 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/plot.check_model.R,line=21,col=1,[cyclocomp_linter] Reduce the cyclomatic complexity of this expression from 82 to at most 40.
x,
style = theme_lucid,
colors = NULL,
Expand Down Expand Up @@ -210,7 +210,7 @@
!is.null(x$INFLUENTIAL) &&
any(c("outliers", "influential", "all") %in% check)
) {
p$OUTLIERS <- .plot_diag_outliers_new(
p$OUTLIERS <- .plot_diag_outliers_dots(
x$INFLUENTIAL,
show_labels = show_labels,
size_text = size_text,
Expand Down Expand Up @@ -317,9 +317,9 @@
if (panel) {
pw <- plots(p, n_columns = n_columns)
.safe_print_plots(pw, ...)
return(invisible(pw))

Check warning on line 320 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint / lint

file=R/plot.check_model.R,line=320,col=5,[return_linter] Use implicit return behavior; explicit return() is not needed.

Check warning on line 320 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/plot.check_model.R,line=320,col=5,[return_linter] Use implicit return behavior; explicit return() is not needed.
} else {
return(p)

Check warning on line 322 in R/plot.check_model.R

View workflow job for this annotation

GitHub Actions / lint-changed-files / lint-changed-files

file=R/plot.check_model.R,line=322,col=5,[return_linter] Use implicit return behavior; explicit return() is not needed.
}
}

Expand Down
185 changes: 89 additions & 96 deletions R/plot.check_outliers.R
Original file line number Diff line number Diff line change
Expand Up @@ -4,31 +4,61 @@
#' function.
#'
#' @param type Character vector, indicating the type of plot.
#' Options are `"dots"` (default) for a scatterplot of leverage (hat) values
#' versus residuals, with Cook's Distance contours for evaluating influential
#' points, or `"bars"` for a bar chart of (rescaled) outlier statistic values
#' for each data point. Only used for outlier plots of fitted models; for
#' outlier plots of raw data values, `type = "bars"` is always used.
#' Options are:
#' - `"dots"` (default) for a scatterplot of leverage (hat) values versus
#' residuals, with Cook's Distance contours for evaluating influential points.
#' - `"scree"` for a scree-style plot highlighting "elbow outliers" (based on
#' sudden increases in distance; see details).
#' - `"bars"` for a bar chart of (rescaled) outlier statistic values for each
#' data point.
#' - `"count"` for a "histogram"-style plot of outlier, where bins represent
#' the outliers' distance values.
#'
#' `type = "dots"` is only used for outlier plots of fitted models; for
#' outlier plots of raw data values, `type` must be either `"scree"` or `"bars"`.
#' @param show_labels Logical. If `TRUE`, text labels are displayed.
#' @param size_text Numeric value specifying size of text labels.
#' @param rescale_distance Logical. If `TRUE`, distance values are rescaled
#' to a range from 0 to 1. This is mainly due to better catch the differences
#' between distance values.
#' @param elbow_threshold Optional scalar specifying the minimum jump in
#' distance (between adjacent sorted observations) used to detect the elbow point.
#' If supplied, all observations following the first jump greater than this value
#' are flagged as outliers. If `NULL` (default), the largest jump is used
#' automatically. Higher values yield more conservative outlier detection.
#' @param verbose Logical. If `TRUE` (default), prints a summary list of outliers.
#' @inheritParams data_plot
#' @inheritParams plot.see_check_normality
#'
#' @return A ggplot2-object.
#' @details When using `type = "scree"`, the function will provide a
#' scree-style distance plot that highlights two types of outliers.
#' Observations exceeding the specified threshold are shown in warm colors,
#' while observations following the largest jump ("elbow", or the specified
#' cut-off value) in the sorted distances are shown in cool colors. Elbow
#' outliers are defined based on sudden increases in distance, analogous to
#' inflection points in scree plots.
#'
#' @return A ggplot2-object.
#' @references
#' The scree plot implementation was inspired by a visualization approach
#' developed by Prof. Marina Doucerain (Université du Québec à Montréal).
#' @examples
#' library(performance)
#' data(mtcars)
#' mt1 <- mtcars[, c(1, 3, 4)]
#' mt1$ID <- row.names(mt1)
#' mt2 <- rbind(
#' mt1,
#' data.frame(mpg = c(37, 40), disp = c(300, 400), hp = c(110, 120))
#' data.frame(
#' mpg = c(37, 48), disp = c(300, 400), hp = c(110, 120),
#' ID = c("JZ", "GP")
#' )
#' )
#' model <- lm(disp ~ mpg + hp, data = mt2)
#' plot(check_outliers(model))
#' plot(check_outliers(mt2$mpg, method = "zscore"), type = "bar")
#' @examplesIf require("ggrepel")
#' plot(check_outliers(mt2[-3], method = "mahalanobis", ID = "ID"))
#' @export
plot.see_check_outliers <- function(
x,
Expand All @@ -39,21 +69,33 @@ plot.see_check_outliers <- function(
base_size = 10,
alpha_dot = 0.8,
colors = c("#3aaf85", "#1b6ca8", "#cd201f"),
rescale_distance = TRUE,
type = c("dots", "bars"),
rescale_distance = FALSE,
type = "dots",
elbow_threshold = NULL,
show_labels = TRUE,
verbose = TRUE,
...
) {
type <- match.arg(type)
type <- insight::validate_argument(type, c("dots", "scree", "count", "bars"))
influential_obs <- attributes(x)$influential_obs
outlier_methods <- attr(x, "methods", exact = TRUE)
outlier_methods <- attr(x, "method", exact = TRUE)

if (length(outlier_methods) == 0) {
insight::format_error(
"Invalid outlier method detected. Please ensure `check_outliers()` was called with valid parameters."
)
} else if (
length(outlier_methods) == 2 && all(outlier_methods == c("optics", "optics_xi"))
) {
outlier_methods <- outlier_methods[[1]]
}

if (
type == "dots" &&
!is.null(influential_obs) &&
(is.null(outlier_methods) || length(outlier_methods) == 1)
) {
.plot_diag_outliers_new(
.plot_diag_outliers_dots(
influential_obs,
show_labels = show_labels,
size_text = size_text,
Expand All @@ -64,15 +106,23 @@ plot.see_check_outliers <- function(
alpha_dot = alpha_dot,
colors = colors
)
} else if (length(outlier_methods) == 1) {
.plot_diag_outliers(
} else if (type == "scree" && length(outlier_methods) == 1) {
.plot_scree(
x,
rescale_distance = rescale_distance,
elbow_threshold = elbow_threshold,
verbose = verbose,
...
)
} else if (type == "count" && length(outlier_methods) == 1) {
.plot_diag_outliers_dots_count(
x,
show_labels = show_labels,
size_text = size_text,
rescale_distance = rescale_distance
)
} else {
.plot_outliers_multimethod(x, rescale_distance)
.plot_outliers_multimethod(x, rescale_distance = rescale_distance)
}
}

Expand All @@ -84,13 +134,22 @@ data_plot.check_outliers <- function(
rescale_distance = TRUE,
...
) {
data <- attributes(x)$data
att <- attributes(x)
data <- att$data
row.names(data) <- data$Obs

# Extract distances
d <- data[grepl("Distance_", names(data), fixed = TRUE)]
if (rescale_distance) {
# Also normalize the threshold using the original values
threshold <- att$threshold[[1]]
mdist <- d[1]
rescale_threshold <- (threshold - min(mdist, na.rm = TRUE)) /
(max(mdist, na.rm = TRUE) - min(mdist, na.rm = TRUE))
attr(d, "rescale_threshold") <- rescale_threshold
d <- datawizard::normalize(d, verbose = FALSE)
} else {
rescale_threshold <- NULL
}

d_long <- stats::reshape(
Expand All @@ -106,79 +165,12 @@ data_plot.check_outliers <- function(
row.names(d_long) <- d_long$id <- NULL
d_long$Method <- gsub("Distance_", "", d_long$Method, fixed = TRUE)
attr(d_long, "rescale_distance") <- isTRUE(rescale_distance)
attr(d_long, "rescale_threshold") <- rescale_threshold
d_long$ID <- data$ID
d_long
}


.plot_diag_outliers <- function(
x,
show_labels = TRUE,
size_text = 3.5,
rescale_distance = TRUE
) {
d <- data_plot(x, rescale_distance = rescale_distance)
d$Id <- seq_len(nrow(d))
d$Outliers <- as.factor(attr(x, "data", exact = TRUE)[["Outlier"]])
d$Id[d$Outliers == "0"] <- NA

method <- switch(
attr(x, "method", exact = TRUE),
cook = "Cook's Distance",
pareto = "Pareto",
mahalanobis = "Mahalanobis Distance",
ics = "Invariant Coordinate Selection",
mcd = "Minimum Covariance Determinant",
optics = "OPTICS",
iforest = "Isolation Forest",
"Cook's Distance"
)

threshold <- attr(x, "threshold", exact = TRUE)[[method]]
rescaled <- attr(d, "rescale_distance")
if (isTRUE(rescaled)) {
x_lab <- paste0(method, " (rescaled range 0-1)")
} else {
x_lab <- method
}

size_text <- size_text %||% 3.5

p <- ggplot(
d,
aes(x = .data$Distance, fill = .data$Outliers, label = .data$Id)
) +
geom_histogram() +
labs(
title = "Influential Observations",
subtitle = "High Cook's distance might reflect potential outliers",
x = x_lab,
y = "Count",
fill = NULL
) +
scale_fill_manual(values = c("#2c3e50", "#c0392b")) +
guides(fill = "none", color = "none", label = "none")

if (!is.null(threshold) && !is.na(threshold)) {
p <- p +
geom_vline(
xintercept = threshold,
linetype = "dashed",
color = "#c0392b"
)
}

if (isTRUE(show_labels)) {
if (requireNamespace("ggrepel", quietly = TRUE)) {
p <- p + ggrepel::geom_text_repel(y = 2.5, size = size_text, na.rm = TRUE)
} else {
p <- p + geom_text(y = 2.5, size = size_text, na.rm = TRUE)
}
}

p + guides(x = guide_axis(n.dodge = 2))
}


.plot_outliers_multimethod <- function(x, rescale_distance = TRUE) {
d <- data_plot(x, rescale_distance = rescale_distance)

Expand All @@ -190,31 +182,32 @@ data_plot.check_outliers <- function(
}

suppressWarnings(
ggplot(
ggplot2::ggplot(
data = d,
aes(
ggplot2::aes(
x = .data$Obs,
y = .data$Distance,
fill = .data$Method,
group = .data$Method
)
) +
# geom_vline(xintercept = as.character(c(1, 2))) +
geom_bar(position = "dodge", stat = "identity") +
scale_fill_viridis_d() +
ggplot2::geom_bar(position = "dodge", stat = "identity") +
ggplot2::scale_fill_viridis_d() +
theme_modern() +
labs(x = "Observation", y = y_lab, fill = "Method") +
# Warning: Vectorized input to `element_text()` is not officially supported.
# Results may be unexpected or may change in future versions of ggplot2.
theme(
axis.text.x = element_text(
ggplot2::labs(x = "Observation", y = y_lab, fill = "Method") +
## FIXME: Warning: Vectorized input to `element_text()` is not officially
## supported. Results may be unexpected or may change in future versions
## of ggplot2.
ggplot2::theme(
axis.text.x = ggplot2::element_text(
colour = ifelse(as.numeric(x) >= 0.5, "red", "darkgrey")
),
panel.grid.major.x = element_line(
panel.grid.major.x = ggplot2::element_line(
linetype = "dashed",
colour = ifelse(as.numeric(x) >= 0.5, "red", "lightgrey")
)
) +
guides(x = guide_axis(n.dodge = 2))
ggplot2::guides(x = ggplot2::guide_axis(n.dodge = 2))
)
}
Loading
Loading