Description
fct_lump_n() uses level frequencies to identify the top n levels. When every level occurs exactly once, all levels are tied, so the call returns the factor with nothing lumped and gives no indication that anything is off. This situation arises in various contexts, for example after summarising by the factor, and usually reflects a misspecified request. A warning would help users recognize when their data structure is unsuitable for fct_lump_n().
Suggested warning: "warning: only one observation per factor level".
library(tidyverse)
library(reprex)
dat <- tibble(
  x = sample(c("X", "Y", "Z", "D"), size = 20, replace = TRUE),
  y = sample(1:10, size = 20, replace = TRUE)
)
dat |>
  mutate(x = fct_lump_n(x, n = 2)) # works
#> # A tibble: 20 × 2
#> x y
#> <fct> <int>
#> 1 Y 8
#> 2 D 10
#> 3 Other 5
#> 4 D 8
#> 5 D 3
#> 6 D 2
#> 7 D 6
#> 8 Y 1
#> 9 Y 5
#> 10 D 8
#> 11 X 6
#> 12 Y 5
#> 13 D 3
#> 14 X 2
#> 15 Y 2
#> 16 D 10
#> 17 D 1
#> 18 X 9
#> 19 X 5
#> 20 X 2
dat |>
  summarise(mean = mean(y), .by = x) |>
  mutate(x = fct_lump_n(x, n = 2)) # nothing is lumped, and no warning is issued
#> # A tibble: 4 × 2
#> x mean
#> <fct> <dbl>
#> 1 Y 4.2
#> 2 D 5.67
#> 3 Z 5
#> 4 X 4.8
# desired: warning: only one observation per factor level
Created on 2024-12-06 with reprex v2.1.0
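To illustrate the request, here is a minimal sketch of what such a check might look like from the user side. The names `has_only_singleton_levels()` and `fct_lump_n_checked()` are hypothetical and are not part of forcats. The sketch looks only at raw counts, so it ignores the `w` weights argument, and it is not meant to suggest how forcats should implement the warning internally.

```r
# Hypothetical helper (not part of forcats): TRUE when there are at least
# two observed levels and none of them occurs more than once, so the
# counts carry no information for ranking levels.
has_only_singleton_levels <- function(f) {
  counts <- table(f)
  counts <- counts[counts > 0] # ignore unused factor levels
  length(counts) > 1 && all(counts == 1)
}

# Hypothetical wrapper: emit the suggested warning, then defer to
# forcats::fct_lump_n() unchanged. Assumes no weights are passed via `w`.
fct_lump_n_checked <- function(f, n, ...) {
  if (has_only_singleton_levels(f)) {
    warning("only one observation per factor level", call. = FALSE)
  }
  forcats::fct_lump_n(f, n = n, ...)
}
```

With this wrapper, the second pipeline above would raise the warning if `fct_lump_n()` were swapped for `fct_lump_n_checked()`, while the first pipeline would stay silent because its levels repeat.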