fct_lump_n warning: one observation per factor level #370

Open
@MagnusNordmo

fct_lump_n() uses level frequencies to pick the top n levels. When every factor level has exactly one observation, all levels tie, nothing is lumped, and the input comes back unchanged without any message. This can occur in various contexts (for example after summarising to one row per group) and often reflects a misspecified request. A warning would help users recognize when their data structure is unsuitable for fct_lump_n().

Suggested warning: "warning: only one observation per factor level".

library(tidyverse)
library(reprex)

dat <- tibble(x = sample(c("X", "Y", "Z", "D"), size = 20, replace = TRUE),
              y = sample(1:10, size = 20, replace = TRUE))

dat |> 
  mutate(x = fct_lump_n(x, n = 2)) # works
#> # A tibble: 20 × 2
#>    x         y
#>    <fct> <int>
#>  1 Y         8
#>  2 D        10
#>  3 Other     5
#>  4 D         8
#>  5 D         3
#>  6 D         2
#>  7 D         6
#>  8 Y         1
#>  9 Y         5
#> 10 D         8
#> 11 X         6
#> 12 Y         5
#> 13 D         3
#> 14 X         2
#> 15 Y         2
#> 16 D        10
#> 17 D         1
#> 18 X         9
#> 19 X         5
#> 20 X         2
dat |> 
  summarise(mean = mean(y), .by = x) |> 
  mutate(x = fct_lump_n(x, n = 2)) # nothing is lumped, and no warning is given
#> # A tibble: 4 × 2
#>   x      mean
#>   <fct> <dbl>
#> 1 Y      4.2 
#> 2 D      5.67
#> 3 Z      5   
#> 4 X      4.8
# desired: warning: only one observation per factor level

Created on 2024-12-06 with reprex v2.1.0
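
For illustration, here is a rough sketch of how such a check could look from the user side. Both `all_levels_singleton()` and `fct_lump_n_checked()` are hypothetical names made up for this example; neither is part of forcats, and this is not a proposal for how the package should implement the check internally. The only forcats call assumed is `fct_lump_n(f, n = ...)` itself.

```r
# Hypothetical helper (not part of forcats): TRUE when every level that
# actually occurs in `f` occurs exactly once, so a frequency ranking
# would consist entirely of ties.
all_levels_singleton <- function(f) {
  counts <- table(f, useNA = "no")
  counts <- counts[counts > 0]  # ignore unused factor levels
  # Require at least two observed levels; a single level is treated as
  # trivial here, which is an arbitrary choice for this sketch.
  length(counts) > 1 && all(counts == 1)
}

# Hypothetical wrapper: emit the suggested warning, then delegate to the
# real forcats::fct_lump_n().
fct_lump_n_checked <- function(f, n, ...) {
  if (all_levels_singleton(f)) {
    warning("only one observation per factor level", call. = FALSE)
  }
  forcats::fct_lump_n(f, n = n, ...)
}
```

With the summarised data from the reprex, `fct_lump_n_checked(x, n = 2)` would raise the warning and still return the factor that `fct_lump_n()` produces. A stricter variant could warn only when `n` is smaller than the number of observed levels, since otherwise no lumping is expected anyway.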
