The `fct_lump_*()` functions are all about deciding which levels to keep or lump together based on their frequency, so I think they should also have an option (e.g. `sort = c("no", "asc", "desc")`) to return the kept levels in frequency order.
Edit: Alternatively, the man page for these functions could include an example showing that `fct_infreq()` and its friends can do the ordering. However, this needs a final `fct_relevel()` call to make sure "Other" is the last level, because `fct_infreq()` places it according to its (often large) count.
Thank you for a great package! <3
```r
library(forcats)
set.seed(12345)
repeated_states <- rep.int(x = state.name, times = runif(n = length(state.name), min = 1, max = 300))
sort(table(repeated_states), decreasing = TRUE)
#> repeated_states
#>        Georgia      Minnesota       Maryland          Texas   Pennsylvania
#>            296            289            285            278            271
#>       Arkansas         Alaska         Oregon     New Mexico   South Dakota
#>            265            262            260            238            234
#>           Utah        Arizona       Illinois        Florida        Alabama
#>            232            228            220            218            216
#>    Mississippi       Nebraska   North Dakota       Missouri        Wyoming
#>            212            209            204            193            188
#>   Rhode Island         Nevada       Delaware     New Jersey         Kansas
#>            185            163            153            145            139
#>     California  Massachusetts      Tennessee      Louisiana           Iowa
#>            137            136            129            121            117
#>       Kentucky        Montana           Ohio       Oklahoma    Connecticut
#>            117            117            111            109             98
#>       Michigan       Virginia        Vermont  New Hampshire North Carolina
#>             98             97             78             68             57
#>          Maine       Colorado          Idaho South Carolina     Washington
#>             54             50             46             41             18
#>      Wisconsin  West Virginia         Hawaii       New York        Indiana
#>             17             13             11              2              1
as_fct <- fct_lump_n(repeated_states, 10)
levels(as_fct)
#>  [1] "Alaska"       "Arkansas"     "Georgia"      "Maryland"     "Minnesota"
#>  [6] "New Mexico"   "Oregon"       "Pennsylvania" "South Dakota" "Texas"
#> [11] "Other"
as_ordered_fct <- fct_lump_n(repeated_states, 10) |>
  fct_infreq() |>
  fct_relevel("Other", after = Inf)
levels(as_ordered_fct)
#>  [1] "Georgia"      "Minnesota"    "Maryland"     "Texas"        "Pennsylvania"
#>  [6] "Arkansas"     "Alaska"       "Oregon"       "New Mexico"   "South Dakota"
#> [11] "Other"
```

Created on 2024-12-17 with reprex v2.1.1
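Until something like this exists in forcats, the proposed behaviour can be approximated with a small wrapper around the existing verbs. This is only a sketch: `fct_lump_n_sorted()` is a hypothetical helper (not part of forcats), the `sort = c("no", "asc", "desc")` argument mirrors the suggestion above, and "asc" is assumed to mean least frequent first. It chains `fct_lump_n()`, `fct_infreq()`, `fct_rev()` and `fct_relevel()`, and keeps the lumped level last.

```r
library(forcats)

# Hypothetical helper, not part of forcats: lump to the n most frequent
# levels, optionally order the kept levels by frequency, and keep the
# lumped level (default "Other") at the end.
fct_lump_n_sorted <- function(f, n, sort = c("no", "asc", "desc"),
                              other_level = "Other") {
  sort <- match.arg(sort)
  lumped <- fct_lump_n(f, n, other_level = other_level)
  if (sort == "no") {
    # Keep fct_lump_n()'s default level order
    return(lumped)
  }
  # fct_infreq() orders levels from most to least frequent
  ordered <- fct_infreq(lumped)
  if (sort == "asc") {
    # Assumption: "asc" means least frequent first
    ordered <- fct_rev(ordered)
  }
  # Move the lumped level back to the end, but only if lumping happened
  if (other_level %in% levels(ordered)) {
    ordered <- fct_relevel(ordered, other_level, after = Inf)
  }
  ordered
}
```

With the reprex data, `fct_lump_n_sorted(repeated_states, 10, sort = "desc")` runs the same chain as `as_ordered_fct`, so it should produce the same levels. The `%in%` guard avoids asking `fct_relevel()` for a level that does not exist when `fct_lump_n()` decides no lumping is needed.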