Add sort argument to fct_lump_ functions? #375

Open
@DesiQuintans

The fct_lump_ functions are all about deciding which levels to keep or lump together based on their frequency, so I think that they should also have an option (e.g. sort = c("no", "asc", "desc")) to return those levels in an order that is based on their frequency.

Edit: Or perhaps add an example to the man pages for these functions showing that fct_infreq() and its friends can do the ordering. This does, however, require a fct_relevel() at the end to keep "Other" as the last level.

Thank you for a great package! <3

library(forcats)

set.seed(12345)

repeated_states <- rep.int(x = state.name, times = runif(n = length(state.name), min = 1, max = 300))

sort(table(repeated_states), decreasing = TRUE)
#> repeated_states
#>        Georgia      Minnesota       Maryland          Texas   Pennsylvania 
#>            296            289            285            278            271 
#>       Arkansas         Alaska         Oregon     New Mexico   South Dakota 
#>            265            262            260            238            234 
#>           Utah        Arizona       Illinois        Florida        Alabama 
#>            232            228            220            218            216 
#>    Mississippi       Nebraska   North Dakota       Missouri        Wyoming 
#>            212            209            204            193            188 
#>   Rhode Island         Nevada       Delaware     New Jersey         Kansas 
#>            185            163            153            145            139 
#>     California  Massachusetts      Tennessee      Louisiana           Iowa 
#>            137            136            129            121            117 
#>       Kentucky        Montana           Ohio       Oklahoma    Connecticut 
#>            117            117            111            109             98 
#>       Michigan       Virginia        Vermont  New Hampshire North Carolina 
#>             98             97             78             68             57 
#>          Maine       Colorado          Idaho South Carolina     Washington 
#>             54             50             46             41             18 
#>      Wisconsin  West Virginia         Hawaii       New York        Indiana 
#>             17             13             11              2              1

as_fct <- fct_lump_n(repeated_states, 10)

levels(as_fct)
#>  [1] "Alaska"       "Arkansas"     "Georgia"      "Maryland"     "Minnesota"   
#>  [6] "New Mexico"   "Oregon"       "Pennsylvania" "South Dakota" "Texas"       
#> [11] "Other"

as_ordered_fct <- fct_lump_n(repeated_states, 10) |> fct_infreq() |> fct_relevel("Other", after = Inf)

levels(as_ordered_fct)
#>  [1] "Georgia"      "Minnesota"    "Maryland"     "Texas"        "Pennsylvania"
#>  [6] "Arkansas"     "Alaska"       "Oregon"       "New Mexico"   "South Dakota"
#> [11] "Other"

Created on 2024-12-17 with reprex v2.1.1
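
For discussion, here is a rough sketch of how the proposed argument might behave, written as a wrapper around the existing functions. The name fct_lump_n_sorted() and its sort argument are hypothetical and not part of forcats; the wrapper only chains fct_lump_n(), fct_infreq(), fct_rev(), and fct_relevel() in the same way as the pipeline above.

```r
library(forcats)

# Hypothetical helper (not part of forcats): lump to the n most common
# levels, then optionally reorder the result by frequency.
fct_lump_n_sorted <- function(f, n, sort = c("no", "asc", "desc"),
                              other_level = "Other") {
  sort <- match.arg(sort)
  out <- fct_lump_n(f, n, other_level = other_level)

  # "no" keeps the level order that fct_lump_n() already returns
  if (sort == "no") {
    return(out)
  }

  # Most frequent first; reverse for ascending order
  out <- fct_infreq(out)
  if (sort == "asc") {
    out <- fct_rev(out)
  }

  # Move the lumped level to the end, but only if lumping created it
  if (other_level %in% levels(out)) {
    out <- fct_relevel(out, other_level, after = Inf)
  }
  out
}

# Small made-up input: "c" appears 3 times, "b" twice, "a" and "d" once
x <- c("a", "b", "b", "c", "c", "c", "d")

levels(fct_lump_n_sorted(x, 2, sort = "desc"))
levels(fct_lump_n_sorted(x, 2, sort = "asc"))
```

In this sketch "Other" always goes last regardless of its own count, which matches the manual fct_relevel() step above. Whether a built-in sort option should treat "Other" that way is a design question for the maintainers.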
