Closed
Description
I'd like to be able to extend the fct_lump_*() functions with a keep argument, so that particular levels are never lumped into "Other".
For example, with fct_lump_prop() I want to lump levels that occur no more than 20% of the time, but I want to keep "A" no matter what.
```r
xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))
# Desired:
fct_lump_prop(xxx, 0.2, keep = "A")
#> [1] A Other Other C C C C C C C
#> Levels: A C Other
```

Adding a second part to the condition (levels(f) %in% keep) in the lvls_other() call, as below, would do it, I think:
```r
fct_lump_prop <- function(f, prop, w = NULL, other_level = "Other",
                          keep = NULL) {
  f <- check_factor(f)
  check_number_decimal(prop)
  check_string(other_level, allow_na = TRUE)

  level_w <- compute_weights(f, w)

  # Compute proportion of total, including NAs
  if (is.null(w)) {
    prop_n <- level_w / length(f)
  } else {
    prop_n <- level_w / sum(w)
  }

  # Levels named in `keep` are retained regardless of their proportion
  if (prop < 0) {
    lvls_other(f, prop_n <= -prop | levels(f) %in% keep, other_level)
  } else {
    lvls_other(f, prop_n > prop | levels(f) %in% keep, other_level)
  }
}
```
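The proposal above relies on forcats' internal helpers (check_factor(), compute_weights(), lvls_other()), so it only runs inside the package. To check the intended semantics outside forcats, here is a minimal base-R sketch, assuming unweighted counts only. The name lump_prop_keep() is hypothetical and not part of forcats; like the proposal, it keeps a level when its share exceeds prop (or is at most -prop for negative prop), or when it is named in keep, and puts "Other" last.

```r
# Hypothetical helper (not part of forcats): base-R sketch of
# fct_lump_prop(f, prop, keep = ...) for unweighted factors only.
lump_prop_keep <- function(f, prop, keep = NULL, other_level = "Other") {
  f <- as.factor(f)
  lvls <- levels(f)

  # Share of each level; NAs count in the denominator but are not a level
  prop_n <- as.vector(table(f)) / length(f)

  # Same rule as the proposal: frequency test OR explicitly kept
  if (prop < 0) {
    keep_lvl <- prop_n <= -prop
  } else {
    keep_lvl <- prop_n > prop
  }
  keep_lvl <- keep_lvl | lvls %in% keep

  # Nothing to lump: return the factor unchanged
  if (all(keep_lvl)) {
    return(f)
  }

  kept <- lvls[keep_lvl]
  out <- as.character(f)
  # Relabel non-kept, non-missing values as the "other" level
  out[!is.na(out) & !(out %in% kept)] <- other_level
  factor(out, levels = c(kept, other_level))
}

xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))
res <- lump_prop_keep(xxx, 0.2, keep = "A")
# expected: A Other Other C C C C C C C, with levels A, C, Other
print(res)
```

Without keep, "A" (10%) and "B" (exactly 20%) would both be lumped, leaving levels C and Other.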