I'd like to be able to extend the `fct_lump_*()` functions with a `keep` argument, so that particular levels are never lumped into "Other".

For example, with `fct_lump_prop()` I want to lump levels that make up 20% or less of the observations, but I want to keep "A" no matter what.
```r
xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))
# Desired:
fct_lump_prop(xxx, 0.2, keep = "A")
#> [1] A Other Other C C C C C C C
#> Levels: A C Other
```
Adding a second condition (`levels(f) %in% keep`) to both `lvls_other()` calls, as below, would do it, I think:
```r
fct_lump_prop <- function(f, prop, w = NULL, other_level = "Other",
                          keep = NULL) {
  f <- check_factor(f)
  check_number_decimal(prop)
  check_string(other_level, allow_na = TRUE)

  level_w <- compute_weights(f, w)

  # Compute proportion of total, including NAs
  if (is.null(w)) {
    prop_n <- level_w / length(f)
  } else {
    prop_n <- level_w / sum(w)
  }

  # Levels listed in `keep` are never lumped. With the default keep = NULL,
  # `levels(f) %in% keep` is all FALSE, so existing behaviour is unchanged.
  if (prop < 0) {
    lvls_other(f, prop_n <= -prop | levels(f) %in% keep, other_level)
  } else {
    lvls_other(f, prop_n > prop | levels(f) %in% keep, other_level)
  }
}
```
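Until something like this is available, the same result can be approximated outside forcats. The sketch below is a hypothetical base-R helper, `lump_prop_keep()` (not a forcats function), that uses the same retention condition as the proposal (`prop_n > prop | levels(f) %in% keep`). It assumes a positive `prop` and unweighted counts, so it does not cover the negative-`prop` or `w` cases.

```r
# Hypothetical helper (not part of forcats): lump levels whose share of
# observations is at most `prop` into `other_level`, except levels in `keep`.
# Assumes prop > 0 and no weights.
lump_prop_keep <- function(f, prop, keep = NULL, other_level = "Other") {
  f <- as.factor(f)
  # Share of each level; NAs count toward the denominator, as in fct_lump_prop()
  prop_n <- as.vector(table(f)) / length(f)
  # Retain frequent levels plus any level named in `keep`
  retain <- prop_n > prop | levels(f) %in% keep
  # Map each original level either to itself or to `other_level`
  new_lvls <- ifelse(retain, levels(f), other_level)
  # Put `other_level` last, and only add it if something was lumped
  out_lvls <- levels(f)[retain]
  if (!all(retain)) out_lvls <- c(out_lvls, other_level)
  factor(new_lvls[as.integer(f)], levels = out_lvls)
}

xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))
lump_prop_keep(xxx, 0.2, keep = "A")  # levels: A, C, Other
```

Here "B" (exactly 20%) is lumped because only levels strictly above `prop` are retained, while "A" (10%) survives because it is listed in `keep`.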