"keep" argument in fct_lump_*() #376

Open
@olivermagnanimous

I'd like the fct_lump_*() functions to accept a keep argument that protects particular levels from being lumped into "Other".

For example, using fct_lump_prop(), I want to lump levels that make up 20% or less of the data, but keep "A" no matter what.

xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))

# Desired:
fct_lump_prop(xxx, 0.2, keep = "A")
#>  [1] A     Other Other C     C     C     C     C     C     C    
#> Levels: A C Other

I think adding a second condition, levels(f) %in% keep, to each lvls_other() call would do it, as below:

fct_lump_prop <- function(f, prop, w = NULL, other_level = "Other",
                          keep = NULL) {
  f <- check_factor(f)
  check_number_decimal(prop)
  check_string(other_level, allow_na = TRUE)

  level_w <- compute_weights(f, w)
  # Compute proportion of total, including NAs
  if (is.null(w)) {
    prop_n <- level_w / length(f)
  } else {
    prop_n <- level_w / sum(w)
  }

  # Levels listed in `keep` are always preserved. With the default
  # keep = NULL, `%in%` is FALSE for every level, so nothing changes.
  if (prop < 0) {
    lvls_other(f, prop_n <= -prop | levels(f) %in% keep, other_level)
  } else {
    lvls_other(f, prop_n > prop | levels(f) %in% keep, other_level)
  }
}
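
To sanity-check the combined condition without touching forcats internals, here is a rough base-R sketch of the same idea. The helper name lump_prop_keep() is hypothetical and not part of forcats; it ignores weights and only handles the positive-prop case, so treat it as an illustration of the logic rather than a proposed implementation.

```r
# Hypothetical base-R helper for illustration; not part of forcats.
lump_prop_keep <- function(f, prop, keep = NULL, other_level = "Other") {
  f <- as.factor(f)
  # Share of all observations per level (NAs count in the denominator)
  prop_n <- tabulate(f, nbins = nlevels(f)) / length(f)
  # A level survives if it is common enough OR explicitly listed in `keep`
  keep_lvl <- prop_n > prop | levels(f) %in% keep
  if (all(keep_lvl)) {
    return(f)
  }
  # Assigning duplicated level names merges those levels into one
  levels(f) <- ifelse(keep_lvl, levels(f), other_level)
  # Move the lumped level to the end
  factor(f, levels = c(setdiff(levels(f), other_level), other_level))
}

xxx <- factor(c("A", "B", "B", "C", "C", "C", "C", "C", "C", "C"))
with_keep <- lump_prop_keep(xxx, 0.2, keep = "A")
without_keep <- lump_prop_keep(xxx, 0.2)
levels(with_keep)     # "A" "C" "Other"
levels(without_keep)  # "C" "Other"
```

With keep = "A" the rare level "A" survives while "B" (exactly 20%) is still lumped, which matches the desired output above. With keep left as NULL the result is the same as plain proportion-based lumping.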
