case_when() Lacks Safe Handling for Unexpected Values

Currently, `case_when()` does not provide a built-in way to validate categorical inputs and throw an error when an unexpected value is encountered. The function requires all return values to share a common type, so there is no safe way to signal an error for an unmatched value from within the mapping itself. It is also incompatible with `stop()` in most cases: passing `stop()` as `.default` raises the error even when every input value is matched.

This makes `case_when()` unsafe in cases where developers need both:
1. A normal transformation for known values
2. A hard error for unknown values
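
A likely explanation for the `stop()` incompatibility is that `case_when()` evaluates all of its right-hand sides and `.default` up front, and only afterwards picks values by condition. The minimal sketch below, which assumes a dplyr version that supports `.default`, captures the error with `tryCatch()` so that the script keeps running:

```r
library(dplyr)

x <- c("A", "B")

# Every element of x is matched by a condition, yet `.default` is still
# evaluated, so stop() fires before any result is assembled.
result <- tryCatch(
  case_when(
    x == "A" ~ 1,
    x == "B" ~ 2,
    .default = stop("unexpected value")
  ),
  error = function(e) e
)

# `result` should be the captured error condition, not a numeric vector.
inherits(result, "error")
```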

## Reproducible Example
```r
library(dplyr)

replace_func <- function(x) {
  case_when(
    x == "A" ~ 1,
    x == "B" ~ 2,
    x == "C" ~ 3,

    # If there is a different value I want the function to throw an error
    # and stop the execution
    .default = stop(paste0("Invalid value: ", x))
  )
}

data <- tibble(x = c("A", "B", "A", "C"))

# This will throw an error - even though all values are specified in the function
data %>% mutate(new_x = replace_func(x))
# Expected behavior would be to return something like:

# A tibble: 4 x 2
#   x     new_x
#   <chr> <dbl>
# 1 A         1
# 2 B         2
# 3 A         1
# 4 C         3


# But for it to fail if there is a value not specified in the function
data1 <- tibble(x = c("A", "B", "A", "C", "D"))


# This should throw an error because the default value is stop() and the value
# "D" is not specified in the function
data1 %>% mutate(new_x = replace_func(x))
```

Currently, the only alternatives for handling unknown values in `case_when()` are:

1. A manual check after executing `case_when()`, which is an imperfect solution that adds unnecessary complexity, or
2. Leaving `.default = NA`, which can lead to silent failures: an unknown value that should have been handled explicitly might be mistakenly transformed into `NA` instead of triggering an error.

Neither of these solutions is ideal.
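
For reference, the first alternative (a manual check after the mapping) might look like the sketch below. The function name `replace_then_check()` is hypothetical, and the check assumes that `NA` is never a legitimate output for a non-missing input, which is exactly the kind of hidden assumption that makes this workaround fragile:

```r
library(dplyr)

# Hypothetical helper illustrating the "check afterwards" workaround.
replace_then_check <- function(x) {
  # Unmatched values fall through to NA because no default is supplied.
  out <- case_when(
    x == "A" ~ 1,
    x == "B" ~ 2,
    x == "C" ~ 3
  )

  # Treat any NA output that came from a non-missing input as unknown.
  unknown <- unique(x[is.na(out) & !is.na(x)])
  if (length(unknown) > 0) {
    stop("Invalid value(s): ", paste(unknown, collapse = ", "), call. = FALSE)
  }

  out
}

# Known values map normally.
replace_then_check(c("A", "B", "A", "C"))

# An unknown value raises an error; captured here so the script continues.
err <- tryCatch(replace_then_check(c("A", "D")), error = function(e) e)
conditionMessage(err)
```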

## Proposed Solution

I believe the default behavior should be something along the lines of `.default = stop(paste0("Unknown value: ", x))`. This would force users to explicitly handle unknown values within their program, ensuring safer data transformations. If users want to allow unknown values to default to `NA`, they should be required to specify it explicitly by using `.default = NA` or `TRUE ~ NA`. This approach would provide better safety by default, preventing unintended `NA` values from propagating due to missing mappings in `case_when()`.
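
The strict default proposed above does not exist in dplyr today. As a rough user-level approximation, the inputs can be validated before `case_when()` runs, so that the mapping itself never sees an unknown value. The name `replace_func_strict()` and the `known` vector are hypothetical, and this sketch also rejects `NA` inputs, which may or may not be desirable:

```r
library(dplyr)

# Hypothetical strict variant of replace_func(): validate first, then map.
replace_func_strict <- function(x) {
  known <- c("A", "B", "C")

  # Anything outside `known` (including NA) counts as unknown.
  unknown <- setdiff(unique(x), known)
  if (length(unknown) > 0) {
    stop("Unknown value: ", paste(unknown, collapse = ", "), call. = FALSE)
  }

  case_when(
    x == "A" ~ 1,
    x == "B" ~ 2,
    x == "C" ~ 3
  )
}

data <- tibble(x = c("A", "B", "A", "C"))
data1 <- tibble(x = c("A", "B", "A", "C", "D"))

# Works: every value is known.
res <- data %>% mutate(new_x = replace_func_strict(x))

# Fails: "D" is not in `known`; captured here so the script continues.
err <- tryCatch(
  data1 %>% mutate(new_x = replace_func_strict(x)),
  error = function(e) e
)
```

The drawback is that the set of known values has to be maintained separately from the conditions, which is part of why a built-in option would be more convenient.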

Would love to hear your thoughts on this!

case_when() Lacks Safe Handling for Unexpected Values #7653
