Propagate special missing values through dplyr operations etc. #349

Description

@SchmidtPaul

Current Behavior

When using naniar's special missing values (via bind_shadow() and recode_shadow()), the missing value types are not propagated through dplyr operations like mutate(). As a result, a column computed from those values carries no information about why it is NA.

I am aware that this is probably out of scope for naniar, but do you have a suggested approach here?

Example

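Here's a minimal reproducible example in which the result column has no record of why it is NA: there is no result_NA column, so rows 2 and 3 look identical even though their inputs were missing for different reasons.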

library(naniar)
library(tidyverse)

tbl <- tibble(val1 = c('12', '*', 'x'), val2 = c('1', '2', 'x'))
cols <- c('val1', 'val2')

tbl_shadow <- tbl %>%  
  bind_shadow() %>% 
  recode_shadow(val1 = .where(val1 == '*' ~ "*", val1 == 'x' ~ 'x')) %>% 
  recode_shadow(val2 = .where(val2 == '*' ~ "*", val2 == 'x' ~ 'x')) %>% 
  replace_with_na(replace = setNames(lapply(cols, function(x) c('*', 'x')), cols)) %>%
  mutate(across(all_of(cols), as.numeric))

tbl_shadow %>%
  mutate(result = val1 / val2) 
#> # A tibble: 3 × 5
#>    val1  val2 val1_NA val2_NA result
#>   <dbl> <dbl> <fct>   <fct>    <dbl>
#> 1    12     1 !NA     !NA         12
#> 2    NA     2 NA_*    !NA         NA
#> 3    NA    NA NA_x    NA_x        NA

Created on 2024-12-16 with reprex v2.1.1
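
For illustration, one manual workaround might be to derive a companion column for the result from the input shadow columns. The sketch below is only that: `combine_shadow()` and `result_NA` are hypothetical names, not part of naniar, and the tibble is rebuilt by hand to match the output above so the code runs with dplyr alone. The resulting `result_NA` is a plain character column, not a naniar shadow column.

```r
library(dplyr)

# Hand-built stand-in for `tbl_shadow` from the example above
# (assumption: same values and shadow labels as in the printed output).
tbl_shadow <- tibble(
  val1    = c(12, NA, NA),
  val2    = c(1, 2, NA),
  val1_NA = factor(c("!NA", "NA_*", "NA_x")),
  val2_NA = factor(c("!NA", "!NA", "NA_x"))
)

# Hypothetical helper (not a naniar function): for each row, collect the
# distinct missing-value labels of the inputs and join them with "|".
combine_shadow <- function(...) {
  labels <- lapply(list(...), as.character)
  vapply(seq_along(labels[[1]]), function(i) {
    row_labels <- unique(vapply(labels, `[`, character(1), i))
    missing_labels <- row_labels[row_labels != "!NA"]
    if (length(missing_labels) == 0) "!NA" else paste(missing_labels, collapse = "|")
  }, character(1))
}

out <- tbl_shadow %>%
  mutate(
    result = val1 / val2,
    # Only attach a reason where the result itself is NA.
    result_NA = if_else(is.na(result), combine_shadow(val1_NA, val2_NA), "!NA")
  )

print(out)
```

This has to be repeated for every derived column and says nothing about which input caused the NA when several labels are joined, so it does not replace real propagation.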
