
user_na = TRUE in read_sav() causes "Error in if (!any(lossy)) { : missing value where TRUE/FALSE needed" #761

@umutatasever1990

When I use user_na = TRUE in read_sav(), the function preserves user-defined missing values as expected. However, combining the resulting data frames, with either dplyr::bind_rows() or base rbind(), fails with "Error in if (!any(lossy)) { : missing value where TRUE/FALSE needed". I do not know the cause, but it happens with the public use files of the OECD's PIAAC datasets. My guess is that the labels are too long. There is a similar issue, #427, but I do not see a resolution there. See the reprex below.

library(haven)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
  
# Define file paths for the downloaded SPSS files
file1 <- "prgautp1.sav"  # Replace with the path if you downloaded locally
file2 <- "prgbelp1.sav"  # Replace with the path if you downloaded locally

# Download the SPSS files (if not already done)
download.file("https://webfs.oecd.org/piaac/puf-data/SPSS/prgautp1.sav", file1, mode = "wb")
download.file("https://webfs.oecd.org/piaac/puf-data/SPSS/prgbelp1.sav", file2, mode = "wb")

# Read the SPSS files with user_na = TRUE
df1 <- read_sav(file1, user_na = TRUE)
df2 <- read_sav(file2, user_na = TRUE)

# Check the structure of the data frames to understand the data types and NAs
#str(df1)
#str(df2)

# Attempt to combine using dplyr::bind_rows()
bind_rows(df1, df2)
#> Warning: `..1$D_Q18a_T` and `..2$D_Q18a_T` have conflicting value labels.
#> ℹ Labels for these values will be taken from `..1$D_Q18a_T`.
#> ✖ Values: 6
#> Error in if (!any(lossy)) {: missing value where TRUE/FALSE needed

# Attempt to combine using base rbind()
rbind(df1, df2)
#> Error in if (!any(lossy)) {: missing value where TRUE/FALSE needed

Created on 2024-09-19 with reprex v2.1.1
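
One way to narrow this down might be to combine the shared columns one at a time and see which ones fail. The sketch below is only a diagnostic idea, not a fix: `find_failing_columns()` is a hypothetical helper (it is not part of haven, dplyr, or vctrs), and it assumes that `vctrs::vec_c()` goes through the same common-type and cast steps that `bind_rows()` uses for each column. I have not confirmed which columns it would report for the PIAAC files.

```r
# Hypothetical helper (not part of haven, dplyr, or vctrs):
# try to combine every shared column separately and return the names
# of the columns for which combining throws an error.
# Assumes the vctrs package is installed; haven should already be loaded
# if the data frames came from read_sav().
find_failing_columns <- function(a, b) {
  shared <- intersect(names(a), names(b))
  fails <- vapply(shared, function(nm) {
    tryCatch(
      {
        # Warnings (e.g. conflicting value labels) are not the target here
        suppressWarnings(vctrs::vec_c(a[[nm]], b[[nm]]))
        FALSE
      },
      error = function(e) TRUE
    )
  }, logical(1))
  shared[fails]
}

# With the data frames from the reprex above:
# find_failing_columns(df1, df2)
```

If this returns only a handful of column names, their attributes (for example `attributes(df1[["some_column"]])`, where `some_column` stands for one of the reported names) could be compared between the two files to make a smaller reprex.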

Session info
sessionInfo()
#> R version 4.4.1 (2024-06-14 ucrt)
#> Platform: x86_64-w64-mingw32/x64
#> Running under: Windows 10 x64 (build 19045)
#> 
#> Matrix products: default
#> 
#> 
#> locale:
#> [1] LC_COLLATE=English_United States.utf8 
#> [2] LC_CTYPE=English_United States.utf8   
#> [3] LC_MONETARY=English_United States.utf8
#> [4] LC_NUMERIC=C                          
#> [5] LC_TIME=English_United States.utf8    
#> 
#> time zone: Europe/Berlin
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.1.4 haven_2.5.4
#> 
#> loaded via a namespace (and not attached):
#>  [1] vctrs_0.6.5       cli_3.6.3         knitr_1.47        rlang_1.1.4      
#>  [5] xfun_0.45         forcats_1.0.0     generics_0.1.3    glue_1.7.0       
#>  [9] htmltools_0.5.8.1 hms_1.1.3         fansi_1.0.6       rmarkdown_2.27   
#> [13] evaluate_0.24.0   tibble_3.2.1      tzdb_0.4.0        fastmap_1.2.0    
#> [17] yaml_2.3.8        lifecycle_1.0.4   compiler_4.4.1    fs_1.6.4         
#> [21] pkgconfig_2.0.3   rstudioapi_0.16.0 digest_0.6.36     R6_2.5.1         
#> [25] readr_2.1.5       tidyselect_1.2.1  reprex_2.1.1      utf8_1.2.4       
#> [29] pillar_1.9.0      magrittr_2.0.3    tools_4.4.1       withr_3.0.0

Labels: bug
