filter.sf order of magnitude slower compared to filter #1889

Open
@bart1

While investigating the performance of my code, I found that the order of filter and st_as_sf makes an order-of-magnitude difference in run time (roughly 65 ms versus 1.7 s for 100,000 rows in the benchmark below). It's not a bug in the sense that something does not work, but it seems unnecessarily slow, so I thought I would report it anyway. Most of the time seems to be spent in a vapply call inside the function st_sfc. In this example the fix is easy (change the order), but that might not always be the case, and I'm sure not all users are aware of the dramatic difference.
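Where reordering is possible, the workaround can be wrapped up as a minimal sketch like the one below. The helper name `filter_then_sf` and its `keep` argument are made up for illustration and are not part of sf or dplyr; it uses base subsetting on the plain data frame and assumes point coordinates in two columns.

```r
library(sf)

# Hypothetical helper (not part of sf or dplyr): subset the plain
# data frame first, then build the geometry column only for kept rows.
filter_then_sf <- function(df, keep, coords, crs = 4326) {
  stopifnot(is.logical(keep), length(keep) == nrow(df))
  # Treat NA in the logical index as "drop", like dplyr::filter does
  kept <- df[keep & !is.na(keep), , drop = FALSE]
  st_as_sf(kept, coords = coords, crs = st_crs(crs), na.fail = FALSE)
}

# Small example with the same column layout as the reprex
set.seed(1)
n <- 1000
d <- data.frame(
  rr = factor(sample(c(NA, "a", "b"), size = n, replace = TRUE,
                     prob = c(.05, .45, .5))),
  xx = runif(n),
  yy = runif(n)
)
pts <- filter_then_sf(d, keep = !is.na(d$rr), coords = c("xx", "yy"))
```

This only helps when the filter condition does not depend on the geometry; it does not address the cost of filtering an existing sf object.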

require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
suppressPackageStartupMessages(require(dplyr))
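n <- 100000
# About 5% of rr is NA; xx and yy are random point coordinates
d <- data.frame(rr = factor(sample(size = n, c(NA, "a", "b"), replace = TRUE, prob = c(.05, .45, .5))), xx = runif(n), yy = runif(n))
data <- d
# Expression 1 filters before converting; expression 2 converts before filtering
b <- bench::mark(min_iterations = 5,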
  data |> filter(!is.na(rr)) |> st_as_sf(
    coords = c("xx", "yy"),
    crs = st_crs(4326L), na.fail = FALSE
  ),
  data |> st_as_sf(
    coords = c("xx", "yy"),
    crs = st_crs(4326L), na.fail = FALSE
  ) |> filter(!is.na(rr))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
b %>% select(median, mem_alloc, expression)
#> # A tibble: 2 × 3
#>     median mem_alloc
#>   <bch:tm> <bch:byt>
#> 1  64.84ms    11.9MB
#> 2    1.68s    25.4MB
#> # … with 1 more variable: expression <bch:expr>
plot(b)
#> Loading required namespace: tidyr

profvis::profvis({
  data |> st_as_sf(
    coords = c("xx", "yy"),
    crs = st_crs(4326L), na.fail = FALSE
  ) |> filter(!is.na(rr))
})

sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=nl_NL.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=nl_NL.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=nl_NL.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] dplyr_1.0.7 sf_1.0-5   
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.8         tidyr_1.1.4        ps_1.6.0           class_7.3-19      
#>  [5] assertthat_0.2.1   digest_0.6.29      utf8_1.2.2         R6_2.5.1          
#>  [9] backports_1.4.0    reprex_2.0.1       evaluate_0.14      e1071_1.7-9       
#> [13] ggplot2_3.3.5      highr_0.9          pillar_1.6.4       rlang_0.4.12      
#> [17] rstudioapi_0.13    callr_3.7.0        R.utils_2.11.0     R.oo_1.24.0       
#> [21] rmarkdown_2.11     styler_1.6.2       webshot_0.5.2      stringr_1.4.0     
#> [25] htmlwidgets_1.5.4  munsell_0.5.0      proxy_0.4-26       compiler_4.1.2    
#> [29] vipor_0.4.5        xfun_0.28          pkgconfig_2.0.3    ggbeeswarm_0.6.0  
#> [33] htmltools_0.5.2    tidyselect_1.1.1   tibble_3.1.6       fansi_1.0.2       
#> [37] crayon_1.4.2       withr_2.4.3        R.methodsS3_1.8.1  grid_4.1.2        
#> [41] jsonlite_1.7.2     gtable_0.3.0       lifecycle_1.0.1    DBI_1.1.2         
#> [45] magrittr_2.0.1     units_0.8-0        scales_1.1.1       KernSmooth_2.23-20
#> [49] bench_1.1.2        cli_3.1.0.9000     stringi_1.7.6      profmem_0.6.0     
#> [53] farver_2.1.0       fs_1.5.1           ellipsis_0.3.2     generics_0.1.1    
#> [57] vctrs_0.3.8        tools_4.1.2        R.cache_0.15.0     glue_1.6.0        
#> [61] beeswarm_0.4.0     purrr_0.3.4        processx_3.5.2     fastmap_1.1.0     
#> [65] yaml_2.2.1         colorspace_2.0-2   classInt_0.4-3     knitr_1.36        
#> [69] profvis_0.3.7

Created on 2022-01-20 by the reprex package (v2.0.1)