Open
Description
While doing some investigation in to the performance of my code I found that the order of filter
and st_as_sf
makes an order of magnitude difference in the performance of code. Its not a bug in the sense that something does not work but it seems that this is maybe unnecessarily slow therefore I thought I would report any way. Most of the time seems to be spend in the function st_sfc
on a vapply
call. In this example case the solution to change the order is easy but that might not always be the case I'm sure not all users are aware of the dramatic difference.
require(sf)
#> Loading required package: sf
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1; sf_use_s2() is TRUE
suppressPackageStartupMessages(require(dplyr))
n <- 100000
d <- data.frame(rr = factor(sample(size = n, c(NA, "a", "b"), replace = T, prob = c(.05, .45, .5))), xx = runif(n), yy = runif(n))
data <- d
b<-bench::mark(min_iterations = 5,
data |> filter(!is.na(rr)) |> st_as_sf(
coords = c("xx", "yy"),
crs = st_crs(4326L), na.fail = FALSE
),
data |> st_as_sf(
coords = c("xx", "yy"),
crs = st_crs(4326L), na.fail = FALSE
) |> filter(!is.na(rr))
)
#> Warning: Some expressions had a GC in every iteration; so filtering is disabled.
b%>% select(median,mem_alloc,expression)
#> # A tibble: 2 × 3
#> median mem_alloc
#> <bch:tm> <bch:byt>
#> 1 64.84ms 11.9MB
#> 2 1.68s 25.4MB
#> # … with 1 more variable: expression <bch:expr>
plot(b)
#> Loading required namespace: tidyr
profvis::profvis({
data |> st_as_sf(
coords = c("xx", "yy"),
crs = st_crs(4326L), na.fail = FALSE
) |> filter(!is.na(rr))
})
sessionInfo()
#> R version 4.1.2 (2021-11-01)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 20.04.3 LTS
#>
#> Matrix products: default
#> BLAS: /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.9.0
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.9.0
#>
#> locale:
#> [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
#> [3] LC_TIME=nl_NL.UTF-8 LC_COLLATE=en_US.UTF-8
#> [5] LC_MONETARY=nl_NL.UTF-8 LC_MESSAGES=en_US.UTF-8
#> [7] LC_PAPER=nl_NL.UTF-8 LC_NAME=C
#> [9] LC_ADDRESS=C LC_TELEPHONE=C
#> [11] LC_MEASUREMENT=nl_NL.UTF-8 LC_IDENTIFICATION=C
#>
#> attached base packages:
#> [1] stats graphics grDevices utils datasets methods base
#>
#> other attached packages:
#> [1] dplyr_1.0.7 sf_1.0-5
#>
#> loaded via a namespace (and not attached):
#> [1] Rcpp_1.0.8 tidyr_1.1.4 ps_1.6.0 class_7.3-19
#> [5] assertthat_0.2.1 digest_0.6.29 utf8_1.2.2 R6_2.5.1
#> [9] backports_1.4.0 reprex_2.0.1 evaluate_0.14 e1071_1.7-9
#> [13] ggplot2_3.3.5 highr_0.9 pillar_1.6.4 rlang_0.4.12
#> [17] rstudioapi_0.13 callr_3.7.0 R.utils_2.11.0 R.oo_1.24.0
#> [21] rmarkdown_2.11 styler_1.6.2 webshot_0.5.2 stringr_1.4.0
#> [25] htmlwidgets_1.5.4 munsell_0.5.0 proxy_0.4-26 compiler_4.1.2
#> [29] vipor_0.4.5 xfun_0.28 pkgconfig_2.0.3 ggbeeswarm_0.6.0
#> [33] htmltools_0.5.2 tidyselect_1.1.1 tibble_3.1.6 fansi_1.0.2
#> [37] crayon_1.4.2 withr_2.4.3 R.methodsS3_1.8.1 grid_4.1.2
#> [41] jsonlite_1.7.2 gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2
#> [45] magrittr_2.0.1 units_0.8-0 scales_1.1.1 KernSmooth_2.23-20
#> [49] bench_1.1.2 cli_3.1.0.9000 stringi_1.7.6 profmem_0.6.0
#> [53] farver_2.1.0 fs_1.5.1 ellipsis_0.3.2 generics_0.1.1
#> [57] vctrs_0.3.8 tools_4.1.2 R.cache_0.15.0 glue_1.6.0
#> [61] beeswarm_0.4.0 purrr_0.3.4 processx_3.5.2 fastmap_1.1.0
#> [65] yaml_2.2.1 colorspace_2.0-2 classInt_0.4-3 knitr_1.36
#> [69] profvis_0.3.7
Created on 2022-01-20 by the reprex package (v2.0.1)
Metadata
Metadata
Assignees
Labels
No labels