WFS paging and parallelization support #70

Open
@salvafern

Hi @eblondel ,

I have been trying out ows4R to query biological occurrence data from EMODnet-Biology.

In the example below, I requested the CPR dataset, restricted to the North Sea and to Calanus finmarchicus.

I obtained the WFS request from the EMODnet-Biology download toolbox (at the end of the selection, you can copy the WFS request under "Get webservice url").

The good news is that viewParams passed as vendor params work like a charm! (Although I have to watch out for the encoding, see lifewatch/eurobis#15 (comment).)
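As a rough sketch of the encoding step, the encoded viewParams string used in the reprex can be rebuilt from its readable form with base R. This assumes the server expects reserved characters to be percent-encoded and spaces written as "+", as in the toolbox URL; `encode_view_params()` is a hypothetical helper name, not an ows4R function.

```r
# Readable filter, i.e. the URLdecode() output from the reprex with "+" read as spaces.
readable <- paste0(
  "where:((up.geoobjectsids && ARRAY[2350])) AND datasetid IN (216);",
  "context:0100;aphiaid:104464"
)

# Hypothetical helper (not part of ows4R).
encode_view_params <- function(x) {
  # Percent-encode everything except letters, digits and "._~-".
  encoded <- utils::URLencode(x, reserved = TRUE)
  # Write spaces as "+", matching the URL produced by the download toolbox.
  gsub("%20", "+", encoded, fixed = TRUE)
}

params <- encode_view_params(readable)
print(params)
```

Note that `URLdecode()` does not turn "+" back into spaces, which is why the decoded strings in the reprex still contain "+".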

I am having trouble, however, with the paging and parallel options. After some debugging, I think the issue might be that ows4R relies on a parameter named numberMatched when using resultType = "hits", at: https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L240

This parameter is not returned by geo.vliz.be, which answers with numberOfFeatures instead (the failure should happen around: https://github.com/eblondel/ows4R/blob/master/R/WFSFeatureType.R#L291)
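To illustrate the suspected cause: as far as I understand, numberMatched is the WFS 2.0.0 attribute name, while WFS 1.1.0 responses report the total as numberOfFeatures. Looking up a missing name yields `NULL`, which has length 0, and `seq()` then fails with "'to' must be of length 1". A minimal sketch of a fallback could look like the following. The helper names are hypothetical and this is not the ows4R implementation; the `hits` list only mimics the shape of the "hits" result with made-up values.

```r
# Hypothetical helper: read the total from a "hits" response,
# accepting either the WFS 2.0.0 or the WFS 1.1.0 attribute name.
get_hit_count <- function(hits) {
  for (key in c("numberMatched", "numberOfFeatures")) {
    if (key %in% names(hits)) {
      return(as.integer(hits[[key]]))
    }
  }
  stop("no feature count found in the 'hits' response")
}

# Hypothetical helper: zero-based start index of each page.
page_offsets <- function(total, page_length) {
  if (total <= 0) return(numeric(0))
  seq(from = 0, to = total - 1, by = page_length)
}

# Illustrative values only (the real count and timestamp are not known here).
hits <- list(numberOfFeatures = "20000", timeStamp = "2022-03-29T00:00:00Z")

total <- get_hit_count(hits)
offsets <- page_offsets(total, page_length = 1000)
print(length(offsets))
```

Stopping with an explicit error when neither attribute is present seems preferable to passing `NULL` on to `seq()`, since the message then points at the server response rather than at the paging arithmetic.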

Could you have a look and see what is happening?

Thanks a lot!

# Example: get the CPR dataset, restricted to the North Sea and Calanus finmarchicus

library(ows4R)
library(parallel)

# URL as provided by download toolbox
url_download_toolbox <- "http://geo.vliz.be/geoserver/wfs/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=Dataportal%3Aeurobis-obisenv_basic&resultType=results&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464&propertyName=datasetid%2Cdatecollected%2Cdecimallatitude%2Cdecimallongitude%2Ccoordinateuncertaintyinmeters%2Cscientificname%2Caphiaid%2Cscientificnameaccepted&outputFormat=csv"
URLdecode(url_download_toolbox)
#> [1] "http://geo.vliz.be/geoserver/wfs/ows?service=WFS&version=1.1.0&request=GetFeature&typeName=Dataportal:eurobis-obisenv_basic&resultType=results&viewParams=where:((up.geoobjectsids+&&+ARRAY[2350]))+AND+datasetid+IN+(216);context:0100;aphiaid:104464&propertyName=datasetid,datecollected,decimallatitude,decimallongitude,coordinateuncertaintyinmeters,scientificname,aphiaid,scientificnameaccepted&outputFormat=csv"

# Only params
params <- "where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464"
URLdecode(params)
#> [1] "where:((up.geoobjectsids+&&+ARRAY[2350]))+AND+datasetid+IN+(216);context:0100;aphiaid:104464"

# Create wfs client and find feature
wfs <- WFSClient$
  new("https://geo.vliz.be/geoserver/Dataportal/wfs", "1.1.0", logger = "INFO")$
  getCapabilities()$
  findFeatureTypeByName("Dataportal:eurobis-obisenv_basic")
#> [ows4R][INFO] OWSGetCapabilities - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&request=GetCapabilities

# Create cluster
cl <- makeCluster(detectCores() - 1)

# Perform tests: around 20K rows
system.time(feature_only_viewparams <- wfs$getFeatures(viewParams = params, resultType="results"))
#> [ows4R][INFO] WFSDescribeFeatureType - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&typeName=Dataportal:eurobis-obisenv_basic&request=DescribeFeatureType 
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&typeName=Dataportal:eurobis-obisenv_basic&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464&resultType=results&request=GetFeature
#>    user  system elapsed 
#>   0.990   0.100   3.712

system.time(feature_pagination <- wfs$getFeatures(viewParams = params, paging = TRUE, paging_length = 1000))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&typeName=Dataportal:eurobis-obisenv_basic&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464&resulttype=hits&request=GetFeature
#> Error in seq.default(from = 0, to = numberMatched, by = paging_length): 'to' must be of length 1
#> Timing stopped at: 0.09 0.001 0.678

system.time(feature_parallel <- wfs$getFeatures(viewParams = params, resultType="results", 
                                                parallel = TRUE, parallel_handler = parallel::mclapply, cl = cl))
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&typeName=Dataportal:eurobis-obisenv_basic&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464&resultType=results&request=GetFeature
#>    user  system elapsed 
#>   0.986   0.088   3.429

# Debugging pagination
nft <- wfs$getFeatures(viewParams = params, resultType="hits")
#> [ows4R][INFO] WFSGetFeature - Fetching https://geo.vliz.be/geoserver/Dataportal/wfs?service=WFS&version=1.1.0&typeName=Dataportal:eurobis-obisenv_basic&viewParams=where%3A%28%28up.geoobjectsids+%26%26+ARRAY%5B2350%5D%29%29+AND+datasetid+IN+%28216%29%3Bcontext%3A0100%3Baphiaid%3A104464&resultType=hits&request=GetFeature
names(nft)
#> [1] "numberOfFeatures" "timeStamp"

"numberMatched" %in% names(nft)
#> [1] FALSE

sessionInfo()
#> R version 3.6.3 (2020-02-29)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.6 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>   [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
#> [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
#> [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
#> [10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   
#> 
#> attached base packages:
#>   [1] parallel  stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#>   [1] httr_1.4.2    reprex_2.0.1  ows4R_0.2-1   keyring_1.3.0 geometa_0.6-6
#> 
#> loaded via a namespace (and not attached):
#>   [1] tinytex_0.35       tidyselect_1.1.1   xfun_0.28          purrr_0.3.4       
#> [5] sf_0.9-4           lattice_0.20-41    vctrs_0.3.8        generics_0.1.0    
#> [9] htmltools_0.5.0    yaml_2.2.1         utf8_1.2.2         XML_3.99-0.3      
#> [13] rlang_0.4.11       e1071_1.7-3        pillar_1.6.3       glue_1.4.2        
#> [17] withr_2.4.2        DBI_1.1.1          bit64_4.0.5        sp_1.4-6          
#> [21] lifecycle_1.0.1    evaluate_0.14      knitr_1.29         tzdb_0.1.2        
#> [25] callr_3.7.0        ps_1.6.0           curl_4.3           class_7.3-17      
#> [29] fansi_0.5.0        highr_0.8          Rcpp_1.0.7         readr_2.0.2       
#> [33] KernSmooth_2.23-17 openssl_1.4.2      classInt_0.4-3     vroom_1.5.5       
#> [37] jsonlite_1.7.0     bit_4.0.4          fs_1.5.0           hms_1.1.1         
#> [41] askpass_1.1        digest_0.6.25      processx_3.5.2     dplyr_1.0.7       
#> [45] grid_3.6.3         rgdal_1.5-12       cli_3.0.1          tools_3.6.3       
#> [49] magrittr_2.0.1     tibble_3.1.5       crayon_1.4.1       pkgconfig_2.0.3   
#> [53] ellipsis_0.3.2     assertthat_0.2.1   rmarkdown_2.11     rstudioapi_0.13   
#> [57] R6_2.5.1           units_0.6-7        compiler_3.6.3   

Created on 2022-03-29 by the reprex package (v2.0.1)

This issue partly follows up #29
