While refactoring the `rnpn` package to use `httr2`, I've discovered that streaming ndjson with `req_perform_connection()` and `resp_stream_lines()` takes more time and significantly more memory than using `curl` + `jsonlite::stream_in()`. The difference is large enough that I'm going to have to revert the change, because users are running up against memory limits. I'm not sure whether this is just overhead from features of `httr2` or something that can be addressed (or possibly I'm doing things wrong!).
For example, a request that allocates ~17MB with `curl` + `jsonlite::stream_in()` allocates ~1GB with `httr2` (the `mem_alloc` column in the benchmark below).
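The ~17MB and ~1GB figures match the `mem_alloc` column of the `bench::mark()` output below. That column totals the memory R allocates over the whole expression rather than reporting peak usage, and it does not see memory allocated outside the R heap (for example by libcurl). A rough way to check peak R heap usage instead is to reset R's garbage-collector statistics with `gc(reset = TRUE)` and read the "max used" figures afterwards. This is a minimal sketch assuming that approach; `peak_heap_mb()` is a hypothetical helper, and it also ignores non-R-heap memory.

```r
# Hypothetical helper: report the peak R heap usage (in Mb) reached while
# evaluating `expr`, using the "max used" statistics kept by gc().
# Memory allocated outside the R heap (e.g. by C libraries) is not included.
peak_heap_mb <- function(expr) {
  gc(reset = TRUE)   # reset the "max used" counters to current usage
  value <- expr      # force evaluation of the lazy argument after the reset
  g <- gc()          # `value` is still referenced here, so it counts as used
  # The "(Mb)" column right after "max used" holds the peak in Mb; locate it
  # by name because gc() may also report a "limit (Mb)" column.
  mb_col <- which(colnames(g) == "max used") + 1L
  peak <- sum(g[, mb_col])  # Ncells + Vcells
  rm(value)
  peak
}
```

Wrapping each branch of the benchmark, e.g. `peak_heap_mb({ ... })`, would give a number closer to what users hit as a memory limit than the cumulative `mem_alloc` total.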
Full benchmark code:
```r
library(httr2)
library(curl)
#> Using libcurl 8.11.1 with OpenSSL/3.3.2
library(jsonlite)
library(bench)

url <- "https://services.usanpn.org/npn_portal//observations/getSummarizedData.ndjson?"
query <- list(request_src = "benchmarking", climate_data = "0", start_date = "2025-01-01",
              end_date = "2025-12-31")

bench::mark(
  httr2 = {
    req <- httr2::request(url) %>%
      httr2::req_method("POST") %>%
      httr2::req_body_form(!!!query)
    con <- httr2::req_perform_connection(req)
    out_httr2 <- tibble::tibble()
    while(!httr2::resp_stream_is_complete(con)) {
      resp <- httr2::resp_stream_lines(con, lines = 5000)
      df <- resp %>%
        textConnection() %>%
        jsonlite::stream_in(verbose = FALSE, pagesize = 5000)
      out_httr2 <- dplyr::bind_rows(out_httr2, df)
    }
    close(con)
    out_httr2
  },
  curl = {
    query2 <- c(query, customrequest = "POST")
    h <- new_handle() %>% handle_setform(.list = query2)
    con <- curl(url, handle = h)
    out_curl <- tibble::tibble()
    jsonlite::stream_in(con, function(df) {
      #I know this isn't necessary, but in the real code data wrangling happens
      #in the callback function
      out_curl <<- dplyr::bind_rows(out_curl, df)
    }, verbose = FALSE, pagesize = 5000)
    out_curl
  }
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 httr2         14.6s    14.6s    0.0683    1.04GB   1.37
#> 2 curl          14.6s    14.6s    0.0687   17.79MB   0.0687
```
Created on 2025-03-13 with reprex v2.1.1
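Separately from the `httr2` vs. `curl` difference, both branches above grow the result with `dplyr::bind_rows()` on every chunk, which copies all rows accumulated so far each time, so total allocation grows roughly quadratically with the number of chunks. Since both branches do this, it is not expected to explain the gap between them. As a minimal sketch, parsed chunks can instead be collected in a list and combined once at the end. `collect_ndjson()`, `read_lines` and `is_done` are hypothetical names (not part of `httr2` or `jsonlite`), and `jsonlite::rbind_pages()` is swapped in for `dplyr::bind_rows()` only to keep the sketch to one dependency.

```r
library(jsonlite)

# Hypothetical helper (not part of httr2 or jsonlite): parse ndjson that
# arrives in chunks of lines, keep each parsed chunk in a list, and combine
# them once at the end instead of re-binding a growing data frame each time.
collect_ndjson <- function(read_lines, is_done) {
  chunks <- list()
  while (!is_done()) {
    lines <- read_lines()         # character vector of ndjson lines
    if (length(lines) == 0) next  # nothing new in this read
    con <- textConnection(lines)
    chunks[[length(chunks) + 1L]] <- jsonlite::stream_in(con, verbose = FALSE)
    close(con)
  }
  if (length(chunks) == 0) return(data.frame())
  # rbind_pages() adds columns that are missing from some chunks
  jsonlite::rbind_pages(chunks)
}
```

With `httr2`, this would be called as `collect_ndjson(function() httr2::resp_stream_lines(con, lines = 5000), function() httr2::resp_stream_is_complete(con))`.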
Session info
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.3 (2025-02-28)
#> os macOS Sequoia 15.3.2
#> system x86_64, darwin20
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/Phoenix
#> date 2025-03-13
#> pandoc 3.6.2 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> bench * 1.1.4 2025-01-16 [1] CRAN (R 4.4.1)
#> cli 3.6.4 2025-02-13 [1] CRAN (R 4.4.1)
#> curl * 6.2.1 2025-02-19 [1] CRAN (R 4.4.1)
#> digest 0.6.37 2024-08-19 [1] CRAN (R 4.4.1)
#> dplyr 1.1.4 2023-11-17 [1] CRAN (R 4.4.0)
#> evaluate 1.0.3 2025-01-10 [1] CRAN (R 4.4.1)
#> fastmap 1.2.0 2024-05-15 [1] CRAN (R 4.4.0)
#> fs 1.6.5 2024-10-30 [1] CRAN (R 4.4.1)
#> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0)
#> glue 1.8.0 2024-09-30 [1] CRAN (R 4.4.1)
#> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#> httr2 * 1.1.1 2025-03-08 [1] CRAN (R 4.4.1)
#> jsonlite * 1.8.9 2024-09-20 [1] CRAN (R 4.4.1)
#> knitr 1.49 2024-11-08 [1] CRAN (R 4.4.1)
#> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0)
#> pillar 1.10.1 2025-01-07 [1] CRAN (R 4.4.1)
#> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0)
#> profmem 0.6.0 2020-12-13 [1] CRAN (R 4.4.0)
#> R6 2.6.1 2025-02-15 [1] CRAN (R 4.4.1)
#> rappdirs 0.3.3 2021-01-31 [1] CRAN (R 4.4.0)
#> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.0)
#> rlang 1.1.5 2025-01-17 [1] CRAN (R 4.4.1)
#> rmarkdown 2.29 2024-11-04 [1] CRAN (R 4.4.1)
#> rstudioapi 0.17.1 2024-10-22 [1] CRAN (R 4.4.1)
#> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0)
#> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0)
#> withr 3.0.2 2024-10-28 [1] CRAN (R 4.4.1)
#> xfun 0.50 2025-01-07 [1] CRAN (R 4.4.1)
#> yaml 2.3.10 2024-07-26 [1] CRAN (R 4.4.0)
#>
#> [1] /Users/ericscott/Library/R/x86_64/4.4/library
#> [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────