Performance of streaming requests #704

Open
@Aariq


While refactoring the rnpn package to use httr2, I've discovered that streaming ndjson with req_perform_connection() and resp_stream_lines() takes more time and significantly more memory than curl + jsonlite::stream_in(), so much so that I'm going to have to revert the change because users are running up against memory limits. I'm not sure whether this is just additional overhead from httr2's features or something that can be addressed (or possibly I'm doing something wrong!).

For example, a request that uses ~17MB of RAM with curl + jsonlite::stream_in() uses ~1GB of RAM with httr2.

Full benchmark code:

library(httr2)
library(curl)
#> Using libcurl 8.11.1 with OpenSSL/3.3.2
library(jsonlite)
library(bench)

url <- "https://services.usanpn.org/npn_portal//observations/getSummarizedData.ndjson?"
query <- list(request_src = "benchmarking", climate_data = "0", start_date = "2025-01-01", 
              end_date = "2025-12-31")

bench::mark(
  httr2 = {
    req <- httr2::request(url) %>%
      httr2::req_method("POST") %>%
      httr2::req_body_form(!!!query)
    
    con <- httr2::req_perform_connection(req)
    out_httr2 <- tibble::tibble()
    
    while(!httr2::resp_stream_is_complete(con)) {
      resp <- httr2::resp_stream_lines(con, lines = 5000)
      df <- resp %>% 
        textConnection() %>% 
        jsonlite::stream_in(verbose = FALSE, pagesize = 5000)
      out_httr2 <- dplyr::bind_rows(out_httr2, df)
    }
    close(con)
    out_httr2
  },
  
  curl = {
    query2 <- c(query, customrequest = "POST")
    h <- new_handle() %>% handle_setform(.list = query2)
    
    con <- curl(url, handle = h)
    out_curl <- tibble::tibble()
    
    jsonlite::stream_in(con, function(df) {
      #I know this isn't necessary, but in the real code data wrangling happens
      #in the callback function
      out_curl <<- dplyr::bind_rows(out_curl, df) 
    }, verbose = FALSE, pagesize = 5000)
    out_curl
  }
)
#> Warning: Some expressions had a GC in every iteration; so filtering is
#> disabled.
#> # A tibble: 2 × 6
#>   expression      min   median `itr/sec` mem_alloc `gc/sec`
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl>
#> 1 httr2         14.6s    14.6s    0.0683    1.04GB   1.37  
#> 2 curl          14.6s    14.6s    0.0687   17.79MB   0.0687

Created on 2025-03-13 with reprex v2.1.1
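
A side note on the chunk-handling step, as a minimal offline sketch rather than a diagnosis: both arms of the benchmark grow the result with `dplyr::bind_rows()` on every chunk, so that pattern would not explain the gap, but it does add allocations of its own. The helper below, `parse_ndjson_chunks()`, is hypothetical (it is not part of httr2 or jsonlite). It parses a character vector of NDJSON lines in fixed-size chunks, closes each `textConnection()` explicitly, and binds the pieces once at the end. It assumes every chunk yields the same columns, so base `rbind()` is enough.

```r
library(jsonlite)

# Hypothetical helper: parse NDJSON lines chunk by chunk and bind once.
# Assumes all chunks produce data frames with identical columns.
parse_ndjson_chunks <- function(lines, chunk_size = 5000) {
  if (length(lines) == 0) {
    return(data.frame())
  }
  starts <- seq(1, length(lines), by = chunk_size)
  pieces <- vector("list", length(starts))
  for (i in seq_along(starts)) {
    idx <- starts[i]:min(starts[i] + chunk_size - 1, length(lines))
    con <- textConnection(lines[idx])
    pieces[[i]] <- jsonlite::stream_in(con, verbose = FALSE, pagesize = chunk_size)
    # textConnection() returns an already-open connection, which stream_in()
    # leaves open, so close it here to avoid leaking one per chunk.
    close(con)
  }
  # Bind once at the end instead of growing the result inside the loop.
  do.call(rbind, pieces)
}

# Usage with made-up records (not real NPN data):
lines <- sprintf('{"id":%d,"value":"obs%d"}', 1:12, 1:12)
res <- parse_ndjson_chunks(lines, chunk_size = 5)
```

In the httr2 loop above, the same idea would mean appending each `df` to a list and binding after the loop; whether that changes the reported `mem_alloc` meaningfully is untested here.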

Session info

sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.3 (2025-02-28)
#>  os       macOS Sequoia 15.3.2
#>  system   x86_64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Phoenix
#>  date     2025-03-13
#>  pandoc   3.6.2 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  bench       * 1.1.4   2025-01-16 [1] CRAN (R 4.4.1)
#>  cli           3.6.4   2025-02-13 [1] CRAN (R 4.4.1)
#>  curl        * 6.2.1   2025-02-19 [1] CRAN (R 4.4.1)
#>  digest        0.6.37  2024-08-19 [1] CRAN (R 4.4.1)
#>  dplyr         1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
#>  evaluate      1.0.3   2025-01-10 [1] CRAN (R 4.4.1)
#>  fastmap       1.2.0   2024-05-15 [1] CRAN (R 4.4.0)
#>  fs            1.6.5   2024-10-30 [1] CRAN (R 4.4.1)
#>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
#>  glue          1.8.0   2024-09-30 [1] CRAN (R 4.4.1)
#>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
#>  httr2       * 1.1.1   2025-03-08 [1] CRAN (R 4.4.1)
#>  jsonlite    * 1.8.9   2024-09-20 [1] CRAN (R 4.4.1)
#>  knitr         1.49    2024-11-08 [1] CRAN (R 4.4.1)
#>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
#>  pillar        1.10.1  2025-01-07 [1] CRAN (R 4.4.1)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
#>  profmem       0.6.0   2020-12-13 [1] CRAN (R 4.4.0)
#>  R6            2.6.1   2025-02-15 [1] CRAN (R 4.4.1)
#>  rappdirs      0.3.3   2021-01-31 [1] CRAN (R 4.4.0)
#>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
#>  rlang         1.1.5   2025-01-17 [1] CRAN (R 4.4.1)
#>  rmarkdown     2.29    2024-11-04 [1] CRAN (R 4.4.1)
#>  rstudioapi    0.17.1  2024-10-22 [1] CRAN (R 4.4.1)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
#>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
#>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
#>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
#>  withr         3.0.2   2024-10-28 [1] CRAN (R 4.4.1)
#>  xfun          0.50    2025-01-07 [1] CRAN (R 4.4.1)
#>  yaml          2.3.10  2024-07-26 [1] CRAN (R 4.4.0)
#> 
#>  [1] /Users/ericscott/Library/R/x86_64/4.4/library
#>  [2] /Library/Frameworks/R.framework/Versions/4.4-x86_64/Resources/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────
