
Downloading tables takes longer when api = "arrow" #622

Closed
@botan

Description


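When downloading large tables, bq_table_download(api = "arrow") is much slower than bq_table_download(api = "json"): streaming the result as Arrow takes longer than fetching and parsing JSON. In the example below, the Arrow run for about 131k rows takes roughly 83 seconds end to end, versus about 4 seconds with JSON.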

The example below requires the bigquery-public-data.usa_names.usa_1910_current table to be copied into your project, and reads the billing project ID from the GCP_BILLING_PROJECT_ID environment variable.

library(bigrquerystorage)
library(bigrquery)
library(tictoc)

billing <- Sys.getenv("GCP_BILLING_PROJECT_ID")

# Query, then download via the BigQuery Storage Read API (Arrow)
tic()
bigquery_storage_api_rows <-
  bq_project_query(
    billing,
    "
    select name, number, state
      from usa_names.usa_1910_current
     where state = 'WA'
    "
  ) |> 
  bq_table_download(api = "arrow")
toc()
#> Job complete
#> Billed: 125.83 MB
#> Streamed 130809 rows in 32 messages. 
#> 82.522 sec elapsed

# Same query, then download via the REST API (JSON)
tic()
bigquery_api_rows <-
  bq_project_query(
    billing,
    "
    select name, number, state
      from usa_names.usa_1910_current
     where state = 'WA'
    "
  ) |> 
  bq_table_download(api = "json")
toc()
#> Job complete
#> Billed: 0 B
#> Downloading first chunk of data.
#> First chunk includes all requested rows.
#> 3.953 sec elapsed
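Each tic()/toc() pair above also times bq_project_query(), and the second, identical query reports "Billed: 0 B", which suggests it was served from the query cache. A comparison that times only the download step may therefore isolate the Arrow slowdown more cleanly. The following is a minimal sketch under stated assumptions: time_downloads() is a hypothetical helper (not part of bigrquery), bigrquerystorage must be installed for api = "arrow", and the query only runs when GCP_BILLING_PROJECT_ID is set.

```r
# Hypothetical helper (not part of bigrquery): time only the download step for
# each `api` value, so query execution and caching are excluded from the timing.
time_downloads <- function(tbl, apis = c("json", "arrow"),
                           download = bigrquery::bq_table_download) {
  vapply(apis, function(api) {
    # Elapsed wall-clock seconds for one full download of `tbl`
    system.time(download(tbl, api = api))[["elapsed"]]
  }, numeric(1))
}

billing <- Sys.getenv("GCP_BILLING_PROJECT_ID")
if (nzchar(billing)) {
  # Run the query once; bq_project_query() returns a bq_table for the results
  tbl <- bigrquery::bq_project_query(
    billing,
    "select name, number, state
       from usa_names.usa_1910_current
      where state = 'WA'"
  )
  # Named numeric vector of elapsed seconds, one entry per API
  print(time_downloads(tbl))
}
```

Both downloads read the same result table, so any remaining gap between the two timings reflects the download path rather than query execution.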
