When downloading large tables, bq_table_download(api = "arrow") is much slower than bq_table_download(api = "json"), even though the JSON path has to fetch and parse JSON. In the example below, the Arrow run takes 82.5 seconds, while the JSON run takes about 4 seconds.
The example below requires the bigquery-public-data.usa_names.usa_1910_current table to be copied into your billing project, so that the query can refer to it as usa_names.usa_1910_current.
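One way to make that copy from R is sketched below. It is a sketch under assumptions, not part of the original report: copy_usa_names() is a hypothetical helper name, it assumes the public usa_names dataset lives in the US multi-region and that you are allowed to create datasets in the billing project, and it relies on bigrquery's bq_dataset_create() and bq_table_copy().

```r
# Hypothetical helper: copy the public usa_names table into the billing project.
# Assumes the source dataset is in the US multi-region and that the caller can
# create datasets in `billing`.
copy_usa_names <- function(billing) {
  # Source table in the public project
  src <- bigrquery::bq_table(
    "bigquery-public-data", "usa_names", "usa_1910_current"
  )

  # Destination dataset; created in "US" to match the assumed source location
  ds <- bigrquery::bq_dataset(billing, "usa_names")
  if (!bigrquery::bq_dataset_exists(ds)) {
    bigrquery::bq_dataset_create(ds, location = "US")
  }

  # Copy the table under the same name so the unqualified query resolves to it
  dest <- bigrquery::bq_table(billing, "usa_names", "usa_1910_current")
  bigrquery::bq_table_copy(src, dest)
  invisible(dest)
}
```

Running copy_usa_names(Sys.getenv("GCP_BILLING_PROJECT_ID")) once before the reprex should be enough; the calls are namespaced, so defining the function does not require attaching bigrquery.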
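library(bigrquerystorage) # backend used by bq_table_download(api = "arrow")
library(bigrquery)
library(tictoc)

# Project billed for the queries
billing <- Sys.getenv("GCP_BILLING_PROJECT_ID")

# Run the query, then download the result with the Arrow (Storage API) backend
tic()
bigquery_storage_api_rows <-
  bq_project_query(
    billing,
    "
    select name, number, state
    from usa_names.usa_1910_current
    where state = 'WA'
    "
  ) |>
  bq_table_download(api = "arrow")
toc()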
#> Job complete
#> Billed: 125.83 MB
#> Streamed 130809 rows in 32 messages.
#> 82.522 sec elapsed
# Same query, downloaded with the JSON backend
tic()
bigquery_api_rows <-
  bq_project_query(
    billing,
    "
    select name, number, state
    from usa_names.usa_1910_current
    where state = 'WA'
    "
  ) |>
  bq_table_download(api = "json")
toc()
#> Job complete
#> Billed: 0 B
#> Downloading first chunk of data.
#> First chunk includes all requested rows.
#> 3.953 sec elapsed
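Both timings above cover running the query as well as downloading its result, and the two runs were billed differently (125.83 MB for the first, 0 B for the second, which suggests the second query was served from cache). A minimal sketch that runs the query once and times only the download step for each backend might look like the following; compare_download_apis() is a hypothetical helper, and base R's system.time() stands in for tictoc.

```r
# Hypothetical helper: run `sql` once, then time only the download step
# for each bq_table_download() backend on the same result table.
compare_download_apis <- function(billing, sql) {
  # Query executed a single time; both downloads read its result table
  tb <- bigrquery::bq_project_query(billing, sql)

  # Elapsed wall-clock seconds for each download
  arrow_secs <- system.time(
    bigrquery::bq_table_download(tb, api = "arrow")
  )[["elapsed"]]
  json_secs <- system.time(
    bigrquery::bq_table_download(tb, api = "json")
  )[["elapsed"]]

  c(arrow = arrow_secs, json = json_secs)
}
```

Called as compare_download_apis(billing, sql) with the query from the reprex, it returns a named numeric vector of elapsed seconds for the arrow and json downloads, so any remaining gap is attributable to the download path rather than to query execution or caching.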