Description
Hello all.
I'm trying to export queried data from a BigQuery table. Since the resulting table can be large (2.5 GB or more), I followed the "Larger datasets" suggestion in the bq_table_download() help and used bq_table_save() to save the data as multiple files in Google Cloud Storage.
While using bq_table_save(), I discovered an undocumented option for the export format: destination_format = "PARQUET" in place of "NEWLINE_DELIMITED_JSON" or "CSV". With this parameter, bq_table_save() correctly saves the data in multiple Parquet files.
Can I use this option without problems? It seems to work very well: it is fast, and using Parquet files saves me a lot of work checking data types, since the column types are stored in the files themselves.
The following code summarizes what I used to export data successfully to a Google Cloud Storage bucket:
library(bigrquery)

project_id <- "<project identifier>"
sql_dwn <- "SELECT * FROM <table from which to extract data>"
# Run the query; the result is a handle to a (temporary) BigQuery table
tb <- bq_project_query(project_id, sql_dwn)
# Extract the table to GCS as multiple Parquet files (note the gs:// scheme in the URI)
bq_table_save(tb, destination_uris = "gs://destination_bucket/folder/filename_*.parquet", destination_format = "PARQUET")
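
For reference, here is a minimal sketch of how the exported files could be read back into R with the arrow package. This assumes an {arrow} build with GCS support; the gs:// path below is a placeholder matching the export above, not part of bq_table_save() itself:

library(arrow)
library(dplyr)

# Open all exported Parquet files under the folder as a single dataset
ds <- open_dataset("gs://destination_bucket/folder/")
# Column types are carried in the Parquet metadata, so no manual type checks
print(ds$schema)
# Materialize (optionally after filtering/selecting) into a data frame
df <- ds %>% collect()

The fact that the column types survive the round trip is what makes the Parquet route attractive compared with CSV, where every column has to be re-typed on import.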
Thank you in advance for your help.