
Exporting data to Google Cloud Storage in Parquet format available but undocumented #614

Open
@pegoenrico

Description


Hello all.
I'm trying to export queried data from a BigQuery table. Since the resulting table can be large (2.5 GB or more), I followed the "Larger datasets" suggestion in the bq_table_download() help and used bq_table_save() to save the data as multiple files in Google Cloud Storage.

When I tried to apply bq_table_save(), I discovered an undocumented option for the export format: destination_format = "PARQUET", in place of "NEWLINE_DELIMITED_JSON" or "CSV". With this parameter, bq_table_save() correctly saves the data as multiple Parquet files.

Can I use this option without problems? It seems to work very well for me: it is very performant, and using Parquet files saves me a lot of work checking data types.

The following code summarizes what I used to successfully export data to a Google Cloud Storage bucket:

library(bigrquery)

project_id <- "<project identifier>"
sql_dwn <- "SELECT * FROM <table from which to extract data>"

# Run the query; tb is a reference to the (temporary) destination table
tb <- bq_project_query(project_id, sql_dwn)

# Extract to GCS; the "*" in the URI lets BigQuery shard the export
# across multiple files
bq_table_save(
  tb,
  destination_uris = "destination_bucket/folder/filename_*.parquet",
  destination_format = "PARQUET"
)
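For what it's worth, the exported shards can then be read back with their column types intact, for example via the arrow package. This is just a sketch; the local folder path is hypothetical and assumes the shards have been copied down from the bucket:

library(arrow)

# Hypothetical path: point open_dataset() at the folder holding the
# exported Parquet shards
ds <- open_dataset("path/to/exported/folder", format = "parquet")

# Column types come from the Parquet metadata, so no manual checking
df <- as.data.frame(ds)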

Thank you in advance for your help.
