Wanted to start a discussion here...
According to the BigQuery documentation, Avro is the preferred format for loading data into BigQuery from GCS. It seems like we could improve load times by using Avro instead of .csv files.
A toavro() method already exists in the PETL library, so we could just expose that as well.
Looking at the code, it seems like we could add a flag to the bq.copy() method for the preferred file format, defaulting to the current .csv behavior so this isn't a breaking change. Would like to hear from folks whether there are any thoughts and whether this is a feasible project.
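
To make the idea a bit more concrete, here is a rough sketch of how a format flag might work. This is not the existing bq.copy() code: stage_for_load, write_csv, write_avro and the file_format parameter are all hypothetical names I made up for illustration. The Avro branch assumes petl and fastavro are installed, and I haven't benchmarked any of this.

```python
import csv
import os
import tempfile

# BigQuery load jobs identify these source formats as "CSV" and "AVRO".
SOURCE_FORMATS = {"csv": "CSV", "avro": "AVRO"}


def write_csv(rows, path):
    """Write rows (header first) to a CSV file using the standard library."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def write_avro(rows, path):
    """Write rows (header first) to an Avro file via petl's toavro()."""
    # Imported lazily so CSV-only users don't need the Avro dependencies.
    # Assumption: petl and fastavro are installed.
    import petl

    petl.toavro(rows, path)


WRITERS = {"csv": write_csv, "avro": write_avro}


def stage_for_load(rows, file_format="csv"):
    """Hypothetical helper: write rows to a temp file in the chosen format.

    Returns (path, source_format). The default stays "csv", so existing
    callers that never pass the flag see no change in behavior.
    """
    if file_format not in WRITERS:
        raise ValueError(
            f"Unsupported file_format {file_format!r}; "
            f"expected one of {sorted(WRITERS)}"
        )
    fd, path = tempfile.mkstemp(suffix="." + file_format)
    os.close(fd)  # the writer reopens the file by path
    WRITERS[file_format](rows, path)
    return path, SOURCE_FORMATS[file_format]
```

Callers that want Avro would opt in with something like stage_for_load(rows, file_format="avro"), and everyone else keeps getting .csv. The returned source format string would then be passed along to whatever configures the load job.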