
Added uploadTypedRows to BigQuery client #5218

Open · wants to merge 7 commits into main

Conversation

@shnapz (Contributor) commented Jan 30, 2024

writeTypedRows is not always suitable: it uses GCP's insertAll, and some GCP APIs do not reflect data inserted that way, e.g. table.getNumRows() returns 0 (see this old post). Data loaded as an upload from a file does not cause the same problem.
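
To illustrate the difference, a usage sketch (the case class, table spec, and GCS bucket are placeholders; the String-tableSpec overload of writeTypedRows and default values for the remaining uploadTypedRows parameters are assumptions, not the PR's exact API):

  import com.spotify.scio.bigquery._
  import com.spotify.scio.bigquery.client.BigQuery

  @BigQueryType.toTable
  case class Result(user: String, score: Long)

  val bq = BigQuery.defaultInstance()
  val rows = List(Result("alice", 1L), Result("bob", 2L))

  // Existing path: streaming insert via insertAll; rows may not be visible
  // to metadata APIs such as table.getNumRows() for a while.
  bq.writeTypedRows("project:dataset.table", rows)

  // Path added in this PR: stage rows in GCS as an Avro file, then run a
  // load job, which those APIs reflect immediately.
  bq.load.uploadTypedRows("project:dataset.table", rows, "gs://my-bucket/tmp")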


codecov bot commented Jan 30, 2024

Codecov Report

Attention: 20 lines in your changes are missing coverage. Please review.

Comparison is base (5c112ff) 62.63% compared to head (cb0cb05) 62.53%.
Report is 9 commits behind head on main.

Files                                                   Patch %   Lines
...ala/com/spotify/scio/bigquery/client/LoadOps.scala   0.00%     20 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #5218      +/-   ##
==========================================
- Coverage   62.63%   62.53%   -0.10%     
==========================================
  Files         301      301              
  Lines       10845    10867      +22     
  Branches      768      744      -24     
==========================================
+ Hits         6793     6796       +3     
- Misses       4052     4071      +19     

☔ View full report in Codecov by Sentry.

build.sbt Outdated
@@ -83,7 +83,7 @@ val googleCloudDatastoreVersion = "0.108.6"
 val googleCloudMonitoringVersion = "3.32.0"
 val googleCloudPubSubVersion = "1.107.13"
 val googleCloudSpannerVersion = "6.55.0"
-val googleCloudStorageVersion = "2.30.1"
+val googleCloudStorageVersion = "2.26.0"
Contributor Author

in sync with Beam

Contributor

Your link points to GCP libraries-bom 26.22.0, but Beam 2.53 uses 26.28.0 here.

Contributor Author

Right, reverting this

  def uploadTypedRows[T <: HasAnnotation: TypeTag](
    tableSpec: String,
    rows: List[T],
    tempLocation: String,
Contributor

Since this is temp, shouldn't we clean it up afterward?

Contributor Author

I was thinking about the bucket retention policy, but yeah, deleting it would be better.

Contributor Author

On the other hand, if I do:

  avro(
    List(blobId.toGsUtilUri),
    tableSpec,
    schema = Some(bqt.schema),
    createDisposition = createDisposition,
    writeDisposition = writeDisposition
  )

  storage.delete(blobId)

I am not confident that avro is fully synchronous and that BQ doesn't keep reading that file in the background.

Contributor

execute contains jobService.waitForJobs(loadJob), so I think this is fine.

I'm also wondering if we should create a SaveOps to allow saving to some file formats (avro/json).
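
If that holds, wrapping the load in try/finally would guarantee cleanup even when the load job fails. A sketch under that assumption, not the PR's code:

  try {
    // By the time avro(...) returns, execute has called
    // jobService.waitForJobs(loadJob), so BigQuery is done with the file.
    avro(
      List(blobId.toGsUtilUri),
      tableSpec,
      schema = Some(bqt.schema),
      createDisposition = createDisposition,
      writeDisposition = writeDisposition
    )
  } finally {
    // Nothing reads the staged file in the background anymore,
    // so it can be deleted unconditionally.
    storage.delete(blobId)
  }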

/**
 * Upload a List of rows to Cloud Storage as an Avro file and load it into a BigQuery table.
 * Note that element type `T` must be annotated with [[BigQueryType]].
 */
def uploadTypedRows[T <: HasAnnotation: TypeTag](
Contributor

IMHO this naming can be simplified; other APIs here do not have the upload prefix.
Usage will be from BigQuery as bq.load.uploadTypedRows.

I think this should be named

Suggested change:
-  def uploadTypedRows[T <: HasAnnotation: TypeTag](
+  def typedRows[T <: HasAnnotation: TypeTag](
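
For comparison, the call site under the suggested name would read as follows (table spec, rows, and temp bucket are placeholders):

  val bq = BigQuery.defaultInstance()
  bq.load.typedRows("project:dataset.table", rows, "gs://my-bucket/tmp")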
