
BigQuery writes with GenericRecord format don't support overridden non-String types #5644

@clairemcginty

Description

Any @BigQueryType class that uses an OverrideTypeProvider to wrap a non-String type will fail when the GenericRecord format is used to write records to BigQuery. For example:

// sample of override type provider
import scala.collection.mutable
import scala.reflect.macros.blackbox
import scala.reflect.runtime.{universe => ru}

case class NonNegativeInt(data: Int)

object NonNegativeInt {
  def parse(data: String): NonNegativeInt = NonNegativeInt(data.toInt)

  def stringType: String = "NONNEGATIVEINT"

  def bigQueryType: String = "INTEGER"
}

object Index {
  // compile-time mapping used by the annotation macro
  def getIndexCompileTimeTypes(c: blackbox.Context): mutable.Map[c.Type, Class[_]] = {
    import c.universe._
    mutable.Map[Type, Class[_]](
      typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
  }

  // lookup from the override's BQ type name to the wrapper class
  def getIndexClass: mutable.Map[String, Class[_]] =
    mutable.Map[String, Class[_]](
      NonNegativeInt.stringType -> classOf[NonNegativeInt]
    )

  // runtime mapping used when converting records
  def getIndexRuntimeTypes: mutable.Map[ru.Type, Class[_]] =
    mutable.Map[ru.Type, Class[_]](
      ru.typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
}

// sample of job
@BigQueryType.toTable
case class MyRecord(i: NonNegativeInt)

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  .saveAsTypedBigQueryTable(...)

will fail with a ClassCastException like:

[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:326)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   ...
[info]   Cause: java.lang.ClassCastException: value 31 (a com.spotify.scio.example.NonNegativeInt) cannot be cast to expected type long at MyRecord.i
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:79)
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:30)
[info]   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:323)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeProcessElement(Unknown Source)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:212)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:186)
[info]   at org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:88)

This is because toAvroInternal converts all overridden types to String: https://github.com/spotify/scio/blob/v0.14.14/scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/types/ConverterProvider.scala#L174 . I think this was just copied over from the toTableRow behavior, where stringifying works fine because the JSON-based TableRow format accepts a stringified value for any type. Avro is stricter: the converted avroSchema correctly expects a long (BigQuery INTEGER) value, so the write fails with the cast error above.
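
To see the strictness directly, here's a minimal standalone sketch using plain Avro (no Scio involved; the schema and value are illustrative approximations of what the generated converter produces): a record whose schema declares a long field rejects a stringified value at serialization time, failing at the same place as the stack trace above.

import java.io.ByteArrayOutputStream

import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object AvroStrictnessDemo extends App {
  // Roughly the avroSchema generated for MyRecord: BQ INTEGER maps to Avro long
  val schema = SchemaBuilder.record("MyRecord").fields().requiredLong("i").endRecord()

  // Put a stringified value where the schema expects a long, as the
  // String-converting branch of toAvroInternal would
  val record = new GenericData.Record(schema)
  record.put("i", "31")

  val writer  = new GenericDatumWriter[GenericRecord](schema)
  val encoder = EncoderFactory.get().binaryEncoder(new ByteArrayOutputStream(), null)

  // Throws ClassCastException at GenericDatumWriter.write, like the trace above;
  // a TableRow (a JSON map) would have happily carried the String
  writer.write(record, encoder)
}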

The workaround is to fall back to TableRow format.
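
For reference, one way to spell that fallback out explicitly is to convert through the generated toTableRow and use the TableRow-based writer instead of the typed one. A sketch only: the table spec is a placeholder, and the exact saveAsBigQueryTable parameters vary across Scio versions, so check the API for the version you're on.

import com.spotify.scio.bigquery._

val bqt = BigQueryType[MyRecord]

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  .map(bqt.toTableRow) // overridden types stringify fine in JSON
  .saveAsBigQueryTable(
    Table.Spec("project:dataset.my_table"), // placeholder table spec
    schema = bqt.schema
  )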
