Any BigQueryType class that uses an OverrideTypeProvider to wrap a non-String type will fail when the GenericRecord format is used to write records to BigQuery. For example:
// sample of override type provider
import scala.collection.mutable
import scala.reflect.macros.blackbox
import scala.reflect.runtime.universe.{typeOf, Type}

case class NonNegativeInt(value: Int)

object NonNegativeInt {
  def parse(data: String): NonNegativeInt = NonNegativeInt(data.toInt)
  def stringType: String = "NONNEGATIVEINT"
  def bigQueryType: String = "INTEGER"
}

object Index {
  // compile-time (macro) type index
  def getIndexCompileTimeTypes(c: blackbox.Context): mutable.Map[c.Type, Class[_]] = {
    import c.universe._
    mutable.Map[Type, Class[_]](
      typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
  }

  // BigQuery string type -> class index
  def getIndexClass: mutable.Map[String, Class[_]] =
    mutable.Map[String, Class[_]](
      NonNegativeInt.stringType -> classOf[NonNegativeInt]
    )

  // runtime (reflection) type index
  def getIndexRuntimeTypes: mutable.Map[Type, Class[_]] =
    mutable.Map[Type, Class[_]](
      typeOf[NonNegativeInt] -> classOf[NonNegativeInt]
    )
}
// sample of job
@BigQueryType.toTable
case class MyRecord(i: NonNegativeInt)

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  .saveAsTypedBigQueryTable(...)

This will fail with a class cast exception like:
[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:326)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   ...
[info]   Cause: java.lang.ClassCastException: value 31 (a com.spotify.scio.example.NonNegativeInt) cannot be cast to expected type long at MyRecord.i
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:79)
[info]   at org.apache.avro.path.TracingClassCastException.summarize(TracingClassCastException.java:30)
[info]   at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:84)
[info]   at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:323)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.AvroRowWriter.write(AvroRowWriter.java:58)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles.processElement(WriteBundlesToFiles.java:247)
[info]   at org.apache.beam.sdk.io.gcp.bigquery.WriteBundlesToFiles$DoFnInvoker.invokeProcessElement(Unknown Source)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.invokeProcessElement(SimpleDoFnRunner.java:212)
[info]   at org.apache.beam.runners.core.SimpleDoFnRunner.processElement(SimpleDoFnRunner.java:186)
[info]   at org.apache.beam.runners.core.SimplePushbackSideInputDoFnRunner.processElementInReadyWindows(SimplePushbackSideInputDoFnRunner.java:88)
This is because toAvroInternal converts all overridden types to String: https://github.com/spotify/scio/blob/v0.14.14/scio-google-cloud-platform/src/main/scala/com/spotify/scio/bigquery/types/ConverterProvider.scala#L174. I think this behavior was copied from toTableRow, where it works because the TableRow/JSON format accepts a stringified value for every type. Avro is stricter: the derived avroSchema correctly expects an integer value, so the write fails.
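To make the mismatch concrete, here is a minimal, self-contained sketch (plain Avro, outside of Scio; the schema and value are made up to mirror MyRecord) showing that a GenericDatumWriter rejects a stringified value for a long field, which is the situation toAvroInternal creates for overridden types:

import java.io.ByteArrayOutputStream

import org.apache.avro.{Schema, SchemaBuilder}
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object AvroStrictnessDemo {
  def main(args: Array[String]): Unit = {
    // Schema as derived from the BigQuery type: INTEGER maps to an Avro long
    val schema: Schema = SchemaBuilder
      .record("MyRecord")
      .fields()
      .requiredLong("i")
      .endRecord()

    val record: GenericRecord = new GenericData.Record(schema)
    // A stringified value, which is what the TableRow-style conversion produces
    record.put("i", "31")

    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    val writer = new GenericDatumWriter[GenericRecord](schema)
    // Avro enforces the declared field type, so this throws a ClassCastException
    // much like the one in the stack trace above; TableRow/JSON would have
    // happily serialized the string.
    writer.write(record, encoder)
  }
}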
The workaround is to fall back to the TableRow format.
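For illustration, one hedged way to stay on the TableRow path is to convert through the generated BigQueryType instance and use the untyped TableRow writer; the table spec below is a placeholder, and whether this matches the exact fallback intended here depends on the Scio version and setup:

import com.spotify.scio.bigquery._
import com.spotify.scio.bigquery.types.BigQueryType

// Generated converter and schema for the annotated case class
val bqt = BigQueryType[MyRecord]

sc
  .parallelize(1 to 10)
  .map(i => MyRecord(NonNegativeInt(i)))
  // Serialize via TableRow/JSON, which tolerates the stringified override type
  .map(bqt.toTableRow)
  .saveAsBigQueryTable(
    Table.Spec("my-project:my_dataset.my_table"), // placeholder table spec
    schema = bqt.schema
  )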