Releases: GoogleCloudPlatform/DataflowTemplates

Dataflow Templates 2023-01-17-01_RC00

20 Jan 21:00

Release Week of 2023-01-17

Note: This release also includes changes from the release 2023-01-10-00_RC00, which was cancelled. If you're looking for a version that includes a bugfix from 2023-01-10-00_RC00, please use the latest version 2023-01-17-01_RC00 instead.

Improvements

  • [Spanner Change Stream Templates] Support import/export for change streams in Cloud Spanner PostgreSQL-dialect databases.
  • Added JDBC and Spanner sinks to StreamingDataGenerator template.
  • Added a number of integration tests and resource managers.

Bug Fixes

  • [Datastream Templates] Fixed a bug that sometimes caused duplicate records to be written to the target database.

Contributors

@Abacn
@nancyxu123
@bvolpato
@pranavbhandari24
@nirfi
@Polber

Dataflow Templates 2023-01-10-00_RC00

13 Jan 19:32

Release Week of 2023-01-10

Note: This release has been cancelled and hasn't been fully rolled out to production. Please use version 2023-01-17-01_RC00 or later instead, which includes these changes.

Improvements

  • [Integration Tests] Create Elasticsearch Resource Manager and Create GCS to Elasticsearch integration test
  • [Documentation] Improve documentation for Dataflow CSV import pipeline trailing delimiter.
  • [Flex Templates] Add SecretManagerUtils in v2 common directory

Bug Fixes

  • [Security] Bump postgresql dependency due to CVE-2022-21724 and CVE-2022-31197
  • [Spanner Tests] Fix an invalid column default value in RandomDdlGenerator.

Contributors

@andreigurau
@bvolpato
@oleg-semenov

Dataflow Templates 2023-01-03-00_RC00

10 Jan 18:58

Release Week of 2023-01-03

Improvements

  • [TextIOToBigQuery] Integration test for TextIOToBigQuery flex template
  • [All Templates] Enable available Nashorn engine ES6 support for Teleport templates
  • [Syndeo] An integration test for the Kafka-to-BQ flow in Syndeo
  • [StreamingDataGenerator] Add schema templates to StreamingDataGenerator dataflow template
  • [Flex Templates] Use structured logging for uncaught exceptions, to log them with ERROR severity/level.
  • [Flex Templates] Refactor v2 to remove ValueProviders
  • [Templates in googlecloud-to-googlecloud] Improve dependency management

Bug Fixes

  • [Security] Bump commons-beanutils dependency due to CVE-2019-10086
  • [Security] Bump spring-expression due to CVE-2022-22950
  • [Security] Bump commons-configuration2 dependency due to CVE-2022-33980
  • [Integration Tests] Build plugin dependencies using a specific target folder to prevent ClassLoader issues
  • [Spanner Templates] Unit test fixes
  • [Templates Plugin] Improve plugins documentation + sanitize bucket name arguments
  • [MongoDbToBigQuery] Integration test fixes
  • [Datastore Templates] Do not use @default for Firestore Workers, as it overrides Datastore options

Contributors

@andreigurau
@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@nancyxu123
@ryanmadden-google

Dataflow Templates 2022-12-13-00_RC01

14 Dec 00:45

Release Week of 2022-12-13

Note: This release is in the process of rolling out. It may not be in your region yet.

Improvements

  • [BigQuery] Add Storage Write API support to several templates that interact with BigQuery.
  • [DatastreamToSpanner] Support Postgres as a source
  • [Integration Tests] Added several integration / end-to-end tests and resource managers
  • [PubsubAvroToBigQuery] Upgrade PubsubAvroToBigQuery template to support Storage Write API.
  • [Security] Bump dependencies as a response to CVEs
  • [Templates Plugin] Several improvements to metadata annotations and templates plugin

Bug Fixes

  • [All Templates] Add imperative version for Jackson/FasterXML to match Beam 2.43.0
  • [Changestreams] Fix WriteDataChangeRecordsToAvroTest serialization issue
  • [Flex Templates] Make classpath deterministic for Flex Template executions, by always making sure that Conscrypt is loaded first.

Contributors

@Harwayne
@bvolpato
@oleg-semenov
@pranavbhandari24

Dataflow Templates 2022-12-05-00_RC00

02 Dec 21:45

Release Week of 2022-12-05

Improvements

  • [All templates] Introduce metadata annotations.
  • [DataStreamToBigQuery] Expose mergeConcurrency option and re-throw the error when a merge statement fails.
  • Trigger Java PR workflow when any XML is changed + files are deleted.
  • [Classic templates] Support JSONB arrays.
  • [PubSubCdcToBigQuery] Support maxStreamingBatchSize parameter
  • [DatastreamToSpanner] Add changes for the new HarbourBridge session file with tableID and columnID support.
  • [All templates] Upgrade Beam version to 2.43.
  • [Classic templates] Prepare plugin infra to test classic templates + Create BulkCompressionIT
  • [Integration Tests] Do not make artifactBucket mandatory (only if bucketName not provided for ITs)
  • [DataStreamToSpanner] Change default values for dlqRetryMinutes and dlqMaxRetryCount params.
  • [Integration Tests] Avoid Joiner conflict, and improve plugin staging speed
  • [Flex templates] Plain text logging for Flex Templates unit tests
  • [Integration Tests] Improve plugin bucket parameter requirements
  • [SpannerChangeStreamsTemplates] Simplify the code of setting experiments for spanner change streams to BigQuery and spanner change streams to GCS templates.
  • [Integration Tests] Create MongoDB Resource Manager
  • [Integration Tests] Create MongoDBToBigQueryIntegrationTest
  • [Integration Tests] Add TestContainers framework
  • [Integration Tests] Create PubsubAvroToBigQueryIT + prepare profile to run integration tests together
  • [MongoDBToBigQuery] Create udf for MongoDB BigQuery Templates
  • [Security] Update hadoop version affected by CVE-2022-25168
  • Improve Templates Plugin instructions
  • [Syndeo templates] Separating JSON build

Bug Fixes

  • [Classic templates] WindowedFilenamePolicy's dayPattern defaults to dd instead of DD
  • [JDBC templates] Do not log unencrypted values/keys to the console
  • [DataStream templates] Rethrow exception from ExtractGcsFile so that Dataflow will retry the pardo
  • [Integration Tests] Fix integration tests parameters passing
  • [Flex templates] Fix log dependencies (log4j initialization error)
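
The WindowedFilenamePolicy entry above turns on the difference between the dd and DD pattern letters. In common Java date pattern syntaxes, lowercase d is day-of-month and uppercase D is day-of-year. The following is a minimal sketch of that difference, assuming java.time's DateTimeFormatter as a stand-in (the formatter used inside the template may differ); the class name DayPatternDemo and the sample date are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DayPatternDemo {
    // Formats a date with the given pattern using java.time.
    // Assumption: java.time is used here for illustration only; it is not
    // necessarily the formatter that WindowedFilenamePolicy relies on.
    public static String format(String pattern, LocalDate date) {
        return DateTimeFormatter.ofPattern(pattern).format(date);
    }

    public static void main(String[] args) {
        // 5 February 2022 is day 36 of the year.
        LocalDate date = LocalDate.of(2022, 2, 5);

        // "dd" is the two-digit day-of-month.
        System.out.println(format("yyyy/MM/dd", date)); // prints 2022/02/05

        // "DD" is the day-of-year, padded to at least two digits.
        System.out.println(format("yyyy/MM/DD", date)); // prints 2022/02/36
    }
}
```

With a year/month/day directory layout, a day-of-year pattern would produce paths such as 2022/02/36, which is why the choice between dd and DD matters for windowed output filenames.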

Contributors

@bvolpato
@oleg-semenov
@pabloem
@Polber
@pranavbhandari24
@theshanbhag

Dataflow Templates 2022-11-16-00_RC00

17 Nov 23:15

Release Week of 2022-11-16

New Templates

  • Text-to-BigQuery Flex template with support for BQ Storage Write API

Improvements

  • [All templates] Upgrade Beam version to 2.42.
  • [Batch Flex template with BQ Sinks] BQ Storage Write API options for Batch Flex Templates
  • [Syndeo] Add Jib to syndeo-template module and bind it to the package phase for the Syndeo template
  • [Syndeo] Handle schemas that are not provided by configuration
  • [Flex templates] Enable structured logging for v2 templates
  • [All templates] Enforce Conscrypt version 2.5.2 (matching Beam 2.42.0)
  • [All templates] Update commons-text
  • [BigTable templates] Update bigtable-beam-import version
  • [Spanner templates] Support new value capture types (NEW_VALUES and NEW_ROW).

Bug Fixes

  • Fix pom file name in .github/workflows/prepare-java-cache.yml
  • [Unit tests] Reduce verbosity of unit test logs

Contributors

@bvolpato
@oleg-semenov
@pabloem
@pranavbhandari24
@zhoufek

Dataflow Templates 2022-11-01-00_RC00

03 Nov 14:33

Release Week of 2022-11-01

Note: This release is in the process of rolling out. It may not be in your region yet.

Bug Fixes

  • [CDC] Print exception with "Avro File Read Failure" logs. Fixes #450

Contributors

Bruno Volpato @bvolpato

Dataflow Templates 2022-10-25-00_RC00

26 Oct 21:17

Release Week of 2022-10-25

Note: This release is in the process of rolling out. It may not be in your region yet.

Improvements

[Project Structure] The structure of the project has changed to simplify contributing.

  • Move classic templates to their own subdirectory, v1/, consistent with flex templates being housed in the v2/ subdirectory.
  • Remove unified-templates.xml and rely solely on the root pom.xml for building the project.

[JdbcToBigQuery Template] Add optional data loading pipeline option to toggle whether data is truncated or appended into BigQuery.

Contributors

Pablo Estrada @pabloem
Suddhasatwa Bhaumik @suddhasatwabhaumik
Bruno Volpato @bvolpato

Dataflow Templates 2022-10-18-00_RC00

25 Oct 15:20

Release Week of 2022-10-18

Improvements

  • [BigTable Templates] Added Bigtable Resource Manager for integration testing
  • [BigQuery Templates] Added BigQuery Resource Manager for integration testing
  • [Splunk Template] Enable Splunk batching by default (10) on the Pub/Sub to Splunk template

Bug Fixes

  • [ElasticSearch Templates] Bug fix for the cases when maxBatchSizeBytes may be exceeded

Contributors

Bruno Volpato @bvolpato
Jeffrey Kinard @Polber
Mark Pevec @ggprod
olegsa @oleg-semenov

Dataflow Templates 2022-09-26-01_RC00

03 Oct 16:03

Release Week of 2022-09-26

Improvements

  • [Spanner Template] Remove the CAST-to-string statement when reading Numeric columns from Spanner.
  • [All templates] Do not apply formatting/spotless on Apache Beam code

Contributors

@bvolpato
@darshan-sj