Add timestamp precision option to bigquery storage read for TIMESTAMP(12) columns. #37079
base: master
Conversation
Summary of Changes

Hello @claudevdm, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request enhances the BigQueryIO connector by introducing explicit control over timestamp precision when reading data using the BigQuery Storage Read API. This allows users to specify whether TIMESTAMP(12) columns should be read with nanosecond or picosecond precision, ensuring data fidelity for high-precision timestamp values. The changes involve adding a new configuration option and integrating it into the BigQuery Storage API request generation, along with thorough testing to cover various data formats and precision levels.
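For orientation, a hypothetical usage sketch follows. The builder method name `withDirectReadPicosTimestampPrecision` is inferred from the getter `getDirectReadPicosTimestampPrecision` quoted later in this review and may not match the final API:

```java
import org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO;

PCollection<TableRow> rows =
    pipeline.apply(
        BigQueryIO.readTableRows()
            .from("my-project:my_dataset.my_table") // hypothetical table
            .withMethod(BigQueryIO.TypedRead.Method.DIRECT_READ)
            // Hypothetical setter: request nanosecond precision for
            // TIMESTAMP(12) columns instead of the default microseconds.
            .withDirectReadPicosTimestampPrecision(TimestampPrecision.NANOS));
```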
/gemini review
Code Review
This pull request introduces a new option to specify timestamp precision when reading from BigQuery using the Storage Read API. The changes are well-implemented across the relevant classes, and the addition of comprehensive tests is excellent. I have a couple of suggestions: one to correct the default value mentioned in a Javadoc, and another to refactor a new method to improve its structure and reduce code duplication.
R: @Abacn
R: @ahmedabu98
Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control. If you'd like to restart, comment
1 similar comment
/gemini review
damccorm left a comment
Mostly LGTM, just had a naming question
```java
    SerializableFunction<SchemaAndRecord, T> parseFn,
    Coder<T> outputCoder,
    BigQueryServices bqServices,
    @Nullable TimestampPrecision picosTimestampPrecision) {
```
Is there a reason we're naming this picosTimestampPrecision? Couldn't it take in non-picos precisions?
Code Review
This pull request introduces a valuable feature for controlling timestamp precision during BigQuery storage reads for TIMESTAMP(12) columns. The implementation is well-structured, propagating the new option from the user-facing API down to the storage read session creation. The accompanying tests are comprehensive and cover a wide range of scenarios. I've identified one critical issue regarding serialization that could break portability, along with a few medium-severity suggestions to improve code robustness and maintainability. Overall, this is a solid contribution.
```java
fieldValues.put(
    "timestamp_precision", toByteArray(transform.getDirectReadPicosTimestampPrecision()));
```
There's a mismatch between the field name used here for serialization (timestamp_precision) and the name defined in the schema and used for deserialization (direct_read_picos_timestamp_precision). This will cause portability issues as the serialized value won't be correctly read back. Please use the correct field name to ensure the transform can be correctly serialized and deserialized across different contexts.
Suggested change:

```java
fieldValues.put(
    "direct_read_picos_timestamp_precision",
    toByteArray(transform.getDirectReadPicosTimestampPrecision()));
```
```java
switch (checkNotNull(picosTimestampPrecision)) {
  case MICROS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_MICROS;
    break;
  case NANOS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS;
    break;
  case PICOS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS;
    break;
  default:
    return;
}
```
This method can be slightly improved for clarity and robustness:

- The `checkNotNull(picosTimestampPrecision)` is redundant because `picosTimestampPrecision` is checked for null in the calling method `setPicosTimestampPrecision`.
- The `switch` statement is not exhaustive for the `TimestampPrecision` enum (it's missing `MILLIS`). The `default` case currently returns silently, which could hide bugs if new enum values are added in the future. It would be safer to explicitly handle `MILLIS` (if it's intentionally unsupported) and have a `default` case that throws an exception for any other unhandled values.
```java
switch (picosTimestampPrecision) {
  case MICROS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_MICROS;
    break;
  case NANOS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS;
    break;
  case PICOS:
    precision = ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS;
    break;
  case MILLIS:
    // The Storage Read API does not support millisecond precision for TIMESTAMP(12) columns.
    return;
  default:
    throw new IllegalArgumentException(
        "Unsupported timestamp precision for Storage Read API: " + picosTimestampPrecision);
}
```

```java
switch (checkNotNull(picosTimestampPrecision)) {
  case MICROS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_MICROS;
    break;
  case NANOS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS;
    break;
  case PICOS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS;
    break;
  default:
    return;
}
```
Similar to setArrowTimestampPrecision, this method can be improved by removing the redundant checkNotNull and making the switch statement more robust against future changes to the TimestampPrecision enum by handling all cases explicitly or throwing an exception for unhandled ones.
```java
switch (picosTimestampPrecision) {
  case MICROS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_MICROS;
    break;
  case NANOS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS;
    break;
  case PICOS:
    precision = AvroSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_PICOS;
    break;
  case MILLIS:
    // The Storage Read API does not support millisecond precision for TIMESTAMP(12) columns.
    return;
  default:
    throw new IllegalArgumentException(
        "Unsupported timestamp precision for Storage Read API: " + picosTimestampPrecision);
}
```

```java
switch (precision) {
  case NANOS:
    return ARROW_SCHEMA_TS_NANOS;
  case PICOS:
    return ARROW_SCHEMA_TS_PICOS;
  case MICROS:
  default:
    return ARROW_SCHEMA_TS_MICROS;
}
```
The default case here will treat any unhandled TimestampPrecision enum value as MICROS. This could lead to subtle bugs if new precisions are added. It would be safer to make the switch statement exhaustive or throw an exception in the default case to ensure all enum values are explicitly handled. This applies to other similar helper methods in this test file as well (e.g., getAvroSchemaTs, createArrowTsResponse).
```java
switch (precision) {
  case NANOS:
    return ARROW_SCHEMA_TS_NANOS;
  case PICOS:
    return ARROW_SCHEMA_TS_PICOS;
  case MICROS:
    return ARROW_SCHEMA_TS_MICROS;
  default:
    throw new IllegalArgumentException("Unsupported timestamp precision: " + precision);
}
```
Add read timestamp precision setting for Storage API reads.
The Storage API allows reading TIMESTAMP(12) columns with MICROS (default), NANOS, or PICOS precision for both AVRO and ARROW formats.
This propagates the read precision setting to the Storage API and adds relevant tests.
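Concretely, the propagation amounts to setting serialization options on the read session request. A minimal sketch, assuming the generated proto builder exposes a `setPicosTimestampPrecision` setter matching the enum constants quoted in the review comments above:

```java
import com.google.cloud.bigquery.storage.v1.ArrowSerializationOptions;
import com.google.cloud.bigquery.storage.v1.ReadSession;

// Sketch: map the Beam-side read precision onto the Storage Read API's
// Arrow serialization options when building the read session.
ReadSession.TableReadOptions readOptions =
    ReadSession.TableReadOptions.newBuilder()
        .setArrowSerializationOptions(
            ArrowSerializationOptions.newBuilder()
                // Assumed setter name; the enum constant appears in the
                // review snippets above.
                .setPicosTimestampPrecision(
                    ArrowSerializationOptions.PicosTimestampPrecision.TIMESTAMP_PRECISION_NANOS)
                .build())
        .build();
```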
Known Issue:
Arrow readTableRows and readTableRowsWithSchema convert Arrow records to Beam rows via ArrowConversion.java.
ArrowConversion is a generic Arrow-to-Beam schema utility; it does not take the BigQuery schema into account.
Even before this PR, the Arrow format with readTableRows truncates timestamps to millisecond precision, because millisecond and microsecond timestamps were historically mapped to FieldType.DATETIME:
beam/sdks/java/extensions/arrow/src/main/java/org/apache/beam/sdk/extensions/arrow/ArrowConversion.java
Line 210 in 15b50e2
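A minimal illustration of that truncation, using only the fact that Beam's FieldType.DATETIME is backed by Joda time, which stores millisecond precision:

```java
import org.joda.time.Instant;

public class MillisTruncationDemo {
  public static void main(String[] args) {
    long epochMicros = 1_700_000_000_123_456L; // microseconds since epoch
    // Joda Instant (and therefore FieldType.DATETIME) only holds millis,
    // so the trailing 456 microseconds are lost in the conversion.
    Instant truncated = new Instant(epochMicros / 1000);
    System.out.println(truncated); // precision beyond millis is gone
  }
}
```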
For Avro, readTableRowsWithSchema does not have this issue, because we can map timestamp-micros to the timestamp logical type when the BigQuery schema is TIMESTAMP(12) with read precision MICROS:
beam/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryUtils.java
Line 415 in 15b50e2
Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

- Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
- Update CHANGES.md with noteworthy changes.

See the Contributor Guide for more tips on how to make the review process smoother.
To check the build health, please visit https://github.com/apache/beam/blob/master/.test-infra/BUILD_STATUS.md
GitHub Actions Tests Status (on master branch)
See CI.md for more information about GitHub Actions CI or the workflows README to see a list of phrases to trigger workflows.