Description
What needs to happen?
email thread link: https://lists.apache.org/thread/30ocd9sbdm3268jk1brvrp402xnfmlxp
Beam SQL currently supports two SQL dialects, (i) Apache Calcite and (ii)
ZetaSQL (see documentation [2]). ZetaSQL is a candidate for deprecation for
the following reasons:
- Development of the ZetaSQL dialect in Beam has effectively stalled since
early 2022 (see change history [3]).
- Despite its incomplete support status, no new bug or feature request has
been opened since we migrated to GitHub Issues, suggesting minimal
adoption [4].
- We still need to keep ZetaSQL up to date whenever its dependencies conflict
with other Google dependencies. As a result, the ZetaSQL component adds
maintenance burden when upgrading GCP-BOM (e.g. [5]).
- One of the main reasons for using the ZetaSQL dialect, per [2], was
pipelines that write to or read from BigQuery tables. Today, GCP BigQuery
supports using GoogleSQL (open-sourced as ZetaSQL) to query data stored
outside of BigQuery via the BigQuery Connections API / federated queries
[6, 7]. This largely provides an alternative to using Beam's ZetaSQL to
interact with BigQuery.
For these reasons, I propose initiating the process of deprecating
Beam SQL's ZetaSQL component [1]. Two decisions need to be made:
Firstly, agree on when to document the deprecated status of the ZetaSQL
component in the Javadoc and on the Beam website. I recommend doing it in
the release that HEAD currently belongs to, Beam 2.65.0 (cut April 30,
2025).
Secondly, stop publishing ZetaSQL artifacts. This is a breaking change, and
I think we can leave the deprecated status as is until one of the following
situations emerges, whichever comes first, and no earlier than Beam 2.66.0
(cut Jun 11, 2025):
- Continued support for the ZetaSQL component involves significant burdens,
such as conflicts with other Beam dependencies, supported Java versions, etc., or
- Beam moves to its next major release (3).
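For the first decision, the Javadoc side of the deprecation could look roughly like the sketch below. This is only an illustration under stated assumptions: `HypotheticalZetaSqlPlanner` is a made-up stand-in and not an actual Beam class, and the wording and version in the `@deprecated` tag simply mirror the proposal above, not a final decision.

```java
public class DeprecationSketch {

    /**
     * Hypothetical stand-in for a class in the ZetaSQL component
     * (not a real Beam class; used only to illustrate the marker).
     *
     * @deprecated ZetaSQL dialect support is deprecated as of Beam 2.65.0
     *     (version taken from the proposal, not a final decision). Its
     *     artifacts may stop being published in a later release.
     */
    @Deprecated
    static class HypotheticalZetaSqlPlanner {}

    /** Returns true if the class carries the standard @Deprecated annotation. */
    static boolean isDeprecated(Class<?> clazz) {
        // @Deprecated has runtime retention, so it is visible via reflection.
        return clazz.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) {
        System.out.println(isDeprecated(HypotheticalZetaSqlPlanner.class));
    }
}
```

Pairing the `@deprecated` Javadoc tag with the `@Deprecated` annotation means users see both a compiler warning and an explanation in the generated docs, which fits the plan of announcing the status before any artifacts are removed.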
[1] https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/zetasql
[2] https://beam.apache.org/documentation/dsls/sql/overview/
[3] https://github.com/benEng/beam/commits/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java
[4] https://github.com/apache/beam/issues?q=is%3Aissue%20%20label%3Azetasql%20
[5] #32902
[6] https://cloud.google.com/bigquery/docs/connections-api-intro
[7] https://cloud.google.com/bigquery/docs/federated-queries-intro
Issue Priority
Priority: 2 (default / most normal work should be filed as P2)
Issue Components
- Component: Python SDK
- Component: Java SDK
- Component: Go SDK
- Component: Typescript SDK
- Component: IO connector
- Component: Beam YAML
- Component: Beam examples
- Component: Beam playground
- Component: Beam katas
- Component: Website
- Component: Infrastructure
- Component: Spark Runner
- Component: Flink Runner
- Component: Samza Runner
- Component: Twister2 Runner
- Component: Hazelcast Jet Runner
- Component: Google Cloud Dataflow Runner