[Task]: Deprecate ZetaSQL #34423

Open
@Abacn

Description

What needs to happen?

email thread link: https://lists.apache.org/thread/30ocd9sbdm3268jk1brvrp402xnfmlxp

Beam SQL currently supports two SQL dialects, (i) Apache Calcite and (ii)
ZetaSQL; see the documentation [2]. The ZetaSQL dialect should be deprecated
for the following reasons:

  • Development of the ZetaSQL dialect in Beam has effectively stalled since
    early 2022 (see change history [3]).

  • Despite its incomplete support status, no new bug reports or feature
    requests have been opened since we migrated to GitHub Issues, suggesting
    minimal adoption [4].

  • We still need to keep ZetaSQL up to date whenever its dependencies conflict
    with other Google dependencies. As a result, the ZetaSQL component adds
    maintenance burden when upgrading the GCP BOM (e.g. [5]).

  • One of the main reasons for using the ZetaSQL dialect, per [2], was for
    pipelines that write to or read from BigQuery tables.

Today, GCP BigQuery supports using GoogleSQL (open-sourced as ZetaSQL) to
query data that is stored outside of BigQuery via the BigQuery Connections
API / federated queries [6, 7]. This largely provides an alternative to
using Beam's ZetaSQL to interact with BigQuery.

For these reasons, I propose initiating the process of deprecating
Beam SQL's ZetaSQL component. Two decisions need to be made:

Firstly, agree on when to document the deprecated status of the ZetaSQL
component in the Javadoc and on the Beam website. I recommend doing it in
the release that HEAD currently belongs to, Beam 2.65.0 (cut April 30,
2025).

Secondly, stop publishing ZetaSQL artifacts. This is a breaking change, and
I think we can leave the deprecated status as is until one of the following
situations emerges, whichever comes first, and no earlier than Beam 2.66.0
(cut Jun 11, 2025):

  • Continued support for the ZetaSQL component involves significant burdens,
    such as conflicts with other Beam dependencies, supported Java versions,
    etc., or
  • Beam moves to the next major release (3)
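
To make the proposed gating rule concrete, here is a minimal sketch of how the second decision could be evaluated for a given release. The function and constant names are hypothetical and are not part of Beam; only the 2.66.0 floor and the two trigger conditions come from the proposal above.

```python
# Hypothetical constant: the proposal says removal happens no earlier than Beam 2.66.0.
EARLIEST_REMOVAL = (2, 66, 0)


def parse_version(version: str) -> tuple:
    """Turn '2.66.0' into (2, 66, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def may_stop_publishing(release: str, significant_burden: bool) -> bool:
    """Return True if ZetaSQL artifacts could be dropped in `release`.

    Removal needs the release to be at or past the floor AND at least one
    trigger: a significant maintenance burden, or a new major version (3+).
    """
    version = parse_version(release)
    if version < EARLIEST_REMOVAL:
        return False
    is_next_major = version[0] >= 3
    return significant_burden or is_next_major
```

Under this reading, `may_stop_publishing("2.65.0", True)` stays false because of the version floor, while a 3.x release qualifies even without a reported burden.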

[1]
https://github.com/apache/beam/tree/master/sdks/java/extensions/sql/zetasql
[2] https://beam.apache.org/documentation/dsls/sql/overview/
[3]
https://github.com/apache/beam/commits/master/sdks/java/extensions/sql/zetasql/src/main/java/org/apache/beam/sdk/extensions/sql/zetasql/SupportedZetaSqlBuiltinFunctions.java
[4]
https://github.com/apache/beam/issues?q=is%3Aissue%20%20label%3Azetasql%20
[5] #32902
[6] https://cloud.google.com/bigquery/docs/connections-api-intro
[7] https://cloud.google.com/bigquery/docs/federated-queries-intro

Issue Priority

Priority: 2 (default / most normal work should be filed as P2)

Issue Components

  • Component: Python SDK
  • Component: Java SDK
  • Component: Go SDK
  • Component: Typescript SDK
  • Component: IO connector
  • Component: Beam YAML
  • Component: Beam examples
  • Component: Beam playground
  • Component: Beam katas
  • Component: Website
  • Component: Infrastructure
  • Component: Spark Runner
  • Component: Flink Runner
  • Component: Samza Runner
  • Component: Twister2 Runner
  • Component: Hazelcast Jet Runner
  • Component: Google Cloud Dataflow Runner