[FLINK-37288] Add Google Cloud Spanner dialect and catalog by laughingman7743 · Pull Request #156 · apache/flink-connector-jdbc

laughingman7743 · 2025-02-09T06:44:45Z

What is the purpose of the change

Adds Google Cloud Spanner support to flink-connector-jdbc as a new flink-connector-jdbc-spanner module.

Dialect: Type mapping between Spanner and Flink types, including array type write support
Catalog: Database/table discovery via Spanner Admin API and INFORMATION_SCHEMA, with semicolon-delimited URL parameter handling (;param=value)
Lineage: Extracts Spanner-specific location info (project, instance, database) from JDBC URLs
Docs: Added Spanner section to JDBC connector docs (English and Chinese)
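
To illustrate the lineage item above, here is a minimal sketch of pulling the project, instance, and database out of a Spanner JDBC URL. The class name `SpannerUrlSketch`, the regular expression, and the `parse` helper are assumptions made for illustration; they are not taken from the PR's actual implementation.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not the code in this PR. */
final class SpannerUrlSketch {
    // Optional "//host[:port]" part, then the fixed projects/instances/databases path.
    // The database segment stops at '/', ';' or '?' so trailing parameters are excluded.
    private static final Pattern URL_PATTERN = Pattern.compile(
            "^jdbc:cloudspanner:(?://[^/]*)?"
                    + "/projects/([^/]+)/instances/([^/]+)/databases/([^/;?]+)");

    /** Returns {project, instance, database}, or empty if the URL does not match. */
    static Optional<String[]> parse(String url) {
        Matcher m = URL_PATTERN.matcher(url.trim());
        if (!m.find()) {
            return Optional.empty();
        }
        return Optional.of(new String[] {m.group(1), m.group(2), m.group(3)});
    }

    public static void main(String[] args) {
        String url = "jdbc:cloudspanner://hostname/projects/gcp_project_id"
                + "/instances/instance_id/databases/database_id;autoConfigEmulator=true";
        parse(url).ifPresent(parts -> System.out.println(String.join(" / ", parts)));
    }
}
```

The semicolon-delimited parameter suffix is cut off by the character class on the database segment, so the location info stays free of connection properties.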

Changes to core module

AbstractJdbcCatalog: Added protected constructor and hooks (calculateUrlFunction, validateConnectionProperties) for dialects with non-standard URL formats
FieldNamedPreparedStatement: Array type support

Testing

All features are tested against the Spanner emulator via Testcontainers: dialect type mapping, catalog operations, end-to-end read/write, and factory SPI registration.

Jira

https://issues.apache.org/jira/browse/FLINK-37288

Discussion

https://lists.apache.org/thread/4344z501pszv5h7gmr0311fw7h6h4sw4

boring-cyborg · 2025-02-09T06:44:48Z

Thanks for opening this pull request! Please check out our contributing guidelines. (https://flink.apache.org/contributing/how-to-contribute.html)

laughingman7743 · 2025-02-09T06:48:32Z

-        this.connectionProperties = Preconditions.checkNotNull(connectionProperties);
-        checkArgument(
-                !StringUtils.isNullOrWhitespaceOnly(connectionProperties.getProperty(USER_KEY)));
-        checkArgument(
-                !StringUtils.isNullOrWhitespaceOnly(
-                        connectionProperties.getProperty(PASSWORD_KEY)));


Spanner does not use password authentication, so the checks that require a non-empty username and password have been removed.

laughingman7743 · 2025-02-09T06:48:53Z

-    /**
-     * URL has to be without database, like "jdbc:dialect://localhost:1234/" or
-     * "jdbc:dialect://localhost:1234" rather than "jdbc:dialect://localhost:1234/db".
-     */
-    protected static void validateJdbcUrl(String url) {
-        String[] parts = url.trim().split("\\/+");
-
-        checkArgument(parts.length == 2);
-    }
-


Spanner JDBC URLs carry the project, instance, and database in the path, as shown below, so they cannot satisfy this check. I have removed the validation.

jdbc:cloudspanner://hostname/projects/gcp_project_id/instances/instance_id/databases/database_id
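
As a quick check of why the removed validation rejects Spanner URLs, the sketch below reuses the same `split("\\/+")` expression from the deleted `validateJdbcUrl`. The class and method names are hypothetical and exist only for this illustration.

```java
/** Hypothetical illustration of the removed check; not part of the PR. */
final class UrlValidationSketch {
    /** Same split as the removed validateJdbcUrl: parts separated by runs of '/'. */
    static int partCount(String url) {
        return url.trim().split("\\/+").length;
    }

    public static void main(String[] args) {
        // A URL without a database yields exactly two parts and passed the old check.
        System.out.println(partCount("jdbc:dialect://localhost:1234/"));
        // A Spanner URL has many path segments, so "parts.length == 2" can never hold.
        System.out.println(partCount("jdbc:cloudspanner://hostname/projects/gcp_project_id"
                + "/instances/instance_id/databases/database_id"));
    }
}
```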

laughingman7743 · 2025-02-09T06:49:25Z


-/** Test for {@link AbstractJdbcCatalog}. */
-class AbstractJdbcCatalogTest {


This class is no longer needed because URL validation has been removed.

laughingman7743 · 2025-02-09T06:53:15Z

    }

-    public Schema getTableSchema() {
+    public Schema getTableSchema(String pkConstraintName) {


In Spanner, the primary key constraint has a different name, so the constraint name can now be passed in as a parameter.

laughingman7743 · 2025-02-09T15:32:36Z

I have applied code formatting using Spotless and updated the documentation. It is ready for review.

davidradl · 2025-03-10T09:57:19Z

+| Db2        | `com.ibm.db2.jcc`          | `db2jcc`                    | [Download](https://www.ibm.com/support/pages/download-db2-fix-packs-version-db2-linux-unix-and-windows)                                 | 
+| Trino      | `io.trino`                 | `trino-jdbc`                | [Download](https://repo1.maven.org/maven2/io/trino/trino-jdbc/)                                                                         |
+| OceanBase  | `com.oceanbase`            | `oceanbase-client`          | [Download](https://repo1.maven.org/maven2/com/oceanbase/oceanbase-client/)                                                              |
+| Spanner    | `com.google.cloud`         | `google-cloud-spanner-jdbc` | [Download](https://central.sonatype.com/artifact/com.google.cloud/google-cloud-spanner-jdbc)                                            |


I see the download URL points to Maven, like the Spanner doc. How do I get the JAR file?

I used the same URL as the official documentation.

https://cloud.google.com/spanner/docs/jdbc-drivers

The JAR files can be downloaded from the following locations.

https://repo1.maven.org/maven2/com/google/cloud/google-cloud-spanner-jdbc/2.27.1/

https://repo1.maven.org/maven2/com/google/cloud/google-cloud-spanner-jdbc/2.27.1/google-cloud-spanner-jdbc-2.27.1-single-jar-with-dependencies.jar

https://repo1.maven.org/maven2/com/google/cloud/google-cloud-spanner-jdbc/2.27.1/google-cloud-spanner-jdbc-2.27.1.jar

It is necessary to use the single JAR file that includes dependencies (the single-jar-with-dependencies artifact listed above).

davidradl · 2025-03-10T10:01:13Z

 Data Type Mapping
 ----------------
-Flink supports connect to several databases which uses dialect like MySQL, Oracle, PostgreSQL, CrateDB, Derby, SQL Server, Db2 and OceanBase. The Derby dialect usually used for testing purpose. The field data type mappings from relational databases data types to Flink SQL data types are listed in the following table, the mapping table can help define JDBC table in Flink easily.
+Flink supports connect to several databases which uses dialect like MySQL, Oracle, PostgreSQL, CrateDB, Derby, SQL Server, Db2, OceanBase and Spanner. The Derby dialect usually used for testing purpose. The field data type mappings from relational databases data types to Flink SQL data types are listed in the following table, the mapping table can help define JDBC table in Flink easily.


nits:
connect -> the connection
which uses dialect like -> using dialects e.g.
The Derby dialect usually used for testing purpose. -> The Derby dialect is usually used for testing purpose.
JDBC table -> a JDBC table

Fixed: 3c7d225

laughingman7743 · 2025-05-18T06:04:51Z

Does this repository not have an owner? Many PRs adding dialects are being left unreviewed. Does this mean that contributions are not accepted?

laughingman7743 · 2025-11-14T13:52:23Z

+        // Check if base-url contains query parameters
+        int questionMarkIndex = baseUrl.indexOf('?');
+
+        if (questionMarkIndex == -1) {
+            // No parameters: traditional baseUrl + databaseName
+            return baseUrl + databaseName;
+        }
+
+        // Parameters present: insert database name before '?'
+        // Example: "jdbc:postgresql://localhost:5432/?sslmode=require"
+        //       -> "jdbc:postgresql://localhost:5432/mydb?sslmode=require"
+        String urlWithoutParams = baseUrl.substring(0, questionMarkIndex);
+        String params = baseUrl.substring(questionMarkIndex);
+        return urlWithoutParams + databaseName + params;


The Spanner tests need autoConfigEmulator set on the URL so that the emulator is configured automatically. This change makes it possible to pass such parameters through the catalog base URL.
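
A standalone sketch of the insertion logic in the hunk above, generalized over the delimiter so that the same idea can be tried with Spanner's semicolon-delimited parameters. The class name, the `delimiter` argument, and the emulator URL in `main` are assumptions for illustration; the PR's actual `getDatabaseUrl` handling may differ.

```java
/** Hypothetical illustration; mirrors the idea of the hunk above, not its exact code. */
final class DatabaseUrlSketch {
    /**
     * Inserts databaseName before the first occurrence of delimiter,
     * or appends it when the base URL carries no parameters.
     */
    static String databaseUrl(String baseUrl, String databaseName, char delimiter) {
        int idx = baseUrl.indexOf(delimiter);
        if (idx == -1) {
            return baseUrl + databaseName;
        }
        return baseUrl.substring(0, idx) + databaseName + baseUrl.substring(idx);
    }

    public static void main(String[] args) {
        // Query-string style parameters, as in the comment of the hunk above.
        System.out.println(databaseUrl(
                "jdbc:postgresql://localhost:5432/?sslmode=require", "mydb", '?'));
        // Semicolon style parameters; host, port, and path values are made up.
        System.out.println(databaseUrl(
                "jdbc:cloudspanner://localhost:9010/projects/p/instances/i/databases/"
                        + ";autoConfigEmulator=true", "d", ';'));
    }
}
```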

laughingman7743 · 2025-11-15T01:44:17Z

+    @Override
+    public void setArray(int fieldIndex, Array x) throws SQLException {
+        for (int index : indexMapping[fieldIndex]) {
+            statement.setArray(index, x);
+        }
+    }
+
+    @Override
+    public Array createArrayOf(String typeName, Object[] elements) throws SQLException {
+        return connection.createArrayOf(typeName, elements);
+    }


These implementations are necessary to support writing to Array types.

This implementation introduces the first full ARRAY type write support in the
flink-connector-jdbc project. Currently, PostgreSQL dialect only supports reading
ARRAY types (throws IllegalStateException on write - see
PostgresDialectConverter.java:61-67).

If this change is merged, the implementation pattern here could serve as a reference
for adding ARRAY write support to PostgreSQL and other dialects in the future.
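
The fan-out in `setArray` above can be tried without a database. In the sketch below, `PositionalSetter` is a made-up stand-in for `PreparedStatement.setArray(int, Array)`, and the mapping values are invented; only the loop over `indexMapping[fieldIndex]` comes from the hunk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration of the index fan-out; not the connector's class. */
final class FanOutSketch {
    /** Minimal stand-in for PreparedStatement.setArray(int, Array). */
    interface PositionalSetter {
        void set(int jdbcIndex, Object value);
    }

    // indexMapping[fieldIndex] lists every 1-based JDBC position bound to that field.
    private final int[][] indexMapping;
    private final PositionalSetter statement;

    FanOutSketch(int[][] indexMapping, PositionalSetter statement) {
        this.indexMapping = indexMapping;
        this.statement = statement;
    }

    /** Binds the same value at every positional index that the field maps to. */
    void setArray(int fieldIndex, Object value) {
        for (int index : indexMapping[fieldIndex]) {
            statement.set(index, value);
        }
    }

    public static void main(String[] args) {
        // Field 0 appears twice in the statement, field 1 once (invented mapping).
        int[][] mapping = {{1, 3}, {2}};
        List<String> calls = new ArrayList<>();
        FanOutSketch sketch = new FanOutSketch(mapping, (i, v) -> calls.add(i + "=" + v));
        sketch.setArray(0, "tags");
        System.out.println(calls);
    }
}
```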

laughingman7743 · 2026-02-03T02:48:55Z

@davidradl Could you take another look when you have time? I've addressed the previous feedback and added some fixes for the test issues.

github-actions · 2026-06-11T08:04:00Z

This PR is being marked as stale since it has not had any activity in the last 90 days.
If you would like to keep this PR alive, please leave a comment asking for a review.
If the PR has merge conflicts, update it with the latest from the base branch.

If you are having difficulty finding a reviewer, please reach out to the
community, contact details can be found here: https://flink.apache.org/what-is-flink/community/

If this PR is no longer valid or desired, please feel free to close it.
If no activity occurs in the next 30 days, it will be automatically closed.

laughingman7743 · 2026-06-11T08:14:19Z

Rebased onto the latest main to resolve the stale status. This PR is still active and awaiting review. Any feedback would be greatly appreciated.

RocMarshal · 2026-06-22T16:14:33Z

Hi @laughingman7743, could you please run 'mvn spotless:apply' before the next review?
Thanks

laughingman7743 · 2026-06-23T00:21:26Z

Hi @RocMarshal, thanks for taking a look!

I've run mvn spotless:apply and pushed the fix (the violation was a comment line in SpannerCatalog.java exceeding the line-length limit). I also rebased the branch onto the latest main. spotless:check now passes for all modules locally.

It looks like the CI needs maintainer approval to run — could you kindly approve the workflow when you have a chance? Thanks again for the review! 🙏

laughingman7743 · 2026-06-23T03:11:15Z

CI failure is a pre-existing flaky test on `main`, not caused by this PR

The red job (https://github.com/apache/flink-connector-jdbc/actions/runs/27995114250/job/82864660173) fails in DerbyDynamicTableSourceITCase.testLimit:

Caused by: java.sql.SQLNonTransientConnectionException: No current connection.  (ERROR 08003)
  at ...EmbedPreparedStatement.executeQuery(...)

This is not introduced by this PR:

This branch is rebased on af994651. CI for that exact commit on main (https://github.com/apache/flink-connector-jdbc/actions/runs/27738188350) fails in the same DerbyDynamicTableSourceITCase class with the same ERROR 08003: No current connection, only in a different method (testProject).
The latest main (https://github.com/apache/flink-connector-jdbc/actions/runs/27966883251) also fails CI on another flaky core ITCase (JdbcSourceStreamRelatedITCase.testAtLeastOnceWithFailure, 30s TimeoutException).
None of the failing test files are touched by this PR; changes are isolated to the new flink-connector-jdbc-spanner module plus catalog/dialect/lineage in core.

Root cause of the flaky `testLimit`

testLimit runs SELECT * FROM t LIMIT 1 over a source split into 2 partitions (scan.partition.num=2). With LIMIT 1 the job completes after the first row and cancels the source while the split-fetcher thread is still opening the second split. Cancellation tears the JDBC connection down, so the in-flight JdbcSourceSplitReader.openResultSetForSplitWhenAtLeastOnce() -> statement.executeQuery() runs against an already-closed connection -> 08003: No current connection.

Two gaps in JdbcSourceSplitReader make this fatal and racy:

wakeUp() is a no-op (https://github.com/apache/flink-connector-jdbc/blob/main/flink-connector-jdbc-core/src/main/java/org/apache/flink/connector/jdbc/core/datastream/source/reader/JdbcSourceSplitReader.java), so there is no cooperative cancellation — the reader keeps opening the next split during shutdown.
fetch() rethrows any SQLException as a fatal RuntimeException, so a benign shutdown-time "connection closed" error fails the whole job.

How the fix differs from the closed PR #191

That PR guarded only resultSet.isClosed() before extract() / resultSet.next() inside the record loop. But per the stack trace the failure is at executeQuery() when opening the next split, which that PR does not touch — so it would not reliably fix this. It was closed without merge.

The fix I'd propose is different and targets the actual cause — cooperative cancellation:

wakeUp() / close() set a volatile boolean wakeup flag.
checkSplitOrStartNext() checks the flag before opening a new split (the executeQuery() path) and returns "no more records" instead of starting a query during shutdown.
In fetch(), if a SQLException occurs while the reader is shutting down (flag set / thread interrupted), treat it as a graceful end-of-split instead of rethrowing — this closes the residual race window where cancellation lands during the Derby call.

This covers the executeQuery() failure point that #191 missed, and removes the "shutdown error -> job failure" behavior. I'm happy to open a separate, focused PR + JIRA for this so both this PR and main go green.
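
The cooperative-cancellation idea described above could be sketched roughly as follows. This is a simplified, database-free model: the class, the split queue, and the counter are assumptions for illustration and do not reproduce the real JdbcSourceSplitReader.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Hypothetical model of the proposed flag check; not the real split reader. */
final class CooperativeReaderSketch {
    private final Queue<String> pendingSplits = new ArrayDeque<>();
    // Written by the cancelling thread, read by the fetcher thread.
    private volatile boolean wakeup;
    private int openedSplits;

    void addSplit(String split) {
        pendingSplits.add(split);
    }

    /** Requests shutdown; later attempts to open a split become no-ops. */
    void wakeUp() {
        wakeup = true;
    }

    /** Opens the next split unless shutdown was requested or no split is left. */
    boolean checkSplitOrStartNext() {
        if (wakeup || pendingSplits.isEmpty()) {
            return false; // reported as "no more records"; no query is started
        }
        pendingSplits.poll();
        openedSplits++; // stands in for statement.executeQuery()
        return true;
    }

    int openedSplits() {
        return openedSplits;
    }

    public static void main(String[] args) {
        CooperativeReaderSketch reader = new CooperativeReaderSketch();
        reader.addSplit("partition-0");
        reader.addSplit("partition-1");
        System.out.println(reader.checkSplitOrStartNext());
        reader.wakeUp(); // e.g. LIMIT 1 satisfied, source cancelled
        System.out.println(reader.checkSplitOrStartNext());
    }
}
```

The flag only narrows the race window: a cancellation that lands after the check but before the query returns still needs the separate handling in fetch() described above.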

laughingman7743 · 2026-06-23T12:06:40Z

Follow-up on the root cause above: I filed FLINK-39975 and opened #200 to fix the flaky DerbyDynamicTableSourceITCase.testLimit.

The fix makes JdbcSourceSplitReader recover from a connection that is closed while opening a split during source cancellation: it re-establishes the connection and retries the split-open (bounded, mirroring the retry/reconnect handling in JdbcOutputFormat.flush()). It is non-masking — a genuine query error on a healthy connection is rethrown immediately, and an unreachable database still fails the job after the retry budget. A deterministic regression test reproduces the exact 08003 "No current connection" error (the test fails without the fix and passes with it).
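
A rough sketch of a bounded, non-masking retry in the spirit of the description above. All names here (`RetrySketch`, `SqlAction`, `ConnectionCheck`, `openWithRetry`) are hypothetical, and the health check and reconnect steps are passed in as callbacks; the code in #200 may be structured quite differently.

```java
import java.sql.SQLException;

/** Hypothetical illustration of bounded reconnect-and-retry; not the code in #200. */
final class RetrySketch {
    /** An operation that may fail with SQLException, such as opening a split. */
    interface SqlAction<T> {
        T run() throws SQLException;
    }

    /** Reports whether the current connection is still usable. */
    interface ConnectionCheck {
        boolean isHealthy();
    }

    static <T> T openWithRetry(
            SqlAction<T> open, ConnectionCheck check, Runnable reconnect, int maxRetries)
            throws SQLException {
        for (int attempt = 0; ; attempt++) {
            try {
                return open.run();
            } catch (SQLException e) {
                // Non-masking: an error on a healthy connection is a genuine query
                // error, and an exhausted retry budget must still fail the caller.
                if (check.isHealthy() || attempt >= maxRetries) {
                    throw e;
                }
                reconnect.run();
            }
        }
    }

    public static void main(String[] args) throws SQLException {
        boolean[] healthy = {false};
        int[] calls = {0};
        String result = openWithRetry(
                () -> {
                    if (calls[0]++ == 0) {
                        throw new SQLException("No current connection.", "08003");
                    }
                    return "result set opened";
                },
                () -> healthy[0],
                () -> healthy[0] = true,
                3);
        System.out.println(result + " after " + calls[0] + " attempts");
    }
}
```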

Once #200 lands on main, this flaky test should stop reddening unrelated PRs (including this one).

RocMarshal · 2026-06-23T16:43:27Z

Hi @davidradl @eskabetxe,
Could you take a look when you have some free time? Thanks.

boring-cyborg Bot added the component=JDBC/Core label Feb 9, 2025

laughingman7743 marked this pull request as ready for review February 9, 2025 06:45

laughingman7743 changed the title ~~[FLINK-37288] Add flink-connector-jdbc-spnner~~ [FLINK-37288] Add Google Cloud Spanner dialect and catalog Feb 9, 2025

laughingman7743 mentioned this pull request Feb 9, 2025

Add flink-connector-jdbc-spnner laughingman7743/flink-connector-jdbc#1

Closed

laughingman7743 force-pushed the jdbc_spanner_connector branch 8 times, most recently from c7f0a5a to 3a73e0b Compare February 9, 2025 15:17

boring-cyborg Bot added the component=Documentation label Feb 9, 2025

laughingman7743 force-pushed the jdbc_spanner_connector branch from 3a73e0b to be60494 Compare February 9, 2025 15:27

laughingman7743 force-pushed the jdbc_spanner_connector branch 2 times, most recently from e4559c9 to 4ec422d Compare February 12, 2025 06:19

laughingman7743 force-pushed the jdbc_spanner_connector branch from 4ec422d to 520db3b Compare February 18, 2025 07:22

boring-cyborg Bot added the component=JDBC/Shaded label Feb 18, 2025

davidradl reviewed Mar 10, 2025

laughingman7743 force-pushed the jdbc_spanner_connector branch from 84e20b0 to 939af62 Compare November 14, 2025 13:01

laughingman7743 force-pushed the jdbc_spanner_connector branch from ecba519 to 854b846 Compare February 3, 2026 02:31

laughingman7743 force-pushed the jdbc_spanner_connector branch from 5e0d135 to a87161a Compare February 3, 2026 06:44

laughingman7743 force-pushed the jdbc_spanner_connector branch from a87161a to 0599b0c Compare February 14, 2026 03:53

laughingman7743 force-pushed the jdbc_spanner_connector branch 3 times, most recently from fb7917c to 053359f Compare March 12, 2026 16:09

github-actions Bot added the stale label Jun 11, 2026

laughingman7743 force-pushed the jdbc_spanner_connector branch from 053359f to df497eb Compare June 11, 2026 08:12

github-actions Bot removed the stale label Jun 12, 2026

laughingman7743 force-pushed the jdbc_spanner_connector branch from df497eb to 8851d69 Compare June 18, 2026 05:15

RocMarshal self-assigned this Jun 22, 2026

laughingman7743 force-pushed the jdbc_spanner_connector branch from 8851d69 to cac553f Compare June 23, 2026 00:19

laughingman7743 force-pushed the jdbc_spanner_connector branch from cac553f to c0e976d Compare June 23, 2026 01:09

laughingman7743 added 8 commits June 24, 2026 00:23

[FLINK-37288] Add flink-connector-jdbc-spnner

d0e8f75

[FLINK-37288] Update docs

b4c85c9

[FLINK-37288] Support query parameters in JDBC Catalog base-url

41ee562

[FLINK-37288] Add array type write support

dccc5c1

[FLINK-34467] Add lineage support for Spanner

8007713

[FLINK-37288] Handle Spanner semicolon params in getDatabaseUrl

5f65c13

[FLINK-37288] Fix SpannerCatalog compatibility with FLINK-38851 changes

d97dbbc

[FLINK-37288] Apply spotless

b5f1c30

laughingman7743 force-pushed the jdbc_spanner_connector branch from c0e976d to b5f1c30 Compare June 23, 2026 15:25

