[KYUUBI #7305] Fix JDBC engine returning tables from all databases when a database is specified in the URL #7419
Conversation
…ses when a database is specified in the URL

When DBeaver (or any Hive-JDBC client) connects to the Kyuubi JDBC engine with a database path in the URL, `DatabaseMetaData.getTables` should return only the tables from that database. Previously, the Hive JDBC driver converted a null `schemaPattern` to `"%"` before sending the request, which the engine forwarded straight into a `WHERE TABLE_SCHEMA LIKE '%'` clause on the backend, returning tables from every database.

- Introduce `JdbcDialect.setCurrentDatabase(connection, database)` as an extension point; the default is a no-op returning false. `MySQLDialect` overrides it with ``USE `db` `` so MySQL/StarRocks/Doris switch the backend connection on session open.
- `JdbcSessionImpl` records the successfully applied database in `effectiveDatabase` and only keeps it when `setCurrentDatabase` returns true. `"default"` (the Hive JDBC driver's fallback when no database is specified in the URL) is tolerated on `USE` failure to stay compatible with backends that have no `"default"` database.
- `JdbcOperationManager.newGetTablesOperation` treats a blank schema or `"%"` as "use the session's effective database", routing through `effectiveDatabase.orNull`. Backends without an effective database (PostgreSQL, Oracle, etc.) keep the previous no-filter behavior.
- `MySQLDialect` now implements `getCatalogsOperation` and `getSchemasOperation` (previously `featureNotSupported`). `getSchemasOperation` aliases result columns to the JDBC-standard `TABLE_CATALOG` / `TABLE_SCHEM` so `DatabaseMetaData.getSchemas` works with clients such as DBeaver.

Closes apache#7305.
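The blank/`"%"` routing described above can be sketched as a small pure helper. This is an illustration only — the names (`SchemaPatternRouting`, `effectiveSchemaFilter`, the `Optional` parameter) are mine, not the actual Kyuubi code, which lives in `JdbcOperationManager.newGetTablesOperation`:

```java
import java.util.Optional;

/** Illustrative sketch of the blank/"%" schema-pattern routing described above. */
public class SchemaPatternRouting {

    /**
     * If the client sent no real schema filter (null, blank, or the Hive JDBC
     * driver's "%" rewrite of null), fall back to the session's effective
     * database; otherwise honor the client's pattern unchanged.
     */
    static String effectiveSchemaFilter(String schemaPattern, Optional<String> effectiveDatabase) {
        boolean noRealFilter = schemaPattern == null
                || schemaPattern.isBlank()
                || "%".equals(schemaPattern);
        if (noRealFilter) {
            // Backends without an effective database keep the old no-filter behavior.
            return effectiveDatabase.orElse(null);
        }
        return schemaPattern;
    }

    public static void main(String[] args) {
        System.out.println(effectiveSchemaFilter("%", Optional.of("doctor")));    // doctor
        System.out.println(effectiveSchemaFilter("db_%", Optional.of("doctor"))); // db_%
        System.out.println(effectiveSchemaFilter("%", Optional.empty()));         // null
    }
}
```

A genuine pattern like `db_%` passes through untouched, so only the "no filter at all" case is rescoped.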
```scala
   * The return value is used by `JdbcOperationManager` to decide whether the requested
   * database can be treated as the effective schema filter for metadata operations.
   */
  def setCurrentDatabase(connection: Connection, database: String): Boolean = false
```
does Connection.setSchema work?
Good question — I looked into this and it's a bit subtle. setSchema doesn't reliably work for the MySQL family in mysql-connector-j's default config. From ConnectionImpl.java (9.x):
```java
// line 2625
public void setSchema(String schema) throws SQLException {
    checkClosed();
    if (this.propertySet.<DatabaseTerm>getEnumProperty(PropertyKey.databaseTerm).getValue() == DatabaseTerm.SCHEMA) {
        setDatabase(schema);
    }
}

// line 2143
public void setCatalog(final String catalog) throws SQLException {
    if (this.propertySet.<DatabaseTerm>getEnumProperty(PropertyKey.databaseTerm).getValue() == DatabaseTerm.CATALOG) {
        setDatabase(catalog);
    }
}
```

The driver gates both methods on the `databaseTerm` connection property:
| `databaseTerm` | `setCatalog(x)` | `setSchema(x)` |
|---|---|---|
| CATALOG (default) | sends COM_INIT_DB | silent no-op |
| SCHEMA | silent no-op | sends COM_INIT_DB |
So calling setSchema alone would silently do nothing under the default config (and reproduce the original bug). Note this also affects StarRocksDialect / DorisDialect since they extend MySQLDialect and use the same driver — databaseTerm is a driver-side property, not a server feature.
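To make the gating concrete without reproducing any driver code, here is a toy stand-in (emphatically not the real mysql-connector-j implementation — class and field names are invented) showing why the dual call yields exactly one database switch under either `databaseTerm` value:

```java
/** Toy model of mysql-connector-j's databaseTerm gating (not the real driver). */
public class DatabaseTermDemo {
    enum DatabaseTerm { CATALOG, SCHEMA }

    static class FakeConnection {
        final DatabaseTerm term;
        int initDbCommands = 0;   // how many COM_INIT_DB-equivalents were "sent"
        String currentDatabase;

        FakeConnection(DatabaseTerm term) { this.term = term; }

        void setCatalog(String catalog) {
            if (term == DatabaseTerm.CATALOG) setDatabase(catalog); // else silent no-op
        }
        void setSchema(String schema) {
            if (term == DatabaseTerm.SCHEMA) setDatabase(schema);   // else silent no-op
        }
        private void setDatabase(String db) {
            initDbCommands++;
            currentDatabase = db;
        }
    }

    public static void main(String[] args) {
        for (DatabaseTerm term : DatabaseTerm.values()) {
            FakeConnection conn = new FakeConnection(term);
            // The dual-call pattern from the dialect: exactly one branch fires.
            conn.setCatalog("doctor");
            conn.setSchema("doctor");
            System.out.println(term + ": switches=" + conn.initDbCommands
                    + ", db=" + conn.currentDatabase);
        }
        // Prints switches=1 for both CATALOG and SCHEMA: never zero, never two.
    }
}
```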
Two options I'm considering, happy to take direction:
Option A — keep USE db as-is in MySQLDialect (current PR). Simple, always works regardless of driver config, but it's hand-rolled SQL.
Option B — go through JDBC standard API, MySQL-family only:
Keep JdbcDialect.setCurrentDatabase as the default no-op returning false (PG / Oracle / etc. unchanged from this PR). Only MySQLDialect overrides it:
```scala
// MySQLDialect: call both — under mysql-connector-j, exactly one is a real op
// for either databaseTerm value, the other is a guaranteed no-op.
// Inherited by StarRocksDialect / DorisDialect.
override def setCurrentDatabase(connection: Connection, database: String): Boolean = {
  connection.setCatalog(database)
  connection.setSchema(database)
  true
}
```

This still avoids the hand-rolled `USE` SQL and gets the driver hooks (`useLocalSessionState`, `ConnectionLifecycleInterceptor`), but doesn't touch any non-MySQL dialect's behavior.
WDYT?
Update — went with Option B and pushed it as ff51841.
MySQLDialect.setCurrentDatabase now calls connection.setCatalog(database) followed by connection.setSchema(database) instead of executing USE ourselves. JdbcDialect's default stays a no-op returning false, and no other dialect is touched.
Why B over A:
- Goes through JDBC standard API, addressing the spirit of your review — no more hand-rolled SQL, no backtick escaping concerns.
- Engages the driver's internal hooks (`useLocalSessionState` cache, `ConnectionLifecycleInterceptor`) so connection state stays coherent.
- Double-call cost is effectively zero: under any `databaseTerm`, the non-matching branch is an `if (false)` early return with no network IO.
- Functionally equivalent for MySQL / StarRocks / Doris — verified locally with DBeaver: logs show `catalog=null → catalog=doctor`, confirming `setCatalog` actually sends `COM_INIT_DB`.
Why I didn't extend setSchema to the default impl for other dialects: Phoenix / Impala drivers are likely to throw SQLFeatureNotSupportedException from setSchema, which would regress session open. Keeping the change scoped to MySQL family.
Also added three StarRocks regression tests covering URL-scoped getTables, no-filter when URL has no db, and session-open failure on a non-existent db.
PTAL.
@wangzhigang1999 how about
- replicate the API with the JDBC standard API, with an additional leading `conn: Connection` parameter; default impl calls the JDBC standard API
- override the impl for known JDBC drivers that do not support this API
- don't need to care about the return value, always treat it as if it takes effect.
or investigate how DBeaver handles that
Looked into DBeaver source. The root cause isn't on the client side — both DBeaver and the Hive driver are spec-correct (null schemaPattern legitimately means "all schemas"). The real issue is that the engine was discarding the URL's database entirely: not switching the backend connection, not using it as the default metadata scope. Engine-side fix is the right place.
Will follow up with the API cleanup you suggested (default setSchema, override Phoenix/Impala if needed, drop the boolean).
…tandard setCatalog/setSchema

Replace the hand-rolled ``USE `db` `` with `Connection.setCatalog` plus `Connection.setSchema` in `MySQLDialect.setCurrentDatabase`. mysql-connector-j gates the two methods on its `databaseTerm` connection property: with the default CATALOG, `setCatalog` issues COM_INIT_DB and `setSchema` is a silent no-op; with SCHEMA it is the inverse. Calling both means exactly one fires regardless of how the user configured the URL, while still going through the standard JDBC API (and therefore the driver's session-state hooks instead of hand-rolled SQL). StarRocksDialect / DorisDialect inherit this via MySQLDialect.

Also add three StarRocks regression tests covering: scoping when the URL specifies a database, no-filter behavior when the URL has no database, and session-open failure on a non-existent database.
…Dialect

Per review feedback, replicate the JDBC `Connection#setSchema(String)` API on JdbcDialect with a leading `conn` parameter. The default forwards to the standard API; dialects whose driver does not support it can override to a no-op. Drop the boolean return value — callers treat the call as if it always takes effect. JdbcSessionImpl now records `effectiveDatabase` unconditionally after the call, decoupling "backend connection switched" from "engine metadata scope".
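The reworked hook can be rendered in Java roughly as follows — the real code is the Scala trait `JdbcDialect`, so every name in this sketch (including the dynamic-proxy stand-in connection) is illustrative:

```java
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLException;

/** Illustrative Java rendering of the reworked dialect hook; not the Kyuubi source. */
public class DialectHookSketch {

    interface DialectSketch {
        /**
         * Mirrors Connection#setSchema(String) with a leading conn parameter.
         * The default forwards to the standard JDBC API; a dialect whose
         * driver throws SQLFeatureNotSupportedException overrides to a no-op.
         * No boolean return — callers treat it as always taking effect.
         */
        default void setCurrentDatabase(Connection conn, String database) throws SQLException {
            conn.setSchema(database);
        }
    }

    /** Runs the default hook against a recording stand-in connection. */
    static String applySchema(String database) {
        String[] applied = new String[1];
        Connection conn = (Connection) Proxy.newProxyInstance(
                Connection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                (proxy, method, args) -> {
                    if ("setSchema".equals(method.getName())) {
                        applied[0] = (String) args[0];
                    }
                    return null; // safe here: only void methods are invoked
                });
        try {
            new DialectSketch() {}.setCurrentDatabase(conn, database);
        } catch (SQLException unreachableForTheProxy) {
            throw new IllegalStateException(unreachableForTheProxy);
        }
        return applied[0];
    }

    public static void main(String[] args) {
        System.out.println(applySchema("doctor")); // doctor
    }
}
```

The proxy only demonstrates the call contract; a real dialect would be exercised against its actual driver.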
```scala
    // Calling both means exactly one fires regardless of how the user configured the
    // URL. StarRocksDialect / DorisDialect inherit this via MySQLDialect.
    conn.setCatalog(schema)
    conn.setSchema(schema)
```
Hmm... is this really safe? I think StarRocks and Doris support both catalog and schema ...
Good point to think through. The dual setCatalog + setSchema is safe because mysql-connector-j picks one of the two to actually send and silently swallows the other inside the driver, based on its databaseTerm connection property: with the default CATALOG, setCatalog sends a "switch database" command and setSchema does nothing; with SCHEMA it's the reverse. So no matter how the user configured the URL, the StarRocks/Doris server only ever receives one "switch database" command, never two.
Confirmed by reading the driver source — both methods funnel into the same `setDatabase(...)` call after the `databaseTerm` check, see ConnectionImpl.java#L2145. `setCatalog` is literally `if (databaseTerm == CATALOG) setDatabase(catalog);` and `setSchema` is the symmetric `if (databaseTerm == SCHEMA) setDatabase(schema);` — exactly one branch fires per call site, and neither throws when its branch is skipped.
The server treats that command as "switch the current database" — it does not treat it as "switch catalog" (catalog switching on StarRocks/Doris is a separate SET CATALOG SQL statement, not something the MySQL protocol can do). The name we pass in comes from USE_DATABASE, which the Hive JDBC driver parses out of the URL path. The Hive URL syntax has no concept of catalog, so this name is always a database in the current (default) catalog.
```scala
    } else {
      schemaName
    }
    val query = dialect.getTablesQuery(catalogName, effectiveSchema, tableName, tableTypes)
```
I don't recall the details of the initial code of this part, but the API looks problematic.
the Hive Thrift GetTablesOperation API is derived from the JDBC API, and that's why they look almost identical, for JDBC engine, such a call should directly map to the JDBC API instead of requesting a SQL (if you take a look at the JDBC driver implementation, you may find some impls are non-SQL-based), the current design complicates the whole thing.
Thanks for raising this — you're right, and I want to confirm I took it seriously rather than hand-wave it away. To make sure I understood, I ran the full set of DatabaseMetaData calls against three different drivers (mysql-connector-j 8.4.0, postgresql 42.7.2, clickhouse-jdbc 0.6.5) using a small Java probe and docker containers. Test fixtures were tailored to each backend (MySQL/PG had PK + FK + a function; ClickHouse used MergeTree which has no native PK/FK).
Cross-driver behavior of each API:
| API | MySQL (`databaseTerm=CATALOG`) | MySQL (`databaseTerm=SCHEMA`) | PostgreSQL | ClickHouse |
|---|---|---|---|---|
| `getCatalogs()` | ✓ implemented | ⚠ empty (driver `databaseTerm` quirk) | ✓ implemented | ✓ implemented |
| `getSchemas()` | ⚠ empty (driver `databaseTerm` quirk) | ✓ implemented | ✓ implemented | — model has no schema layer |
| `getTableTypes()` | ✓ | ✓ | ✓ | ✓ |
| `getTables(...)` | ✓ | ✓ | ✓ | ✓ |
| `getColumns(...)` | ✓ | ✓ | ✓ | ✓ |
| `getPrimaryKeys` | ✓ | ✓ | ✓ | — MergeTree has no PK |
| `getImported` / `Exported` / `CrossReference` | ✓ | ✓ | ✓ | — no FK in CK |
| `getFunctions(...)` | ✓ | ✓ | ✓ | (no fixture created) |
| `getTypeInfo()` | ✓ | ✓ | ✓ | ✓ |
Things that hold across all three drivers:
- No `SQLFeatureNotSupportedException` anywhere — every JDBC-standard metadata method is implemented and returns a `ResultSet` rather than throwing, even when the data model doesn't populate it.
- Column names follow the JDBC spec (`TABLE_CAT`, `TABLE_SCHEM`, `TABLE_NAME`, `COLUMN_NAME`, `DATA_TYPE`, `KEY_SEQ`, `PK_NAME`, ...) — which is exactly what `GetTablesOperation` / `GetColumnsOperation` / etc. on the Hive Thrift side expect, since the Thrift API was derived from the JDBC API in the first place.
- Driver-specific quirks are minor and well-defined — `databaseTerm` gating is unique to mysql-connector-j; PG returns column labels in lowercase while MySQL/CK use uppercase (the JDBC spec says case-insensitive, but engine-side string comparisons would need to match).
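The label-case quirk in the last bullet is the kind of thing a few lines of engine-side adapter can absorb. A sketch — the class and method names are illustrative, not engine code:

```java
import java.util.List;

/**
 * Illustrative case-insensitive column-label lookup, matching the JDBC spec's
 * case-insensitive treatment of metadata column names.
 */
public class LabelLookup {

    /** Returns the 1-based index of the label matching ignoring case, or -1. */
    static int findColumn(List<String> labels, String wanted) {
        for (int i = 0; i < labels.size(); i++) {
            if (labels.get(i).equalsIgnoreCase(wanted)) {
                return i + 1; // JDBC column indices are 1-based
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // PostgreSQL-style lowercase labels still match JDBC-standard names.
        List<String> pgLabels = List.of("table_cat", "table_schem", "table_name");
        System.out.println(findColumn(pgLabels, "TABLE_SCHEM")); // 2
    }
}
```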
So the JDBC standard API really does cover every operation the dialect models today, across very different DBMS families. Concretely, this means the `JdbcDialect.getXxxQuery` family of methods could be deleted in favor of the engine calling `connection.getMetaData.getXxx(...)` directly — and dialects whose drivers implement metadata via a non-SQL path (Avatica, etc.) would automatically work the way they were designed to.
The few wrinkles I hit are all driver-side and resolvable in a few lines of engine-level adapter code, not by reaching for SQL:
- `getCatalogs` / `getSchemas` are mutually exclusive on mysql-connector-j (gated by `databaseTerm`) — handled by checking the property and routing, or by calling both and taking whichever is non-empty.
- `DATA_TYPE` is `int` in the JDBC spec but the current SQL returns a string — a `java.sql.Types`-to-name mapping is one helper.
- The current SQL returns extra MySQL-specific columns (`ENGINE`, `TABLE_ROWS`, ...) but the Hive Thrift `GetTables` result schema is fixed to JDBC-standard columns, so those extras are already dropped during serialization today and aren't reaching clients — no real loss.
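For the `DATA_TYPE` wrinkle, `java.sql.JDBCType` already knows the standard names, so the helper could be roughly this — the fallback behavior for backend-specific names (e.g. MySQL's `MEDIUMTEXT`) is my assumption, not something the engine does today:

```java
import java.sql.JDBCType;
import java.sql.Types;

/** Sketch of a type-name-to-java.sql.Types mapping for the DATA_TYPE column. */
public class DataTypeMapping {

    /**
     * Maps a SQL type name (e.g. "VARCHAR") to its java.sql.Types constant,
     * falling back to Types.OTHER for names JDBCType does not know.
     */
    static int toJdbcType(String sqlTypeName) {
        try {
            return JDBCType.valueOf(sqlTypeName.toUpperCase()).getVendorTypeNumber();
        } catch (IllegalArgumentException unknownName) {
            return Types.OTHER; // backend-specific name, e.g. MySQL "MEDIUMTEXT"
        }
    }

    public static void main(String[] args) {
        System.out.println(toJdbcType("varchar") == Types.VARCHAR);    // true
        System.out.println(toJdbcType("mediumtext") == Types.OTHER);   // true
    }
}
```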
I'd like to keep this PR scoped to the URL-database bug since that's what it set out to fix, and migrating the dialect API to call DatabaseMetaData directly will touch every dialect, the operation manager, and every metadata test — not something I'd want to fold into a bug fix. Happy to open a follow-up issue tracking the migration if you agree with that split. WDYT?
migrating the dialect API to call DatabaseMetaData directly will touch every dialect, the operation manager, and every metadata test
if we do that first, do we still need this patch?
No SQLFeatureNotSupportedException anywhere
Actually, the JDBC API Javadocs define behavior in a very clear way, it should not throw SQLFeatureNotSupportedException unless Javadocs say it can. Some tools, like DBeaver, may tolerate it if the JDBC driver vendor does not respect the API contract, but tools like DataGrip may not.
BTW, we MUST avoid copying the code from mysql-connector-j, it's GPL-licensed ... try to read the docs or use black box testing to tackle the mysql-connector-j-related issue. (this will be a horrible experience ...)
Why are the changes needed?
This is a bug fix.
The Hive JDBC driver rewrites a `null` `schemaPattern` to `"%"` in `GetTables`, and the JDBC engine forwards it directly into `WHERE TABLE_SCHEMA LIKE '%'`, so DBeaver — which always passes `null` once a connection is established — sees tables from every database in the backend rather than the one in its connection URL. Compounding this, the JDBC engine never applies the URL-level database to the underlying JDBC connection, so there is no "current database" on the backend for a filter to fall back to.

The fix introduces a dialect-level `setCurrentDatabase` hook, tracks an `effectiveDatabase` on the session, and routes blank / `"%"` schema patterns through it in `GetTablesOperation`. `MySQLDialect` additionally implements `getCatalogsOperation` / `getSchemasOperation` with JDBC-standard column labels (`TABLE_CATALOG` / `TABLE_SCHEM`) so the schemas tree in DBeaver populates correctly.

Non-MySQL dialects inherit the default no-op `setCurrentDatabase` and `effectiveDatabase` stays `None`, so PostgreSQL / Oracle / etc. keep their existing behavior.

Note: `getSchemas()` continues to return all accessible schemas per the JDBC spec — the URL-level database only affects the default query context and the blank / `"%"` fallback in `getTables`. This matches the behavior of the native MySQL JDBC driver. Clients that want to restrict the visible schema list should do so via client-side filters (e.g. DBeaver's "Schemas" filter).

How was this patch tested?

Added unit / integration tests in `MySQLOperationSuite` covering:

- `getTables` with a database in the URL returns only that database's tables
- `getTables` with a `"%"` schema pattern is routed to the session's effective database
- `getSchemas` returns results with JDBC-standard `TABLE_CATALOG` / `TABLE_SCHEM` column labels
- `getCatalogs` behavior

Also verified end-to-end against a live StarRocks backend using Spark's Hive-JDBC beeline.
Was this patch authored or co-authored using generative AI tooling?
Human-authored core design and implementation. Tests, verification, and PR description drafting were assisted by Claude Code.
Generated-by: Claude Code (Opus 4.7)
Closes #7305. Supersedes the stale #7307 per @pan3793's suggestion there.
Co-authored-by: zhzhenqin <zhzhenqin@users.noreply.github.com>