Add VARIANT_JSON and VARIANT_BINARY client capabilities#29046

Open
dain wants to merge 2 commits into master from user/dain/variant-client
Conversation

@dain
Member

@dain dain commented Apr 8, 2026

Description

Add support for VARIANT in clients. There are three supported scenarios, based on the settings of client capabilities:

  1. nothing set: return type JSON with data encoded as JSON
  2. VARIANT_JSON: return type VARIANT with data encoded as JSON
  3. VARIANT_BINARY: return type VARIANT with data in standard binary variant format.

The CLI uses VARIANT_JSON, and the JDBC driver uses VARIANT_BINARY.
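
The three scenarios above can be sketched as a small selection function. This is only an illustration, not the server code in this PR: the class, enum, and record names are hypothetical, and the choice to prefer binary when a client sends both capabilities is an assumption the description does not state.

```java
import java.util.Set;

// Hypothetical names for illustration; these are not the actual Trino classes.
final class VariantWireFormatSketch
{
    enum Capability { VARIANT_JSON, VARIANT_BINARY }

    // What a client sees for a VARIANT column: the reported type and the value encoding.
    record WireFormat(String reportedType, String encoding) {}

    static WireFormat select(Set<Capability> capabilities)
    {
        // Assumption: binary takes precedence when both capabilities are sent.
        if (capabilities.contains(Capability.VARIANT_BINARY)) {
            return new WireFormat("variant", "binary");
        }
        if (capabilities.contains(Capability.VARIANT_JSON)) {
            return new WireFormat("variant", "json");
        }
        // Nothing set: the column is reported as JSON and encoded as JSON.
        return new WireFormat("json", "json");
    }

    public static void main(String[] args)
    {
        System.out.println(select(Set.of()));
        System.out.println(select(Set.of(Capability.VARIANT_JSON)));
        System.out.println(select(Set.of(Capability.VARIANT_BINARY)));
    }
}
```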

The JDBC client uses binary VARIANT directly without lowering to JSON and losing all type information. VARIANT columns can be fetched with:

  • rs.getObject(column) or rs.getObject(column, Object.class) to get the corresponding Java object representation. This is the same as variant.toObject.
  • rs.getObject(column, io.trino.jdbc.Variant.class) to get an io.trino.jdbc.Variant.
  • rs.getObject(column, Map.class): same as getObject(column), but only works when the result would be a Map. Throws an exception if the variant is not an object.
  • rs.getObject(column, List.class): same as getObject(column), but only works when the result would be a List. Throws an exception if the variant is not an array.
  • rs.getString(column) or rs.getObject(column, String.class) to get the corresponding JSON representation. This is the same as variant.toJson, which is the same as the VARIANT_JSON encoding.
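
The Map.class and List.class variants behave like a checked cast on the decoded value. The sketch below mimics that behavior on plain Java objects. It is not the driver's implementation: the helper name is hypothetical, and both the exception type and the pass-through of null are assumptions (a JDBC driver would more likely raise a SQLException).

```java
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the type check behind getObject(column, Map.class) and
// getObject(column, List.class); this is not code from the Trino JDBC driver.
final class VariantTargetTypeSketch
{
    static <T> T requireType(Object decoded, Class<T> target)
    {
        // Assumption: a null VARIANT is returned as null for any requested type.
        if (decoded == null) {
            return null;
        }
        if (!target.isInstance(decoded)) {
            // Assumption: the real driver would report this as a SQLException.
            throw new IllegalArgumentException(
                    "VARIANT value is not a " + target.getSimpleName());
        }
        return target.cast(decoded);
    }

    public static void main(String[] args)
    {
        Object decodedObject = Map.of("a", 1, "b", true);
        System.out.println(requireType(decodedObject, Map.class));

        try {
            requireType(decodedObject, List.class);
        }
        catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```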

When mapping to Java object representation, VARIANT values map to Java objects as follows:

| VARIANT value | Java object |
| --- | --- |
| null | null |
| boolean | Boolean |
| int8 | Byte |
| int16 | Short |
| int32 | Integer |
| int64 | Long |
| float | Float |
| double | Double |
| decimal | BigDecimal |
| string | String |
| binary | byte[] |
| date | LocalDate |
| time(ntz) | LocalTime |
| timestamp(utc) | Instant |
| timestamp(ntz) | LocalDateTime |
| uuid | UUID |
| array | List<Object> |
| object | Map<String, Object> |
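
A client that receives the Java object representation can recover the VARIANT type by inspecting the runtime class. The following is a minimal sketch that follows the mapping table. The class and method names are hypothetical, and it works on plain Java values rather than on a real result set.

```java
import java.math.BigDecimal;
import java.time.Instant;
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.LocalTime;
import java.util.List;
import java.util.Map;
import java.util.UUID;

// Hypothetical helper; it only mirrors the mapping table and is not part of the driver.
final class VariantJavaTypeSketch
{
    // Returns the VARIANT type name implied by the Java object's runtime class.
    static String variantTypeOf(Object value)
    {
        if (value == null) return "null";
        if (value instanceof Boolean) return "boolean";
        if (value instanceof Byte) return "int8";
        if (value instanceof Short) return "int16";
        if (value instanceof Integer) return "int32";
        if (value instanceof Long) return "int64";
        if (value instanceof Float) return "float";
        if (value instanceof Double) return "double";
        if (value instanceof BigDecimal) return "decimal";
        if (value instanceof String) return "string";
        if (value instanceof byte[]) return "binary";
        if (value instanceof LocalDate) return "date";
        if (value instanceof LocalTime) return "time(ntz)";
        if (value instanceof Instant) return "timestamp(utc)";
        if (value instanceof LocalDateTime) return "timestamp(ntz)";
        if (value instanceof UUID) return "uuid";
        if (value instanceof List) return "array";
        if (value instanceof Map) return "object";
        throw new IllegalArgumentException("Unexpected Java type: " + value.getClass().getName());
    }

    public static void main(String[] args)
    {
        System.out.println(variantTypeOf(42));
        System.out.println(variantTypeOf(Instant.EPOCH));
        System.out.println(variantTypeOf(Map.of("a", 1)));
    }
}
```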

The io.trino.jdbc.Variant class is a minimal implementation of VARIANT for decoding data. It supports getting the raw underlying byte arrays for the value and metadata if the client wants to interact with the raw data (for example, storing it directly in a separate system without re-encoding). The SPI Variant class cannot be used because it is compiled with Java 25.

Release notes

(x) Release notes are required, with the following suggested text:

## CLI, JDBC
* Add support for VARIANT type. ({issue}`29046`)

@cla-bot cla-bot Bot added the cla-signed label Apr 8, 2026
Member

@findepi findepi left a comment

LGTM except TestJsonEncodingUtils, which I didn't review. Seek review from @wendigo.

Add a test in TestJdbcResultSet too.

@findepi findepi requested review from findinpath and wendigo April 8, 2026 19:44
@dain dain force-pushed the user/dain/variant-client branch from c1da882 to aba2fc7 Compare April 9, 2026 01:35
@github-actions github-actions Bot added the jdbc Relates to Trino JDBC driver label Apr 9, 2026
@dain dain changed the title Add VARIANT_JSON client capability with fallback to JSON Add VARIANT_JSON and VARIANT_BINARY client capabilities Apr 9, 2026
@dain dain requested a review from findepi April 9, 2026 01:35
@dain dain force-pushed the user/dain/variant-client branch from aba2fc7 to 1ccf36e Compare April 9, 2026 02:31
@ebyhr
Member

ebyhr commented Apr 9, 2026

Please don't remove ({issue}`issuenumber`) placeholder from PR description. You should put the PR number when the relevant issue doesn't exist.

@dain
Member Author

dain commented Apr 9, 2026

> Please don't remove ({issue}`issuenumber`) placeholder from PR description. You should put the PR number when the relevant issue doesn't exist.

There is no relevant issue. Someone modified the message to say it fixes a bug, but there is no bug since variant is unreleased.

@dain dain force-pushed the user/dain/variant-client branch from 1ccf36e to 308d8d8 Compare April 9, 2026 04:45
@findinpath
Contributor

Tested with Trino CLI 468


trino> select CAST(JSON '{"a":1,"b":true}' AS VARIANT);
      _col0       
------------------
 {"a":1,"b":true} 
(1 row)

Query 20260409_160644_00026_n6pci, FINISHED, 1 node
http://localhost:8080/ui/query.html?20260409_160644_00026_n6pci
Splits: 1 total, 1 done (100.00%)
CPU Time: 0.0s total,     0 rows/s,     0B/s, 0% active
Per Node: 0.0 parallelism,     0 rows/s,     0B/s
Parallelism: 0.0
Peak Memory: 256B
0.12 [0 rows, 0B] [0 rows/s, 0B/s]

trino> select CAST(42 AS VARIANT);
 _col0 
-------
 42    
(1 row)

* Whether client supports the `VARIANT` type encoded as a binary payload on the wire.
* This capability is opt-in, so clients continue to receive the JSON representation by default.
*/
VARIANT_BINARY(false),
Contributor

Is this supposed to be mentioned in docs/src/main/sphinx/develop/client-protocol.md ?

I'm seeing in io.trino.jdbc.TrinoConnection#startQuery that it is being added without any checks.

Member Author

I don't see Client-Capabilities documented anywhere

@wendigo
Contributor

wendigo commented Apr 13, 2026

Why do we need a second commit? Seems complex so what's the benefit

@dain dain force-pushed the user/dain/variant-client branch from 308d8d8 to af02fdb Compare April 13, 2026 20:35
@dain
Member Author

dain commented Apr 13, 2026

> Why do we need a second commit? Seems complex so what's the benefit

This allows using the full power of VARIANT in JDBC. VARIANT carries type information that JSON does not, and lowering to JSON removes it. For example, you don't know which data contains timestamps.
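
To illustrate the point about lost type information, here is a toy sketch in which a timestamp and a string that happens to look like one become indistinguishable once rendered as JSON. The renderer is hypothetical and is not the Trino encoder. The ISO-8601 text form for timestamps is an assumption, and string escaping is deliberately omitted.

```java
import java.time.Instant;

// Toy JSON rendering for two scalar kinds; hypothetical, not the Trino VARIANT-to-JSON encoder.
final class JsonLoweringSketch
{
    static String toJson(Object value)
    {
        // Assumption: timestamps are rendered as ISO-8601 JSON strings.
        // Both kinds end up as a quoted string (escaping omitted for brevity).
        if (value instanceof Instant || value instanceof String) {
            return "\"" + value + "\"";
        }
        throw new IllegalArgumentException("Unsupported value: " + value);
    }

    public static void main(String[] args)
    {
        Object timestamp = Instant.parse("2026-04-09T16:06:44Z");
        Object text = "2026-04-09T16:06:44Z";

        // The Java objects differ in type, but the JSON output is identical.
        System.out.println(toJson(timestamp));
        System.out.println(toJson(text));
        System.out.println(toJson(timestamp).equals(toJson(text)));
    }
}
```

With the binary encoding, the client would instead receive an Instant for the first value and a String for the second, so no guessing from the text is needed.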

@dain dain force-pushed the user/dain/variant-client branch 4 times, most recently from 31e49b9 to 1a6af27 Compare April 13, 2026 21:40
@dain dain force-pushed the user/dain/variant-client branch from 1a6af27 to dd25f24 Compare April 15, 2026 01:11
@github-actions github-actions Bot added the docs label Apr 15, 2026
@dain
Member Author

dain commented Apr 15, 2026

@findepi I rethought the JSON decision. With the redesign to lower VARIANT to JSON for old drivers, we have automatically defined the JSON mapping for VARIANT, so that ship has sailed. I added JSON support to the JDBC driver with getString(col) and getObject(col, String.class).

@findepi findepi dismissed their stale review April 16, 2026 12:54

outdated; didn't re-review

@dain dain requested review from electrum, findepi and wendigo April 16, 2026 19:08
Member

@findepi findepi left a comment

First commit LGTM. Naming comment only (important but small).

I didn't review the second commit thoroughly, because I don't agree with the high-level mechanics of the change.

This PR is a RELEASE-BLOCKER, and if I am reading this correctly, the first commit resolves the release blockade. We should merge the first commit and move the controversial commit into a separate PR.

* Whether client supports the `VARIANT` type encoded as JSON values on the wire.
* When this capability is not set, the server returns `json` for `VARIANT` columns.
*/
VARIANT_JSON,
Member

"variant [as] json" can be understood as "send variant values as json type" (what we do for legacy clients) or "send variant values as json objects" (what we do for new clients).

In terms of client capability names, just VARIANT would be a better name IMO.

* Whether client supports the `VARIANT` type encoded as a binary payload on the wire.
* This capability is opt-in, so clients continue to receive the JSON representation by default.
*/
VARIANT_BINARY(false),
Member

Why do we need another client capability?

I thought we needed two states:

  • old clients: no capability being sent, we choose backwards compatible behavior
  • new clients: capability being sent, we choose the best possible behavior

Now we have four states:

  • old clients: no capability being sent, we choose backwards compatible behavior
  • new clients, sending VARIANT_JSON or VARIANT_BINARY (or both)

Comment on lines +70 to +76
public boolean enabledByDefault()
{
    return enabledByDefault;
}

public static Set<String> defaultClientCapabilities()
{
Member

This requires documentation of what it means for a client capability to be enabled by default.
If this is a JDBC thing, the logic should live in trino-jdbc, not here.

Also, I think this is abusing the client capabilities concept.
Client capabilities are meant to describe "what the client is capable of [handling]".
Here they seem to be used for runtime behavior switching.
It would be much better to keep their original meaning. It should be assumed that the latest Trino clients have all the client capabilities defined in the protocol.
