Skip to content

Conversation

@dain
Copy link
Member

@dain dain commented Mar 9, 2025

Description

This PR proposes adding PostgreSQL style double colon :: casts to Trino. I find the standard SQL cast CAST(something as BIGINT) is quite wordy and difficult to write compared to :: especially when it is necessary to drop casts into existing queries (due the the strict nature of SQL type system). I find being able to drop a simple ::type on a symbol reference or function call to be super easy in PostgreSQL.

Fixes #23795

Release notes

(X) Release notes are required, with the following suggested text:

## Section
* Add support for PostgreSQL style `::` casts. ({issue}`25259`)

@dain dain requested a review from martint March 9, 2025 02:52
@cla-bot cla-bot bot added the cla-signed label Mar 9, 2025
@martint
Copy link
Member

martint commented Mar 9, 2025

See #23795 (comment)

@dain
Copy link
Member Author

dain commented Mar 9, 2025

See #23795 (comment)

@martint that sounds great. It means that we have no concern that something useful will be used for this. I don't see us ever implementing that part of the spec, so we can use this operator for something awesome.

@martint
Copy link
Member

martint commented Mar 12, 2025

I don't see us ever implementing that part of the spec, so we can use this operator for something awesome.

On the contrary. It's something I can envision supporting. E.g., you could do integer::to_hex(...), which, among other things, could help to avoid some tricky function overloading scenarios.

There are a few options for handling ambiguities:

  • Fail the query when the a both a column::type and type::method are valid candidates for a given invocation
  • Use a different operator instead of ::
  • Use a slightly different syntax (e.g., instead of 123::varchar, use 123.as(varchar)), and treat it like method invocations on values -- similar to https://www.markhneedham.com/blog/2024/08/25/duckdb-chaining-functions/)

@wendigo
Copy link
Contributor

wendigo commented Mar 12, 2025

What about syntax 11 AS BIGINT or 11 AS TYPE BIGINT ?

@electrum
Copy link
Member

The ambiguity issue seems unlikely in practice, as you wouldn't declare a function with the same name as a type, except in the case where the function is an alias for a cast. For example, we support date(x) as a cast to date, so x::date would be the same, no matter which one is invoked.

The PostgreSQL cast syntax is well-understood by users and is quite convenient, so I vote to support it.

@electrum
Copy link
Member

electrum commented Mar 12, 2025

We can resolve the ambiguity by allowing it when the function is also a cast (as in the case of date), and failing the query otherwise. It looks like date is the only such function that exists today.

FunctionMetadata can have a new cast flag to indicate that the function is an alias for a cast, and ScalarFromAnnotationsParser can set the flag when the function is also a cast operator, probably in ScalarHeader.fromAnnotatedElement().

@ebyhr
Copy link
Member

ebyhr commented Mar 14, 2025

The current behavior with literals is ambiguous to me. For instance, I expected the following statement returns timestamp type at a glance, but it returns varchar type.

SELECT TIMESTAMP '2007-08-09 9:10:11 Europe/Berlin'::VARCHAR;

@dain
Copy link
Member Author

dain commented Mar 18, 2025

The current behavior with literals is ambiguous to me. For instance, I expected the following statement returns timestamp type at a glance, but it returns varchar type.

SELECT TIMESTAMP '2007-08-09 9:10:11 Europe/Berlin'::VARCHAR;

I don't think that is ambigous. The first part of the expression is TIMESTAMP '2007-08-09 9:10:11 Europe/Berlin' is a type constructor in the grammar:

    | identifier string                                                                   #typeConstructor

Type constructors do not support an expression after the type name, and only support a literal string. Then the second part casts the timestamp to a varchar.

@electrum
Copy link
Member

electrum commented Mar 18, 2025

It is the same as if you wrote it with parentheses:

SELECT (TIMESTAMP '2007-08-09 9:10:11 Europe/Berlin')::VARCHAR;

The syntax for type constructors is special because they only support literals, not expressions. But I agree in this instance that is appears confusing, as the cast operator goes left-to-right instead of the typical inside-out for function calls. Visually, it's also confusing because the type constructor has a space inside of it, but there's no separation between the string literal and the cast operator.

It's good to call out this edge case in the SQL syntax, but I don't have a good solution, other than mentioning it in the documentation. PostgreSQL should have the same issue.

@dain
Copy link
Member Author

dain commented Mar 18, 2025

I expect people won't mix the syntax, and instead would do this:

'2007-08-09 9:10:11 Europe/Berlin'::TIMESTAMP

So the original expression would be:

'2007-08-09 9:10:11 Europe/Berlin'::TIMESTAMP::VARCHAR

@github-actions
Copy link

github-actions bot commented Apr 9, 2025

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label Apr 9, 2025
@github-actions
Copy link

github-actions bot commented May 1, 2025

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this May 1, 2025
@martint martint reopened this May 1, 2025
@github-actions github-actions bot removed the stale label May 2, 2025
@github-actions
Copy link

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label May 26, 2025
@kyleromines
Copy link

@dain @martint New user to trino here, I think this is important to have so wanted to bump this up.

@github-actions github-actions bot removed the stale label Jun 2, 2025
LTE: '<=';
GT: '>';
GTE: '>=';
DOUBLE_COLON: '::';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This token declaration isn't needed, as we don't compare the token value in the generated code.

@wendigo
Copy link
Contributor

wendigo commented Jun 13, 2025

@martint do we have a way forward for this change? I agree with what David said about ambiguity and casts. This syntax makes a lot of sense from usability perspective

@martint
Copy link
Member

martint commented Jun 13, 2025

I'm generally for supporting this, but I think there are some cases we're overlooking in the analysis.

First, here's a concrete example of where the semantic ambiguity arises. As long as we acknowledge that at some point there will be an ambiguity and we'll have to fail those queries, I'm ok.

SELECT integer::double FROM (VALUES 1) t(integer)

The integer::double invocation can be interpreted as both a cast of the column named integer to double, or an invocation of the double() method on type integer.

The other cases we haven considered are those involving more complex types. In particular, how the syntax works for composite types such as array, map, row whose syntax is incompatible with expression syntax, so they can't be parsed using the same rules and can't use the same AST data structures.

For example:

  • x::varchar(3): can be parsed both as a cast and a static function invocation, so they can share syntactic structure
  • x::array(bigint): can be parsed as both a cast and a static function invocation (i.e., array could be a function name and bigint a column name), so they can also share syntactic structure
  • x::row(a bigint, b varchar): can only be parsed as a cast, since row(a bigint, b varchar) is not a valid expression, so it needs separate syntax
  • x::array(array(array(array(array(array(row(a bigint, b varchar))))))): can only be parsed as a cast due to the innermost row(a bigint, b varchar). However, this cannot be known until the parser recurses all the way to the bottom and could require backtracking (with the negative performance implications). Also, it may not be possible to parse this structure without creating a parallel hybrid expression/type set of rules, or making the type rule a valid expression at the syntax level.

@electrum
Copy link
Member

electrum commented Jun 13, 2025

Do any of these ambiguities exist if we disallow functions with the same names as types, unless they are equivalent to casts?

The integer::double invocation can be interpreted as ... an invocation of the double() method on type integer.

What would be usefulness of having a no-arg double() method on type integer?

x::varchar(3)

This would be a type named x that has a varchar() method that takes an integer. I can't see how this could be a problem in practice, especially given the proposed rule above.

I understand that this could complicate parsing and analysis, but I can't imagine a scenario where it would be semantically ambiguous or otherwise cause a problem.

@github-actions
Copy link

github-actions bot commented Jul 7, 2025

This pull request has gone a while without any activity. Ask for help on #core-dev on Trino slack.

@github-actions github-actions bot added the stale label Jul 7, 2025
@github-actions
Copy link

Closing this pull request, as it has been stale for six weeks. Feel free to re-open at any time.

@github-actions github-actions bot closed this Jul 28, 2025
@dain dain reopened this Jul 28, 2025
@dain dain added the stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. label Jul 28, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cla-signed stale stale-ignore Use this label on PRs that should be ignored by the stale bot so they are not flagged or closed. syntax-needs-review

Development

Successfully merging this pull request may close these issues.

Cast operator ::

6 participants