-
Notifications
You must be signed in to change notification settings - Fork 671
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(tsql): SPLIT_PART function and conversion to PARSENAME in tsql #4211
base: main
Are you sure you want to change the base?
Conversation
Hey @daihuynh, thanks for the contribution! Have you checked other dialects as well? When introducing a new expression such as |
Hi @VaggelisD , I've just check the Snowflake doc about this function and don't see any of those are optional. I think Snowflake or other Spark-family engines just use that function directly. However, I'm happy to update the PR if any SQL engine has a different implementation. |
@daihuynh From what I remember there are other dialects that have |
All good @VaggelisD, I put unit tests in those dialects and update the expression's requirement if there's any variant. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took a better look at the PR and I'm not sure if I agree with it:
SPLIT_PART
is a function that is common across various engines and is used to choose a part after splitting any string by any delimiterPARSENAME
seems T-SQL specific and is used to split an object (e.g. a table name such asdb.schema.foo
), so the string must be (1) separated by dots and (2) have a max part count of 4.
So, representing PARSENAME
as an exp.SplitPart
node seems like a hacky solution and could work only as an extreme subset of it. Could we use T-SQL's STRING_SPLIT here?
Please let me know if I'm misunderstanding your approach; Also, feel free to share any context/examples regarding the need for this change.
Hi @VaggelisD, The context behind this PR is that my team's data modeller uses PARSENAME in many scripts when he works with source databases. His approach is ETL. With this PR: The reason I don't choose STRINGS_SPLIT in T-SQL is that it returns split values in rows and picking a value at index takes extra steps, which will change the structure of the original SQL query. Here is the example.
Hope this gives you more context. |
@VaggelisD Here is the reference table for SPLIT_PART function across supported databases in Sqlglot. The first 3 parameters (string, string, integer) are required in all cases. Apache Drill supports 4 parameters, but the final parameter is optional.
|
This PR introduces SPLIT_PART function used in Spark and Databricks as well as the corresponding conversion in TSQL.
In Spark/DataBricks
SELECT SPLIT_PART('1.2.3', '.', 1)
Becomes this in TSQL
SELECT PARSENAME('1.2.3', 3)
Or vice-verse