Add string_to_array function #32045
All contributors have signed the CLA.
I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.
Overall looks good, but can you address the OIDs and argument types before merging? Want to make sure we adhere to Postgres compatibility.
WOOHOO on your first PR!
src/sql/src/func.rs
params!(String, Any) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3947;
params!(String, Any, String) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3948;
We match the OIDs from Postgres; it looks like these should be 394 and 376 respectively (postgres).
Also, I think the second argument in both of these cases should be `String`? I think even with the type of `String` you should be able to pass `NULL` as an argument.
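For readers following along, here is a minimal sketch of the Postgres-style semantics this function is meant to match. It is not the code from this PR: the function signature is hypothetical, and `Option` stands in for SQL `NULL`. It assumes the documented Postgres behavior: a `NULL` delimiter splits the input into characters, an empty delimiter returns the whole input as a single element, and elements equal to the optional null string become `NULL`.

```rust
/// Hypothetical stand-alone model of `string_to_array(input, delimiter, null_string)`.
/// `None` models SQL NULL; this is an illustration, not the implementation in the PR.
fn string_to_array(
    input: Option<&str>,
    delimiter: Option<&str>,
    null_string: Option<&str>,
) -> Option<Vec<Option<String>>> {
    // A NULL input yields a NULL result.
    let input = input?;
    // An empty input yields an empty array.
    if input.is_empty() {
        return Some(Vec::new());
    }
    let parts: Vec<String> = match delimiter {
        // NULL delimiter: every character becomes its own element.
        None => input.chars().map(|c| c.to_string()).collect(),
        // Empty delimiter: the whole input is a single element.
        Some("") => vec![input.to_string()],
        // Otherwise split on the delimiter, keeping empty fields.
        Some(d) => input.split(d).map(str::to_string).collect(),
    };
    // Elements equal to the null string (if one was given) become NULL.
    Some(
        parts
            .into_iter()
            .map(|p| if Some(p.as_str()) == null_string { None } else { Some(p) })
            .collect(),
    )
}
```

Note that the delimiter is an ordinary string argument that happens to accept `NULL`, which is the point of the type suggestion above.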
done! ty!
src/expr/src/scalar/func.rs
@@ -8102,6 +8175,7 @@ impl VariadicFunc {
ScalarType::Array(Box::new(ScalarType::String)).nullable(in_nullable)
}
RegexpReplace => ScalarType::String.nullable(in_nullable),
StringToArray => ScalarType::Array(Box::new(ScalarType::String)).nullable(true),
I think the output is null if and only if the first argument is null. If this is true, you could give a tighter nullability here by copying the nullability of the first argument. (This would be similar to some of the above lines that use `in_nullable`, but you can't use that directly, because that takes the nullability of all arguments, whereas you'd need only the first argument.)
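A rough sketch of the distinction being suggested, using stand-in types rather than the real ones from `src/expr`: `ColumnType`, `any_input_nullable`, and `first_arg_nullable` are all hypothetical names. The first helper mirrors what `in_nullable` is described as doing (nullable if any argument is nullable), the second copies only the first argument's nullability.

```rust
/// Hypothetical stand-in for a column type; only nullability matters here.
#[derive(Debug, Clone, PartialEq)]
struct ColumnType {
    nullable: bool,
}

/// Models the described behavior of `in_nullable`:
/// the output is nullable if any input is nullable.
fn any_input_nullable(input_types: &[ColumnType]) -> bool {
    input_types.iter().any(|t| t.nullable)
}

/// Tighter rule suggested in the review: the output is nullable
/// exactly when the first argument is nullable.
/// With no arguments at all, fall back to the conservative answer.
fn first_arg_nullable(input_types: &[ColumnType]) -> bool {
    input_types.first().map_or(true, |t| t.nullable)
}
```

For a call such as `string_to_array(non_null_col, nullable_col)`, the first rule reports a nullable result while the second reports a non-nullable one.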
ah neat, thanks! what difference does making the nullability more stringent like that make to the behaviour of the database?
`... IS NULL` / `IS NOT NULL` are rewritten to `false` / `true`. (Or not introduced in other cases, e.g., here.) Generally, we have a lot of `IS NOT NULL` checks, because `MirRelationExpr::Join` matches up nulls, which doesn't correspond to SQL join behavior for nulls, so we introduce a lot of null checks just below joins to match the SQL join behavior. So, eliminating these checks usually means slightly less CPU work due to simply not performing null checks. Also, in some cases it can have more dramatic consequences, for example:
- Certain subquery simplifications (`try_simplify_quantified_comparisons`) are possible only for non-nullable columns.
- `NOT IN` subqueries choke on nullable inputs:
  - https://github.com/MaterializeInc/database-issues/issues/382#issuecomment-2368827498
  - https://github.com/MaterializeInc/database-issues/issues/8396
- `coalesce` call argument lists are truncated after the first non-nullable argument.
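To make the rewrites above concrete, here is a toy simplifier over a made-up expression type. It is only a sketch: `Expr`, `is_nullable`, and `simplify` are hypothetical and unrelated to the actual optimizer code. It folds `IS NULL` on a non-nullable input to `false`, folds `NOT` over a literal, and truncates a `coalesce` argument list after the first non-nullable argument.

```rust
/// Hypothetical toy expression type, for illustration only.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column { name: &'static str, nullable: bool },
    Literal(bool),
    IsNull(Box<Expr>),
    Not(Box<Expr>),
    Coalesce(Vec<Expr>),
}

/// Conservative nullability analysis for the toy type.
fn is_nullable(e: &Expr) -> bool {
    match e {
        Expr::Column { nullable, .. } => *nullable,
        Expr::Literal(_) => false,
        // `x IS NULL` always yields true or false, never NULL.
        Expr::IsNull(_) => false,
        Expr::Not(inner) => is_nullable(inner),
        // coalesce is NULL only if every argument can be NULL.
        Expr::Coalesce(args) => args.iter().all(is_nullable),
    }
}

/// Applies the nullability-driven rewrites bottom-up.
fn simplify(e: Expr) -> Expr {
    match e {
        Expr::IsNull(inner) => {
            let inner = simplify(*inner);
            if is_nullable(&inner) {
                Expr::IsNull(Box::new(inner))
            } else {
                // A non-nullable input can never be NULL.
                Expr::Literal(false)
            }
        }
        Expr::Not(inner) => match simplify(*inner) {
            Expr::Literal(b) => Expr::Literal(!b),
            other => Expr::Not(Box::new(other)),
        },
        Expr::Coalesce(args) => {
            let mut kept = Vec::new();
            for arg in args {
                let arg = simplify(arg);
                let stop = !is_nullable(&arg);
                kept.push(arg);
                if stop {
                    // Later arguments can never be reached.
                    break;
                }
            }
            Expr::Coalesce(kept)
        }
        other => other,
    }
}
```

With this model, `NOT (col IS NULL)` on a non-nullable `col` collapses to the literal `true`, which is the kind of check that a tighter output nullability lets the optimizer drop.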
ah thanks so much! that makes sense
Looks good!
There are some CI failures, but they are straightforward to resolve:
The "Cargo test" fail is just because of changed ids of some system objects. Needs
REWRITE=1 COCKROACH_URL=postgres://root@localhost:26257 cargo test test_http_sql
The "Fast SQL logic tests" fail is also because of changed ids of system objects, and also just needs a rewrite:
bin/sqllogictest -- -v test/sqllogictest/mz_catalog_server_index_accounting.slt --rewrite-results
test/sqllogictest/string.slt
----
{" "}

# string_to_array - whitespace
(comment copy-pasted from above)
ah good catch, thanks
lgtm. I left a suggestion more for rendering ... feel free to ignore.
Co-authored-by: Kay Kim <[email protected]>
add string_to_array function
Motivation
https://github.com/MaterializeInc/database-issues/issues/7101
Checklist
`$T ⇔ Proto$T` mapping (possibly in a backwards-incompatible way), then it is tagged with a `T-proto` label.