Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string_to_array function #32045

Merged
merged 18 commits into from
Apr 1, 2025
Merged

Add string_to_array function #32045

merged 18 commits into from
Apr 1, 2025

Conversation

ptravers
Copy link
Contributor

@ptravers ptravers commented Mar 28, 2025

add string_to_array function

Motivation

https://github.com/MaterializeInc/database-issues/issues/7101

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@ptravers ptravers requested review from a team as code owners March 28, 2025 20:19
@ptravers ptravers requested a review from ParkMyCar March 28, 2025 20:19
Copy link

github-actions bot commented Mar 28, 2025

All contributors have signed the CLA.
Posted by the CLA Assistant Lite bot.

@ptravers
Copy link
Contributor Author

I have read the Contributor License Agreement (CLA) and I hereby sign the CLA.

@ptravers ptravers requested a review from a team as a code owner March 28, 2025 22:13
Copy link
Member

@ParkMyCar ParkMyCar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall looks good, but can you address the OIDs and argument types before merging? Want to make sure we adhere to Postgres compatibility.

WOOHOO on your first PR!

Comment on lines 3165 to 3166
params!(String, Any) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3947;
params!(String, Any, String) => VariadicFunc::StringToArray => ScalarType::Array(Box::new(ScalarType::String)), 3948;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We match the oids from Postgres, it looks like these should be 394 and 376 respectively (postgres).

Also, I think the second argument in both of these cases should be String? I think even with the type of String you should be able to pass NULL as an argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done! ty!

@@ -8102,6 +8175,7 @@ impl VariadicFunc {
ScalarType::Array(Box::new(ScalarType::String)).nullable(in_nullable)
}
RegexpReplace => ScalarType::String.nullable(in_nullable),
StringToArray => ScalarType::Array(Box::new(ScalarType::String)).nullable(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the output is null if and only if the first argument is null. If this is true, you could give a tighter nullability here by copying the nullability of the first argument. (This would be similar to some of the above lines that use in_nullable, but you can't use that directly, because that takes the nullability of all arguments, whereas you'd need only the first argument.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah neat, thanks! what difference does making the nullability more stringent like that make to the behaviour of the database?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ... IS NULL/IS NOT NULL are rewritten to false/true. (Or not introduced in other cases, e.g., here.) Generally, we have a lot of IS NOT NULL checks, because MirRelationExpr::Join matches up nulls, which doesn't correspond to SQL join behavior for nulls, so we introduce a lot of null checks just below joins to match the SQL join behavior. So, eliminating these checks usually means slightly less CPU work due to simply not performing null checks. Also, in some cases it can have more dramatic consequences, for example:
  • coalesce call argument lists are truncated after the first non-nullable argument.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah thanks so much! that makes sense

@ptravers ptravers requested a review from ggevay March 31, 2025 21:21
Copy link
Contributor

@ggevay ggevay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good!

There are some CI failures, but they are straightforward to resolve:

The "Cargo test" fail is just because of changed ids of some system objects. Needs

REWRITE=1 COCKROACH_URL=postgres://root@localhost:26257 cargo test test_http_sql

The "Fast SQL logic tests" fail is also because of changed ids of system objects, and also just needs a rewrite:

bin/sqllogictest -- -v test/sqllogictest/mz_catalog_server_index_accounting.slt --rewrite-results

----
{" "}

# string_to_array - whitespace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(comment copy-pasted from above)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah good catch, thanks

@@ -8102,6 +8175,7 @@ impl VariadicFunc {
ScalarType::Array(Box::new(ScalarType::String)).nullable(in_nullable)
}
RegexpReplace => ScalarType::String.nullable(in_nullable),
StringToArray => ScalarType::Array(Box::new(ScalarType::String)).nullable(true),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • ... IS NULL/IS NOT NULL are rewritten to false/true. (Or not introduced in other cases, e.g., here.) Generally, we have a lot of IS NOT NULL checks, because MirRelationExpr::Join matches up nulls, which doesn't correspond to SQL join behavior for nulls, so we introduce a lot of null checks just below joins to match the SQL join behavior. So, eliminating these checks usually means slightly less CPU work due to simply not performing null checks. Also, in some cases it can have more dramatic consequences, for example:
  • coalesce call argument lists are truncated after the first non-nullable argument.

Copy link
Contributor

@kay-kim kay-kim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. I left a suggestion more for rendering ... feel free to ignore.

@ptravers ptravers enabled auto-merge April 1, 2025 15:25
@ptravers ptravers disabled auto-merge April 1, 2025 15:25
@ptravers ptravers enabled auto-merge April 1, 2025 15:26
Co-authored-by: Kay Kim <[email protected]>
@ptravers ptravers disabled auto-merge April 1, 2025 16:21
@ptravers ptravers enabled auto-merge April 1, 2025 16:24
@ptravers ptravers merged commit 22b8308 into MaterializeInc:main Apr 1, 2025
84 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants