Improve performance of SingleRestrictionEstimatedRowCountTest #1502
Force-pushed from cb4f9c5 to a0fe1cb.
There is no need to disable the optimizer, since it cannot optimize anything away. Disabling it was originally necessary when the anti-join node was introduced.
Fails due to a flush when the next table is created, and on cleanup after a test run.
Reduces the number of created tables by creating all needed tables in advance. As a result, the test can be placed into a single test function. This improves local test execution time from 5.5 seconds to 1.4 seconds.
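The idea of creating every table up front and then running all checks in one method can be sketched roughly as follows. This is a minimal, self-contained illustration, not the code from this PR: `UpfrontTables`, `createTable`, `Case`, and the `Version`/`ColumnType` enums are hypothetical stand-ins for the SAITester helpers, and keying tables by column type alone is an assumption made for the sketch. Each table is created once before any check runs, so the per-table cost (DDL, flush, cleanup) is paid once per table rather than once per test case.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-ins for the real SAITester-based test; names are illustrative only.
final class UpfrontTables {
    enum Version { DB, EB }

    enum ColumnType { INT, DECIMAL, VARINT }

    /** One check: a column type under an index version, with the expected row-count estimate. */
    record Case(Version version, ColumnType type, double expectedRowCount) {}

    /** Number of tables created so far; each creation stands in for DDL plus a flush. */
    static int tablesCreated = 0;

    static String createTable(ColumnType type) {
        tablesCreated++;
        return "tbl_" + type.name().toLowerCase(Locale.ROOT);
    }

    /** Creates every needed table exactly once, before any check runs (assumes one table per type). */
    static Map<ColumnType, String> createAllTables(List<Case> cases) {
        Map<ColumnType, String> tables = new LinkedHashMap<>();
        for (Case c : cases) {
            tables.computeIfAbsent(c.type(), UpfrontTables::createTable);
        }
        return tables;
    }

    /** Runs all cases in a single method against the pre-created tables. */
    static List<String> runAll(List<Case> cases) {
        Map<ColumnType, String> tables = createAllTables(cases);
        List<String> results = new ArrayList<>();
        for (Case c : cases) {
            // A real check would query the estimated row count and compare it with c.expectedRowCount().
            results.add(c.version() + " " + tables.get(c.type()));
        }
        return results;
    }
}
```

For example, five cases spread over INT, DECIMAL and VARINT create only three tables, and all five checks run inside one method.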
Force-pushed from a0fe1cb to 0b231f9.
Quality Gate passed.
✔️ Build ds-cassandra-pr-gate/PR-1502 approved by Butler.
test.doTest(Version.DB, INT, 97.0);
test.doTest(Version.EB, INT, 97.0);
// Truncated numeric types planned differently
test.doTest(Version.DB, DECIMAL, 97.0);
test.doTest(Version.EB, DECIMAL, 97.0);
test.doTest(Version.EB, VARINT, 97.0);
Suggestion: SAITester supports versioning. Why not use that feature instead of manually passing the version?
I don't understand how it could be used. As I understand it, it would require rearranging the test cases per SSTable version, which would make the tests less useful, i.e., it would be impossible to see the count differences per restriction. Manually passing the version also makes it possible to see how different versions affect the count.
What am I missing in your proposal?
You'd still see different counts, because there would have to be a check like if (version.onOrAfter(Version.EB)) in every place where versions differ. The upside is that it would automatically test all other versions, and you'd get tests for new versions for free if they don't change anything. Just add a version to the list of versions and voilà, the test runs on the new format.
But it's up to you. I'm not insisting; that's why it was a suggestion.
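A rough sketch of the style suggested here, under stated assumptions: `VersionLoop`, its `Version` enum (including the made-up `FUTURE` constant standing in for a format that does not exist yet), `expected`, and `plan` are all hypothetical and only mimic the `onOrAfter` check mentioned above; they are not the SAITester API. Every check runs once per entry in a version list, and the `onOrAfter` branch is the single place where the expected value diverges.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a "loop over all versions" test; not the SAITester API.
final class VersionLoop {
    /** Stand-in for the index on-disk version; declaration order defines which version is newer. */
    enum Version {
        DB,
        EB,
        FUTURE; // hypothetical format that does not exist yet

        boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    /** All versions under test; adding a constant to the enum adds it here automatically. */
    static final List<Version> VERSIONS = List.of(Version.values());

    /** The one place where versions diverge: pick the expectation by version range. */
    static double expected(Version version, double beforeEb, double onOrAfterEb) {
        return version.onOrAfter(Version.EB) ? onOrAfterEb : beforeEb;
    }

    /** Runs the check once per version and records what each run would assert. */
    static List<String> plan(double beforeEb, double onOrAfterEb) {
        List<String> plan = new ArrayList<>();
        for (Version v : VERSIONS) {
            // A real test would switch the index version, query the estimate, and assert on it.
            plan.add(v + "=" + expected(v, beforeEb, onOrAfterEb));
        }
        return plan;
    }
}
```

The trade-off raised in the reply still applies: a new version is covered automatically, but it silently inherits the newest expectation unless someone adds another branch.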
As I mentioned in my previous comment, your suggestion would hide an important distinction: different versions calculate row counts differently. One difference comes from maintaining cached histograms for the latest version. Different index data formats can also be a reason, but this wasn't observed and wasn't exhaustively tested.
@pkolaczk What functional requirement for the test led you to make this suggestion? Is it that not all versions are tested and, more specifically, that a newly introduced version would not be covered by the test? I.e., that the test is difficult to maintain and could become obsolete?
I could provide row counts per version group and leave the latest group unbounded, i.e., unknown versions would be assumed to implement the histogram. If that sounds good, I'd address it in a separate PR and merge this PR with the current, limited approach. What do you think?
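The version-group proposal maps naturally onto a sorted map with a floor lookup. In this sketch, all names are hypothetical (`VersionGroups`, `groups`, `expectedFor`, and the `OLDER`/`FUTURE` enum constants are illustrative, not real SAI versions). Each key is the first version of a group, `TreeMap.floorEntry` finds the group a version belongs to, and because the last group has no upper bound, any newer version falls into it and is assumed to use the cached histogram. A version older than the first group has no expectation and fails loudly.

```java
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Hypothetical sketch of expected row counts per version group; names are illustrative only.
final class VersionGroups {
    /** Stand-in versions in age order; OLDER and FUTURE are made up for the example. */
    enum Version { OLDER, DB, EB, FUTURE }

    /**
     * Keys are the first version of each group. A group runs up to the next key; the last group
     * has no upper bound, so versions newer than EB are assumed to behave like EB (cached histogram).
     */
    static NavigableMap<Version, Double> groups(double fromDb, double fromEb) {
        NavigableMap<Version, Double> groups = new TreeMap<>();
        groups.put(Version.DB, fromDb);
        groups.put(Version.EB, fromEb);
        return groups;
    }

    /** Finds the group a version falls into; versions older than the first group are rejected. */
    static double expectedFor(NavigableMap<Version, Double> groups, Version version) {
        Map.Entry<Version, Double> group = groups.floorEntry(version);
        if (group == null) {
            throw new IllegalArgumentException("No expected count for version " + version);
        }
        return group.getValue();
    }
}
```

Keeping the groups explicit preserves the per-version differences in the test, while the unbounded last group lets a new version run without editing the expectations.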
I feel like if there is a commonly used mechanism to do multi-version tests, it should be the default way of testing, rather than handling multiple versions manually in a different way, just for consistency. And yes, being able to quickly add new versions without duplicating most tests is a bonus of using an existing system. But as I said earlier, it is fine not to do this in this PR. I just wanted to point out that this functionality is available in SAITester, and it's really up to you whether you find it useful. If you think it would add unnecessary complexity to this particular test, no problem, let's merge it. I don't want to hold up perfectly fine functionality just to make the tests look nicer.
"I feel like if there is a commonly used mechanism to do multi-version tests, it should be the default way of testing"
Applying this default would defeat the purpose of this test, which is to demonstrate how versions affect row counts.
Reduces the number of created tables by creating all needed tables in advance. As a result, the test can be placed into a single test function. This improves local test execution time from 5.5 seconds to 1.4 seconds, and CI execution time from 13 seconds to 5 seconds. Also removes the disabling of the optimizer, which wasn't necessary.
Checklist before you submit for review
Use NoSpamLogger for log lines that may appear frequently in the logs