Improve performance of SingleRestrictionEstimatedRowCountTest #1502

Merged: k-rus merged 4 commits into main from rf-row-count-test-faster on Jan 24, 2025

@k-rus k-rus commented Jan 14, 2025

Reduces the number of created tables by creating all needed tables in advance. As a result, the test can be placed into a single test function.
This improves local test execution time from 5.5 seconds down to 1.4 seconds. In CI, the time drops from 13 to 5 seconds.

Also stops disabling the optimizer, which wasn't necessary.
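
The restructuring described above can be illustrated with a minimal sketch. It is not the PR's code: the class, the method names, and the string-based "tables" are hypothetical stand-ins, and the only point is that the expensive creation step runs once per column type while every (version, type) case reuses the result inside a single function.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the real test harness; nothing here is the PR's code.
class PrecreatedTablesSketch
{
    // Counts how many times the expensive "create table" step runs.
    static int tablesCreated = 0;

    // Simulates creating a table for one column type and returns its name.
    static String createTable(String columnType)
    {
        tablesCreated++;
        return "tbl_" + columnType.toLowerCase();
    }

    // Creates one table per column type up front, so all cases can share them.
    static Map<String, String> createAllTables(List<String> columnTypes)
    {
        Map<String, String> tables = new LinkedHashMap<>();
        for (String type : columnTypes)
            tables.put(type, createTable(type));
        return tables;
    }

    // Runs every (version, type) case against the pre-created tables.
    static List<String> runAllCases(List<String> versions, Map<String, String> tables)
    {
        List<String> executed = new ArrayList<>();
        for (String type : tables.keySet())
            for (String version : versions)
                executed.add(version + ":" + tables.get(type));
        return executed;
    }

    public static void main(String[] args)
    {
        Map<String, String> tables = createAllTables(List.of("INT", "DECIMAL", "VARINT"));
        List<String> executed = runAllCases(List.of("DB", "EB"), tables);
        System.out.println(executed.size() + " cases, " + tablesCreated + " tables created");
    }
}
```

In this sketch the number of created tables grows with the number of column types only, not with the number of cases.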

Checklist before you submit for review

  • Make sure there is a PR in the CNDB project updating the Converged Cassandra version
  • Use NoSpamLogger for log lines that may appear frequently in the logs
  • Verify test results on Butler
  • Test coverage for new/modified code is > 80%
  • Proper code formatting
  • Proper title for each commit starting with the project-issue number, like CNDB-1234
  • Each commit has a meaningful description
  • Each commit is not very long and contains related changes
  • Renames, moves and reformatting are in distinct commits

@k-rus k-rus force-pushed the rf-row-count-test-faster branch from cb4f9c5 to a0fe1cb Compare January 14, 2025 20:32
k-rus added 3 commits January 14, 2025 22:04
There is no need to disable the optimizer, since it cannot optimize
away anything. It was originally necessary while the anti-join node was
being introduced.

Fails due to a flush when the next table is created, and on cleanup
after a test run.

Reduces the number of created tables by creating all needed tables in
advance. As a result, the test can be placed into a single test
function.

This improves local test execution time from 5.5 seconds down to 1.4
seconds.
@k-rus k-rus force-pushed the rf-row-count-test-faster branch from a0fe1cb to 0b231f9 Compare January 14, 2025 21:05
@k-rus k-rus requested a review from a team January 15, 2025 08:45
@cassci-bot
✔️ Build ds-cassandra-pr-gate/PR-1502 approved by Butler



Comment on lines 78 to 83
test.doTest(Version.DB, INT, 97.0);
test.doTest(Version.EB, INT, 97.0);
// Truncated numeric types planned differently
test.doTest(Version.DB, DECIMAL, 97.0);
test.doTest(Version.EB, DECIMAL, 97.0);
test.doTest(Version.EB, VARINT, 97.0);

Suggestion: SAITester supports versioning. Why not use that feature instead of manually passing the version?

Author

@k-rus k-rus Jan 21, 2025

> Suggestion: SAITester supports versioning. Why not use that feature instead of manually passing the version?

I don't understand how it can be used. As I understand it, it would require rearranging the test cases per SSTable version, which would make the tests less useful, i.e., it would be impossible to see the count differences per restriction. Also, passing the version manually makes it visible how different versions affect the count.
What am I missing in your proposal?

You'd also see different counts, because there would have to be some check like if (version.onOrAfter(Version.EB)) in any place where versions differ. The upside is that it would automatically test all other versions and you'd get tests for new versions for free, if they don't change anything. Just add a version to a list of versions and voila, the test runs on new format.

But it's up to you. I'm not insisting, that's why it was a suggestion.
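
A rough sketch of the pattern suggested here, under loud assumptions: `FormatVersion` is a hypothetical stand-in rather than the project's `Version` class, `NEXT` is an invented future version, and both expected numbers are made up for illustration. The sketch only shows how a single `onOrAfter` check lets a newly listed version pick up an existing expectation.

```java
import java.util.List;

// Hypothetical sketch; not SAITester and not the project's Version class.
class VersionedExpectationSketch
{
    // Declaration order doubles as release order; NEXT is an invented future version.
    enum FormatVersion
    {
        DB, EB, NEXT;

        // True when this version is the given one or a later one.
        boolean onOrAfter(FormatVersion other)
        {
            return compareTo(other) >= 0;
        }
    }

    // Picks the expected estimate for a version; both numbers are illustrative only.
    static double expectedRowCount(FormatVersion version)
    {
        if (version.onOrAfter(FormatVersion.EB))
            return 20.0; // made-up value for EB and anything newer
        return 25.0;     // made-up value for versions before EB
    }

    public static void main(String[] args)
    {
        // Adding a version to this list is all that is needed to cover it.
        for (FormatVersion version : List.of(FormatVersion.DB, FormatVersion.EB, FormatVersion.NEXT))
            System.out.println(version + " -> " + expectedRowCount(version));
    }
}
```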

Author

As I mentioned in my previous comment, your suggestion would hide an important distinction: different versions calculate row counts differently. One difference comes from the latest version maintaining cached histograms. Differences in index data formats could be another reason, but that was not observed and was not exhaustively tested.

@pkolaczk What functional requirement for the test motivates your suggestion? Is it that not all versions are tested and, more specifically, that a newly introduced version would not be covered by the test? I.e., that the test is difficult to maintain and can become obsolete?

I can consider providing row counts per version group and leaving the latest group unbounded, i.e., unknown versions would be assumed to implement the histogram. If that sounds good, I would address it in a separate PR and merge this PR with the current limited approach. What do you think?
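
One way the "version groups with an unbounded latest group" idea could look is sketched below. Everything in it is an assumption for illustration: the `FormatVersion` enum, the invented `FUTURE` version, and the expected values are not taken from the test. Each group is keyed by its first version, and a lookup falls back to the nearest group at or below the requested version, so unknown newer versions land in the latest group.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of per-group expectations; not the project's code.
class VersionGroupSketch
{
    // Declaration order doubles as release order; FUTURE is an invented version.
    enum FormatVersion { DB, EB, FUTURE }

    // Maps the first version of each group to that group's expected estimate.
    // The numbers are made up for illustration.
    static TreeMap<FormatVersion, Double> exampleGroups()
    {
        TreeMap<FormatVersion, Double> groups = new TreeMap<>();
        groups.put(FormatVersion.DB, 25.0); // group starting at DB
        groups.put(FormatVersion.EB, 20.0); // group starting at EB, open-ended
        return groups;
    }

    // Returns the expectation of the group whose first version is the closest
    // one at or before the requested version.
    static double expectedRowCount(TreeMap<FormatVersion, Double> groups, FormatVersion version)
    {
        Map.Entry<FormatVersion, Double> group = groups.floorEntry(version);
        if (group == null)
            throw new IllegalArgumentException("No group covers version " + version);
        return group.getValue();
    }

    public static void main(String[] args)
    {
        TreeMap<FormatVersion, Double> groups = exampleGroups();
        for (FormatVersion version : FormatVersion.values())
            System.out.println(version + " -> " + expectedRowCount(groups, version));
    }
}
```

Compared with a single version check per case, this keeps the per-version differences visible in one table of expectations while still covering versions added later.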

I feel like if there is a commonly used mechanism to do multi-version tests, it should be the default way of testing, rather than implementing multiple versions manually in a different way. Just for consistency. And yes, being able to quickly add new versions without duplicating most tests is a bonus of using an existing system. But as I said earlier, it is fine not to do this in this PR. I just highlighted that this functionality is available in SAITester, and it's really up to you whether you find it useful. If you think it would introduce unnecessary complexity in this particular test, no problem, let's merge it. I don't want to hold up perfectly fine functionality just to make tests look nicer.

Author

> I feel like if there is a commonly used mechanism to do multi-version tests, it should be the default way of testing

Applying this default violates the purpose of this test: to demonstrate how versions affect row counts.

@k-rus k-rus merged commit bf469c7 into main Jan 24, 2025
465 of 472 checks passed
@k-rus k-rus deleted the rf-row-count-test-faster branch January 24, 2025 13:34
k-rus added a commit that referenced this pull request Jan 27, 2025