fix(tier1): Restore tuned tier1 instances#15208
Conversation
For tests that were tuned to specific disk usage, restore the hardcoded instances so it adhered to
|
Warning Review limit reached
More reviews will be available in 15 minutes and 37 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Plus Run ID: 📒 Files selected for processing (4)
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
✅ Test Summary: PASSED✅ Precommit: PASSED
✅ Tests: PASSED
|
| "cassandra-stress user profile=/tmp/cs_mv_profile.yaml ops'(insert=3,read1=1,read2=1,read3=1)' cl=QUORUM duration=6800m -mode cql3 native -rate threads=10", | ||
| "cassandra-stress read cl=QUORUM duration=6800m -mode cql3 native -rate threads=10 -pop seq=1..1840000000 -log interval=5 -col 'size=FIXED(200) n=FIXED(5)'" ] | ||
|
|
||
| instance_type_db: 'i8g.2xlarge' |
There was a problem hiding this comment.
perhaps the naming of 2TB_high_disk_used_capacity.yaml is not ideal, since it really just defines a 2TB dataset, that can possibly be used for multiple setups.
perhaps it would make more sense to hardcode the instance type in test-cases/longevity/longevity-1TB-5days-authorization-and-tls-ssl.yaml that is the base-configuration of this "extention" file.
and also , it's less meaningful to set instance where no n_db_nodes parameter is found and is taken from another file.
|
|
||
| n_db_nodes: 6 | ||
| n_loaders: 3 | ||
| instance_type_db: 'i8ge.xlarge' |
There was a problem hiding this comment.
Why is it needed?
What's tuned in this PR?
There was a problem hiding this comment.
@cezarmoise As a owner of the testcase, can you elaborate more?
|
|
||
| region_aware_loader: true | ||
| simulated_racks: 0 | ||
| instance_type_db: 'i7i.xlarge' |
There was a problem hiding this comment.
I don't think it should be set, it doesn't really significant if the size is bigger.
There was a problem hiding this comment.
@cezarmoise As a owner of the testcase, do you agree?
There was a problem hiding this comment.
Where is this configuration being used?
What longevities? which tests?
There was a problem hiding this comment.
There was a problem hiding this comment.
Why is it overriding a test that is called 1TB with a 2TB configuration?
Also, I don't understand why we need to pin it to a specific type.
There was a problem hiding this comment.
- it was set to i7i as we include i7i support to test it more regularly
- someone scale up the test with more data to get to more disk utilization (new pipeline)
- we introduced the sizing option to all tier1 cases, base on the instance value it was (including x86)
@pehala suggest pinning it cause he's scared the sizing would end up selecting other types or sized that would break the assumption about disk usage
one can argue if it's important or not, but the main goal of the sizing is preserved, i.e. other backend would be picking good enough instances, even if disk usage pattern would be a different.
so it's o.k. to use sizing and fixed in tandam (it's per it's design), the only risk it might drift when doing changes to either one, and if it's critical enough to keep the disk usage ratio ontop of the flexibility of change defaults of the sizing feature.
There was a problem hiding this comment.
- I don't think it's critical to keep the disk ratio as long as it's not too small.
- I rather not mixing up the two approaches and stick only with the new approach, avoiding as much as possible from fixing a test to instance type.
There was a problem hiding this comment.
I don't think it's critical to keep the disk ratio as long as it's not too small.
I do, cluster has to work harder if it has more data and thus the testcase is better, besides we want to decrease TCO and thus we size customers clusters to have much more data -> so we should test more data as well.
I rather not mixing up the two approaches and stick only with the new approach, avoiding as much as possible from fixing a test to instance type.
There are fully compatible and use advantages of both and this is still only short term measure, we want to parametrize CS commands to fill it based on available disk size in the long term.
TLDR: There is no harm done with this PR, Tier 1 will be still working on ALL clouds
There was a problem hiding this comment.
There should be (and I believe there are) scenarios that are size specific.
We also aim for instance type agnostic, so we can switch instance types as we want and/or based on availability.
There was a problem hiding this comment.
There should be (and I believe there are) scenarios that are size specific.
We also aim for instance type agnostic, so we can switch instance types as we want and/or based on availability.
Yes, but currently instance type agnostic is incompatible with size specific, which why this patch exists in the first place. I want to keep Tier 1 tests as size specific and add a feature which will make them type agnostic later. It is still agnostic on other clouds and it can run there, so again no harm done
For tests that were tuned to specific disk usage, restore the hardcoded instances so the tuning is adhered to now and in the future. I do not want these tests changing without us knowing. Leave the sizing info so it can be still used cross cloud, just the tuning probably wont be exact, but that is expected.
To make these cases use dynamic sizing, we need to parametrize command based on disk size, which is possible but will require additional effort (plan in progress)
Testing
PR pre-checks (self review)
backportlabels