fix(tier1): Restore tuned tier1 instances by pehala · Pull Request #15208 · scylladb/scylla-cluster-tests

pehala · 2026-06-25T09:21:22Z

For tests that were tuned to specific disk usage, restore the hardcoded instances so the tuning is adhered to now and in the future. I do not want these tests changing without us knowing. Leave the sizing info so it can be still used cross cloud, just the tuning probably wont be exact, but that is expected.

To make these cases use dynamic sizing, we need to parametrize command based on disk size, which is possible but will require additional effort (plan in progress)

Testing

none

PR pre-checks (self review)

I added the relevant backport labels
I didn't leave commented-out/debugging code

For tests that were tuned to specific disk usage, restore the hardcoded instances so it adhered to

coderabbitai · 2026-06-25T09:21:33Z

Warning

Review limit reached

@pehala, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 15 minutes and 37 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro Plus

Run ID: 298bb544-c637-431e-ba34-6a36551f8ac2

📥 Commits

Reviewing files that changed from the base of the PR and between dc81ff6 and fda7a34.

📒 Files selected for processing (4)

configurations/2TB_high_disk_used_capacity.yaml
test-cases/features/elasticity/elasticity-90-percent-with-nemesis.yaml
test-cases/longevity/longevity-multidc-parallel-topology-schema-changes-12h.yaml
test-cases/longevity/longevity-twcs-48h.yaml

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands.}

scylladb-promoter · 2026-06-25T09:30:04Z

✅ Test Summary: PASSED

✅ Precommit: PASSED

Total	Passed	Failed	Skipped
26	15	0	11

✅ Tests: PASSED

Total	Passed	Failed	Errors	Skipped
3667	3636	0	0	31

Full build log

yarongilor · 2026-06-25T11:25:06Z

             "cassandra-stress user profile=/tmp/cs_mv_profile.yaml ops'(insert=3,read1=1,read2=1,read3=1)' cl=QUORUM duration=6800m -mode cql3 native -rate threads=10",
             "cassandra-stress read cl=QUORUM duration=6800m -mode cql3 native  -rate threads=10 -pop seq=1..1840000000  -log interval=5 -col 'size=FIXED(200) n=FIXED(5)'" ]

+instance_type_db: 'i8g.2xlarge'


perhaps the naming of 2TB_high_disk_used_capacity.yaml is not ideal, since it really just defines a 2TB dataset, that can possibly be used for multiple setups.

perhaps it would make more sense to hardcode the instance type in test-cases/longevity/longevity-1TB-5days-authorization-and-tls-ssl.yaml that is the base-configuration of this "extention" file.

and also , it's less meaningful to set instance where no n_db_nodes parameter is found and is taken from another file.

Then separate this.

roydahan · 2026-06-25T13:51:10Z


 n_db_nodes: 6
 n_loaders: 3
+instance_type_db: 'i8ge.xlarge'


Why is it needed?
What's tuned in this PR?

@cezarmoise As a owner of the testcase, can you elaborate more?

roydahan · 2026-06-25T13:52:03Z


 region_aware_loader: true
 simulated_racks: 0
+instance_type_db: 'i7i.xlarge'


I don't think it should be set, it doesn't really significant if the size is bigger.

@cezarmoise As a owner of the testcase, do you agree?

roydahan · 2026-06-25T13:52:52Z

Where is this configuration being used?
What longevities? which tests?

https://github.com/scylladb/scylla-cluster-tests/blob/master/jenkins-pipelines/oss/tier1/longevity-2tb-5days.jenkinsfile

Why is it overriding a test that is called 1TB with a 2TB configuration?
Also, I don't understand why we need to pin it to a specific type.

it was set to i7i as we include i7i support to test it more regularly

someone scale up the test with more data to get to more disk utilization (new pipeline)

we introduced the sizing option to all tier1 cases, base on the instance value it was (including x86)

@pehala suggest pinning it cause he's scared the sizing would end up selecting other types or sized that would break the assumption about disk usage

one can argue if it's important or not, but the main goal of the sizing is preserved, i.e. other backend would be picking good enough instances, even if disk usage pattern would be a different.

so it's o.k. to use sizing and fixed in tandam (it's per it's design), the only risk it might drift when doing changes to either one, and if it's critical enough to keep the disk usage ratio ontop of the flexibility of change defaults of the sizing feature.

I don't think it's critical to keep the disk ratio as long as it's not too small.

I rather not mixing up the two approaches and stick only with the new approach, avoiding as much as possible from fixing a test to instance type.

I don't think it's critical to keep the disk ratio as long as it's not too small.

I do, cluster has to work harder if it has more data and thus the testcase is better, besides we want to decrease TCO and thus we size customers clusters to have much more data -> so we should test more data as well.

I rather not mixing up the two approaches and stick only with the new approach, avoiding as much as possible from fixing a test to instance type.

There are fully compatible and use advantages of both and this is still only short term measure, we want to parametrize CS commands to fill it based on available disk size in the long term.

TLDR: There is no harm done with this PR, Tier 1 will be still working on ALL clouds

There should be (and I believe there are) scenarios that are size specific.

We also aim for instance type agnostic, so we can switch instance types as we want and/or based on availability.

There should be (and I believe there are) scenarios that are size specific.

We also aim for instance type agnostic, so we can switch instance types as we want and/or based on availability.

Yes, but currently instance type agnostic is incompatible with size specific, which why this patch exists in the first place. I want to keep Tier 1 tests as size specific and add a feature which will make them type agnostic later. It is still agnostic on other clouds and it can run there, so again no harm done

fix(tier1): Restore tuned tier1 instances

fda7a34

For tests that were tuned to specific disk usage, restore the hardcoded instances so it adhered to

pehala requested review from a team, cezarmoise, jsmolar and yarongilor June 25, 2026 09:21

pehala added backport/2026.1 backport/2026.2 labels Jun 25, 2026

github-actions Bot assigned pehala Jun 25, 2026

cezarmoise approved these changes Jun 25, 2026

View reviewed changes

yarongilor reviewed Jun 25, 2026

View reviewed changes

roydahan reviewed Jun 25, 2026

View reviewed changes

roydahan requested changes Jun 25, 2026

View reviewed changes

Uh oh!

Conversation

pehala commented Jun 25, 2026

Testing

PR pre-checks (self review)

Uh oh!

coderabbitai Bot commented Jun 25, 2026

Review limit reached

Uh oh!

scylladb-promoter commented Jun 25, 2026

✅ Test Summary: PASSED

✅ Precommit: PASSED

✅ Tests: PASSED

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants