Closed
11 changes: 11 additions & 0 deletions src/bin/cql-stress-scylla-bench/args.rs
@@ -43,6 +43,7 @@ pub(crate) struct ScyllaBenchArgs {
pub mode: Mode,
pub latency_type: LatencyType,
pub max_consecutive_errors_per_op: u64,
pub max_errors_in_total: u64,
pub concurrency: u64,
pub maximum_rate: u64,

@@ -178,6 +179,12 @@ where
After exceeding it, the workflow will terminate with an error. \
Set to 0 if you want to disable this limit",
);
let max_errors = flag.u64_var(
"error-limit",
0,
"the number of total errors after which the workflow should stop and fail; \
set it to 0 (the default) to disable this limit",
);
let concurrency = flag.u64_var("concurrency", 16, "number of used tasks");
let maximum_rate = flag.u64_var(
"max-rate",
@@ -332,6 +339,9 @@ where
// therefore just subtract with wraparound and treat u64::MAX as infinity
let max_consecutive_errors_per_op = max_errors_at_row.get().wrapping_sub(1);
Contributor:
I agree with the comments above. From what I understand, error-at-row-limit defines the number of errors that can occur before failing; in other words, it defines the number of retries. In the worst case we should try to perform the operation error-at-row-limit + 1 times (the first try plus error-at-row-limit retries).

I think we should either change it to Option<NonZeroU64> (set to None when error-at-row-limit is 0) and remove the wrapping_sub, or change the >= to > in WorkerSession::end_operation here:

Err(err) if self.consecutive_errors >= self.context.max_consecutive_errors_per_op => {
    Err(err)
}
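The off-by-one described above can be reduced to a small sketch (the `fails` helper and `threshold` name are illustrative, not the crate's API): with error-at-row-limit = 1, `wrapping_sub(1)` stores 0, so the `>=` comparison already fires on the very first error, while `>` would allow the one retry the flag promises.

```rust
// `limit` mirrors the user-facing error-at-row-limit flag; the code
// stores limit.wrapping_sub(1), so limit = 0 wraps to u64::MAX ("no limit").
fn fails(consecutive_errors: u64, limit: u64, strict: bool) -> bool {
    let threshold = limit.wrapping_sub(1);
    if strict {
        consecutive_errors > threshold // proposed: `>`
    } else {
        consecutive_errors >= threshold // current: `>=`
    }
}

fn main() {
    // error-at-row-limit = 1, first error (0 previous consecutive errors):
    assert!(fails(0, 1, false)); // `>=` aborts immediately: zero retries
    assert!(!fails(0, 1, true)); // `>` tolerates the first error
    // limit = 0 stays effectively unlimited under either comparison.
    assert!(!fails(1_000_000, 0, false));
}
```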

Contributor:
It would also be nice to add tests for edge cases such as error-at-row-limit = 1. In that case, the following scenario should be OK:

1st try: ERROR
2nd try: SUCCESS

but this one should fail:

1st try: ERROR
2nd try: ERROR
... there is no 3rd try, the benchmark terminates

As of now, I believe the first scenario wouldn't pass, since we use wrapping_sub(1) together with a >= comparison.
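The two scenarios can be captured in a small model of the per-operation retry loop (the `survives` helper and `Attempt` enum are illustrative, not the crate's API). Under the current `wrapping_sub(1)` + `>=` scheme it confirms that even the ERROR-then-SUCCESS case terminates the run:

```rust
#[derive(Clone, Copy)]
enum Attempt {
    Success,
    Error,
}

// Returns true when the run survives the attempt sequence for a given
// error-at-row-limit, mimicking the current wrapping_sub(1) + `>=` check.
fn survives(limit: u64, attempts: &[Attempt]) -> bool {
    let threshold = limit.wrapping_sub(1);
    let mut consecutive = 0u64;
    for attempt in attempts {
        match attempt {
            Attempt::Success => consecutive = 0,
            Attempt::Error => {
                if consecutive >= threshold {
                    return false; // benchmark terminates
                }
                consecutive += 1;
            }
        }
    }
    true
}

fn main() {
    use Attempt::*;
    // error-at-row-limit = 1: ERROR then SUCCESS should pass, but doesn't:
    assert!(!survives(1, &[Error, Success]));
    // ERROR then ERROR should fail, and does:
    assert!(!survives(1, &[Error, Error]));
    // With the limit disabled (0), any sequence survives:
    assert!(survives(0, &[Error, Error, Error]));
}
```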


// Similar to above
let max_errors_in_total = max_errors.get().wrapping_sub(1);
Contributor:
Just to make sure I understand correctly: this option (unlike error-at-row-limit) defines the number of errors after which the whole run fails. In other words, when error-limit = 1, we fail the run immediately after the first error occurs. Am I right?

If so, then this wrapping_sub(1) looks semantically correct to me (looking at the WorkerContext::decrement_global_retry_counter).
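A sketch of why `wrapping_sub(1)` can be semantically correct for the global limit, assuming the stored value is interpreted as a budget of tolerated errors before the run aborts (the helper name is illustrative; the real logic lives in `WorkerContext::decrement_global_retry_counter`, which is not shown in this diff):

```rust
// Hypothetical global error budget: error-limit = n should abort the run
// on the n-th error, so storing n - 1 as the number of errors still
// tolerated is exactly right (0 tolerated when error-limit = 1).
fn run_aborts_on_error(limit: u64, errors_so_far: u64) -> bool {
    // limit = 0 wraps to u64::MAX, which the code treats as "no limit".
    let tolerated = limit.wrapping_sub(1);
    limit != 0 && errors_so_far >= tolerated
}

fn main() {
    // error-limit = 1: the first error (0 prior errors) aborts immediately.
    assert!(run_aborts_on_error(1, 0));
    // error-limit = 3: two errors are tolerated, the third aborts.
    assert!(!run_aborts_on_error(3, 1));
    assert!(run_aborts_on_error(3, 2));
    // error-limit = 0: disabled; even huge error counts never abort.
    assert!(!run_aborts_on_error(0, 1_000_000));
}
```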


let hdr_latency_resolution = match hdr_latency_units.get().as_str() {
"ns" => 1,
"us" => 1000,
@@ -377,6 +387,7 @@ where
concurrency,
latency_type,
max_consecutive_errors_per_op,
max_errors_in_total,
maximum_rate,
test_duration: test_duration.get(),
partition_count,
2 changes: 1 addition & 1 deletion src/bin/cql-stress-scylla-bench/main.rs
@@ -146,7 +146,7 @@ async fn prepare(args: Arc<ScyllaBenchArgs>, stats: Arc<ShardedStats>) -> Result
rate_limit_per_second,
operation_factory,
max_consecutive_errors_per_op: args.max_consecutive_errors_per_op,
-        max_errors_in_total: u64::MAX,
+        max_errors_in_total: args.max_errors_in_total,
})
}
