Add posibility to validate number of returned rows for read queries #60


Merged: 1 commit into main on May 26, 2025

Conversation

@vponomaryov (Collaborator) commented Apr 9, 2025

With this change it becomes possible to validate the number of rows returned by common select and select count(...) queries.

The existing execute and execute_prepared context methods stay as-is, and the following new methods are added:

  • execute_with_validation
  • execute_prepared_with_validation

These two new methods differ from the existing ones by taking an additional required parameter of vector type.
That vector parameter accepts the following element combinations:

  • [Integer] -> Exact number of expected rows.
  • [Integer, Integer] -> Range of expected rows; both values are inclusive.
  • [Integer, String] -> Exact number of expected rows and a custom error message.
  • [Integer, Integer, String] -> Range of expected rows and a custom error message.
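As an illustration of these combinations, the expected-row check can be modeled in Python as follows. This is a hypothetical sketch of the documented semantics, not latte's actual (Rust) code:

```python
def validate_row_count(rows_returned, validation):
    """Check a returned-row count against a latte-style validation vector.

    Illustrative sketch only: supports [min], [min, max], [min, msg]
    and [min, max, msg], with both bounds inclusive.
    """
    # Split the vector into integer bounds and an optional trailing message.
    bounds = [v for v in validation if isinstance(v, int)]
    strings = [v for v in validation if isinstance(v, str)]
    if len(bounds) == 1:        # [Integer] or [Integer, String]
        min_rows = max_rows = bounds[0]
    elif len(bounds) == 2:      # [Integer, Integer] or [Integer, Integer, String]
        min_rows, max_rows = bounds
    else:
        raise ValueError("expected 1 or 2 integer bounds")
    message = strings[0] if strings else None
    if min_rows > max_rows:
        raise ValueError("minimum expected rows cannot exceed maximum")
    if not (min_rows <= rows_returned <= max_rows):  # both bounds inclusive
        raise AssertionError(
            message or f"expected {min_rows}..{max_rows} rows, got {rows_returned}")
    return True
```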

Example:

  pub async fn some_select_rune_function(db, i) {
    ...
    let elapsed = db.elapsed_secs();
    let rows_min = if elapsed > 100.0 { 0 } else { 1 };
    let rows_max = if elapsed < 150.0 { 1 } else { 0 };
    let custom_err = "rows must have been deleted by TTL after 100s-200s";
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      [rows_min, rows_max, custom_err],
    ).await?
  }

The example above shows how we can make sure that some of our rows get deleted by TTL.
The 50-second [0, 1] window shows how we can mitigate possible time-measurement fluctuations.
Another possible approach is to depend on retries.
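The time-window logic in the example generalizes to a small helper. This is an illustrative Python model; the parameter names are hypothetical, and the 100-second TTL and 50-second grace window are taken from the example above:

```python
def expected_rows_range(elapsed_secs, ttl_secs=100.0, grace_secs=50.0):
    """Expected [min, max] row count for a row written at t=0 with a TTL.

    Before the TTL expires the row must still exist; within the grace
    window after the TTL it may or may not have been purged yet; after
    the grace window it must be gone.
    """
    rows_min = 0 if elapsed_secs > ttl_secs else 1
    rows_max = 1 if elapsed_secs < ttl_secs + grace_secs else 0
    return [rows_min, rows_max]
```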

One more new context method added in this scope is get_partition_info.
It returns an object with two attributes: idx (index) and rows_num.

Example:

  pub async fn prepare(db) {
    db.init_partition_row_distribution_preset(
      "main", ROW_COUNT, ROWS_PER_PARTITION, PARTITION_SIZES).await?;
    ...
  }

  pub async fn some_select_rune_function(db, i) {
    let idx = i % ROW_COUNT + OFFSET;
    let partition = db.get_partition_info("main", idx).await;
    partition.idx += OFFSET;
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      [partition.rows_num], // precise matching to calculated partition rows number
    ).await?
  }

Also, an example rune script is added at workloads/validation.rn
so the new feature can be tried out with minimal effort.

The latte run command was extended with a new --validation-strategy option.
Examples:

  - latte run ... --validation-strategy=retry      // default: retry validation errors
  - latte run ... --validation-strategy=fail-fast  // stop the stress run on the very first validation error
  - latte run ... --validation-strategy=ignore     // print the error and continue
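The behavior of the three strategies can be sketched in Python. This is an illustrative model only; latte itself is written in Rust, and for the retry strategy it reuses its existing retry configuration (the retry count below is a hypothetical stand-in):

```python
def run_with_strategy(execute, strategy="retry", max_retries=3):
    """Illustrative dispatcher for validation-error strategies.

    `execute` is any callable that raises ValueError on a validation
    failure. Names and the retry limit here are hypothetical.
    """
    attempts = 0
    while True:
        try:
            return execute()
        except ValueError as err:
            if strategy == "ignore":
                print(f"validation error ignored: {err}")  # print and continue
                return None
            if strategy == "fail-fast":
                raise  # stop on the very first validation error
            # strategy == "retry": retry up to the configured limit
            attempts += 1
            if attempts > max_retries:
                raise
```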

@vponomaryov (Collaborator, Author) commented Apr 9, 2025

Build latte image locally:

make docker-build

Or use existing image: https://hub.docker.com/r/vponomarovatscylladb/hydra-loaders/tags?name=0.28.5-scylladb

The rune script for trying it out is part of the PR.

@vponomaryov (Collaborator, Author) commented:

@fee-mendes , @tarzanek

In case you have some time, it would be great to get feedback on this feature from real latte users.

@vponomaryov vponomaryov requested a review from Copilot April 23, 2025 12:00
@Copilot (Copilot AI) left a comment:
Pull Request Overview

This PR adds the ability to validate the number of rows returned for read queries by introducing new context methods (execute_with_validation and execute_prepared_with_validation) and a new get_partition function, as well as updating configuration to support a validation strategy.

  • Registers new functions in the scripting module to support row count validation.
  • Updates the query execution logic to handle validation errors based on configurable strategies (retry, fail-fast, ignore).
  • Updates partition handling and error reporting to support the new validation approach.

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

Summary per file:

  • src/scripting/mod.rs: Registers new validation and partition functions in the module.
  • src/scripting/functions.rs: Implements new query execution functions with row-count validation.
  • src/scripting/context.rs: Integrates the validation strategy into query execution and partition logic.
  • src/scripting/connect.rs: Passes the validation_strategy config to Context.
  • src/scripting/cass_error.rs: Adds error constructors and formats for validation errors.
  • src/config.rs: Introduces a new configuration option for the validation strategy.
  • Cargo.toml: Bumps the version to reflect the new changes.

Files not reviewed (1):
  • workloads/validation.rn: Language not supported
Comments suppressed due to low confidence (1)

src/scripting/functions.rs:586

  • The error message when expected_rows_num_min > expected_rows_num_max is misleading; update it to state that the minimum expected rows number cannot be greater than the maximum.
if expected_rows_num_min > expected_rows_num_max {

@vponomaryov vponomaryov requested a review from fruch April 23, 2025 12:03
@vponomaryov (Collaborator, Author) commented:

@fruch
Can you please review it from the user point of view?
The feature works, but I need to understand whether anyone will use it or not...

@fruch (Collaborator) commented Apr 23, 2025

> @fruch Can you please review it from the user point of view? The feature works, but I need to understand whether anyone will use it or not...

the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.

i.e. there are no actual docs explaining it

maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users

also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

@fruch (Collaborator) commented Apr 23, 2025

also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

@fruch (Collaborator) commented Apr 23, 2025

we should document in the README the two new functions exposed to the rune script

@vponomaryov (Collaborator, Author) commented:

> the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.

What exactly is not clear? It is a typical range.

> i.e. there are no actual docs explaining it

It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

> maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users

Why a mapping?

> also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

Yes, (1, 1, "") means we expect exactly 1 row to be returned by the query.

> also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

Yes, the main retry configuration gets applied here as-is.

> we should document in the README the two new functions exposed to the rune script

So, the direction is OK and only the docs are a TODO?

@fruch (Collaborator) commented Apr 24, 2025

> > the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.
>
> What exactly is not clear? It is a typical range.

I had to read through all the code to find the definition; otherwise I had to guess the meaning of each value in the tuple.

> > i.e. there are no actual docs explaining it
>
> It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

> > maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users
>
> Why a mapping?

(min=1, max=1, msg="response should have exactly one row")

is clearer and more readable.

> > also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?
>
> Yes, (1, 1, "") means we expect exactly 1 row to be returned by the query.
>
> > also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?
>
> Yes, the main retry configuration gets applied here as-is.
>
> > we should document in the README the two new functions exposed to the rune script
>
> So, the direction is OK and only the docs are a TODO?

the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

nitpick: if we could pass in just a number that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.
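For illustration, the suggested normalization of a bare number into the full triple could look like this hypothetical helper (Python sketch, not part of latte):

```python
def normalize_expectation(spec):
    """Expand a bare count into a (min, max, msg) triple.

    Hypothetical helper: a bare integer x becomes (x, x, ""), and
    shorter tuples are padded so callers can omit trailing fields.
    """
    if isinstance(spec, int):
        return (spec, spec, "")       # x -> (x, x, "")
    spec = tuple(spec)
    if len(spec) == 1:
        return (spec[0], spec[0], "")  # (min,) -> exact match, no message
    if len(spec) == 2:
        return (spec[0], spec[1], "")  # (min, max) -> range, no message
    return spec[:3]                    # already a full (min, max, msg)
```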

@fruch fruch requested a review from soyacz April 24, 2025 06:40
@fruch (Collaborator) commented Apr 24, 2025

let's get one more opinion from @soyacz about it.

@soyacz left a comment:
small remarks, but overall LGTM

@vponomaryov (Collaborator, Author) commented:

> > > the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.
> >
> > What exactly is not clear? It is a typical range.
>
> I had to read through all the code to find the definition; otherwise I had to guess the meaning of each value in the tuple.
>
> > > i.e. there are no actual docs explaining it
> >
> > It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.
>
> A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

The commit description covers the whole chain of changes, and there is also a new, dedicated rune script with comments inside it. All of it shows how to use the new feature.

> > > maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users
> >
> > Why a mapping?
>
> (min=1, max=1, msg="response should have exactly one row")
>
> is clearer and more readable.

If I set aside the knowledge I have, I would not see "min=1" and "max=1" as something clear; it is quite ambiguous.

> > So, the direction is OK and only the docs are a TODO?
>
> the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

I didn't understand the "how it would exact for when we'll want to pass into more expectations" statement. Please rephrase.

> nitpick: if we could pass in just a number that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.

The Rust language doesn't support default values for function parameters.

vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 16, 2025
Provision a CI job configuration example using the unmerged latte feature
for checking the number of returned rows in select queries [1].

The scenario does the following:
- Populate data
- Run 2 commands in parallel
  - One deletes 100 rows 40 seconds after start
  - The second reads rows and checks that deleted rows don't come back in
    select queries.
[1] scylladb/latte#60
vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 19, 2025
The flow of the main stress commands can be represented as follows:
+-------+--------------------------------------+------------------------------------+
| Time  | Stress 1 (Write/Delete)              | Stress 2 (Read)                    |
+-------+--------------------------------------+------------------------------------+
| 00:00 | Writing rows 97k+ indexes, rate 50   | Read rows on 0+ indexes, rate 1000 |
| 00:20 | Start deleting rows, indexes: 98k+   | Reading rows on 20k+ indexes       |
| 00:22 | Finish deleting 100 rows (98k–98100) | Reading rows on 22k+ indexes       |
| 00:42 | Finish writing                       | Reading rows on 42k+ indexes       |
| 01:37 |                                      | Reached deleted rows (98k–98100)...|
|       |                                      | ...checking that 98k–98100 absent  |
| 01:40 |                                      | Starts 2nd read loop over 100k     |
| 03:17 |                                      | Reached deleted rows (98k–98100)...|
|       |                                      | ...checking that 98k–98100 absent  |
| 03:20 |                                      | Finished reading                   |
+-------+--------------------------------------+------------------------------------+

[1] scylladb/latte#60
@vponomaryov vponomaryov force-pushed the rows-number-validation branch from 55b1d0d to fb1402a Compare May 23, 2025 16:02
@vponomaryov vponomaryov requested a review from soyacz May 23, 2025 16:08
@vponomaryov (Collaborator, Author) commented May 23, 2025

@fruch, @soyacz

Updated the PR.
List of changes:

  • Improved the interface for the new functions to support multiple combinations of input elements. See PR description.
  • Added info to the README.md file
  • Renamed get_partition to get_partition_info
  • Renamed the partition_size struct field to n_rows_per_partition to be less ambiguous and to follow the existing naming structure.
  • Updated the new rune script to support latest changes

The Docker image with the latest changes:

  • vponomarovatscylladb/hydra-loaders:latte-0.28.5-scylladb-data-validation-v2

Or it can be built manually:

  • make docker-build

@fruch (Collaborator) left a comment:
LGTM

small nitpick on the docs/examples, but it can be amended later as needed

@soyacz left a comment:

LGTM

@vponomaryov vponomaryov merged commit 3118d6e into main May 26, 2025
4 checks passed
vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 27, 2025
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request Jun 5, 2025
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request Jun 12, 2025