Add posibility to validate number of returned rows for read queries #60


Merged: 1 commit into main on May 26, 2025

Conversation

@vponomaryov (Collaborator) commented Apr 9, 2025

With this change it becomes possible to validate the number of rows returned by common select and select count(...) queries.

The existing execute and execute_prepared context methods stay as-is, and the following new methods are added:

  • execute_with_validation
  • execute_prepared_with_validation

These two new methods differ from the existing ones by taking an additional required parameter of vector type.
That vector parameter accepts the following element combinations:

  • [Integer] -> Exact number of expected rows.
  • [Integer, Integer] -> Range of expected rows; both values are inclusive.
  • [Integer, String] -> Exact number of expected rows and a custom error message.
  • [Integer, Integer, String] -> Range of expected rows and a custom error message.
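As an illustration of these combinations, the expected-row check can be modeled in Python as follows. This is a hypothetical sketch of the documented semantics, not latte's actual (Rust) code:

```python
def validate_row_count(rows_returned, validation):
    """Check a returned-row count against a latte-style validation vector.

    Illustrative sketch only: supports [min], [min, max], [min, msg]
    and [min, max, msg], with both bounds inclusive.
    """
    # Split the vector into integer bounds and an optional trailing message.
    bounds = [v for v in validation if isinstance(v, int)]
    strings = [v for v in validation if isinstance(v, str)]
    if len(bounds) == 1:        # [Integer] or [Integer, String]
        min_rows = max_rows = bounds[0]
    elif len(bounds) == 2:      # [Integer, Integer] or [Integer, Integer, String]
        min_rows, max_rows = bounds
    else:
        raise ValueError("expected 1 or 2 integer bounds")
    message = strings[0] if strings else None
    if min_rows > max_rows:
        raise ValueError("minimum expected rows cannot exceed maximum")
    if not (min_rows <= rows_returned <= max_rows):  # both bounds inclusive
        raise AssertionError(
            message or f"expected {min_rows}..{max_rows} rows, got {rows_returned}")
    return True
```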

Example:

  pub async fn some_select_rune_function(db, i) {
    ...
    let elapsed = db.elapsed_secs();
    let rows_min = if elapsed > 100.0 { 0 } else { 1 };
    let rows_max = if elapsed < 150.0 { 1 } else { 0 };
    let custom_err = "rows must have been deleted by TTL after 100s-200s";
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      [rows_min, rows_max, custom_err],
    ).await?
  }

The example above shows how we can make sure that some of our rows get deleted by TTL.
The 50-second [0, 1] window shows how we can mitigate possible time-measurement fluctuations.
Another possible approach is to depend on retries.
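The time-window logic in the example generalizes to a small helper. This is an illustrative Python model; the parameter names are hypothetical, and the 100-second TTL and 50-second grace window are taken from the example above:

```python
def expected_rows_range(elapsed_secs, ttl_secs=100.0, grace_secs=50.0):
    """Expected [min, max] row count for a row written at t=0 with a TTL.

    Before the TTL expires the row must still exist; within the grace
    window after the TTL it may or may not have been purged yet; after
    the grace window it must be gone.
    """
    rows_min = 0 if elapsed_secs > ttl_secs else 1
    rows_max = 1 if elapsed_secs < ttl_secs + grace_secs else 0
    return [rows_min, rows_max]
```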

One more new context method added in this scope is get_partition_info.
It returns an object with two attributes: idx (index) and rows_num.

Example:

  pub async fn prepare(db) {
    db.init_partition_row_distribution_preset(
      "main", ROW_COUNT, ROWS_PER_PARTITION, PARTITION_SIZES).await?;
    ...
  }

  pub async fn some_select_rune_function(db, i) {
    let idx = i % ROW_COUNT + OFFSET;
    let partition = db.get_partition_info("main", idx).await;
    partition.idx += OFFSET;
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      [partition.rows_num], // precise matching to calculated partition rows number
    ).await?
  }

Also, an example rune script is added at workloads/validation.rn
so the new feature can be tried out with minimal effort.

The latte run command was extended with a new --validation-strategy option.
Examples:

  - latte run ... --validation-strategy=retry      // default: retry validation errors
  - latte run ... --validation-strategy=fail-fast  // stop the stress run on the very first validation error
  - latte run ... --validation-strategy=ignore     // print the error and continue
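The behavior of the three strategies can be sketched in Python. This is an illustrative model only; latte itself is written in Rust, and for the retry strategy it reuses its existing retry configuration (the retry count below is a hypothetical stand-in):

```python
def run_with_strategy(execute, strategy="retry", max_retries=3):
    """Illustrative dispatcher for validation-error strategies.

    `execute` is any callable that raises ValueError on a validation
    failure. Names and the retry limit here are hypothetical.
    """
    attempts = 0
    while True:
        try:
            return execute()
        except ValueError as err:
            if strategy == "ignore":
                print(f"validation error ignored: {err}")  # print and continue
                return None
            if strategy == "fail-fast":
                raise  # stop on the very first validation error
            # strategy == "retry": retry up to the configured limit
            attempts += 1
            if attempts > max_retries:
                raise
```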

@vponomaryov (Collaborator, Author) commented Apr 9, 2025

Build latte image locally:

make docker-build

Or use existing image: https://hub.docker.com/r/vponomarovatscylladb/hydra-loaders/tags?name=0.28.5-scylladb

The rune script for trying it out is part of the PR.

@vponomaryov (Collaborator, Author) commented:

@fee-mendes , @tarzanek

In case you have some time, it would be great to get feedback on this feature from real latte users.

@vponomaryov vponomaryov requested a review from Copilot April 23, 2025 12:00
@Copilot (Copilot AI) left a comment:
Pull Request Overview

This PR adds the ability to validate the number of rows returned for read queries by introducing new context methods (execute_with_validation and execute_prepared_with_validation) and a new get_partition function, as well as updating configuration to support a validation strategy.

  • Registers new functions in the scripting module to support row count validation.
  • Updates the query execution logic to handle validation errors based on configurable strategies (retry, fail-fast, ignore).
  • Updates partition handling and error reporting to support the new validation approach.

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

Summary per file:

  • src/scripting/mod.rs: Registers new validation and partition functions in the module.
  • src/scripting/functions.rs: Implements new query execution functions with row-count validation.
  • src/scripting/context.rs: Integrates the validation strategy into query execution and partition logic.
  • src/scripting/connect.rs: Passes the validation_strategy config to Context.
  • src/scripting/cass_error.rs: Adds error constructors and formats for validation errors.
  • src/config.rs: Introduces a new configuration option for the validation strategy.
  • Cargo.toml: Bumps the version to reflect the new changes.

Files not reviewed (1):
  • workloads/validation.rn: Language not supported
Comments suppressed due to low confidence (1)

src/scripting/functions.rs:586

  • The error message when expected_rows_num_min > expected_rows_num_max is misleading; update it to state that the minimum expected rows number cannot be greater than the maximum.
if expected_rows_num_min > expected_rows_num_max {

@vponomaryov vponomaryov requested a review from fruch April 23, 2025 12:03
@vponomaryov (Collaborator, Author) commented:

@fruch
Can you please review it from the user point of view?
The feature works, but I need to understand whether anyone will use it or not...

@fruch (Collaborator) commented Apr 23, 2025

> @fruch Can you please review it from the user point of view? The feature works, but I need to understand whether anyone will use it or not...

the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.

i.e. there are no actual docs explaining it

maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users

also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

@fruch (Collaborator) commented Apr 23, 2025

also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

@fruch (Collaborator) commented Apr 23, 2025

we should document in the README the two new functions exposed to the rune script

@vponomaryov (Collaborator, Author) commented:

> the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.

What exactly is not clear? It is a typical range.

> i.e. there are no actual docs explaining it

It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

> maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users

Why a mapping?

> also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

Yes, (1, 1, "") means we expect exactly 1 row to be returned by the query.

> also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

Yes, the main retry configuration gets applied here as-is.

> we should document in the README the two new functions exposed to the rune script

So, the direction is OK and only the docs are a TODO?

@fruch (Collaborator) commented Apr 24, 2025

> > the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.
>
> What exactly is not clear? It is a typical range.

I had to read through all the code to find the definition; otherwise I had to guess the meaning of each value in the tuple.

> > i.e. there are no actual docs explaining it
>
> It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

> > maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users
>
> Why a mapping?

(min=1, max=1, msg="response should have exactly one row")

is clearer and more readable.

> > also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?
>
> Yes, (1, 1, "") means we expect exactly 1 row to be returned by the query.
>
> > also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?
>
> Yes, the main retry configuration gets applied here as-is.
>
> > we should document in the README the two new functions exposed to the rune script
>
> So, the direction is OK and only the docs are a TODO?

the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

nitpick: if we could pass in just a number that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.
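For illustration, the suggested normalization of a bare number into the full triple could look like this hypothetical helper (Python sketch, not part of latte):

```python
def normalize_expectation(spec):
    """Expand a bare count into a (min, max, msg) triple.

    Hypothetical helper: a bare integer x becomes (x, x, ""), and
    shorter tuples are padded so callers can omit trailing fields.
    """
    if isinstance(spec, int):
        return (spec, spec, "")       # x -> (x, x, "")
    spec = tuple(spec)
    if len(spec) == 1:
        return (spec[0], spec[0], "")  # (min,) -> exact match, no message
    if len(spec) == 2:
        return (spec[0], spec[1], "")  # (min, max) -> range, no message
    return spec[:3]                    # already a full (min, max, msg)
```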

@fruch fruch requested a review from soyacz April 24, 2025 06:40
@fruch (Collaborator) commented Apr 24, 2025

let's get one more opinion from @soyacz about it.

@soyacz left a comment:
small remarks, but overall LGTM

@vponomaryov (Collaborator, Author) commented:

> > > the (min_rows_number, max_rows_number) part is a bit unclear to whoever might want to use it.
> >
> > What exactly is not clear? It is a typical range.
>
> I had to read through all the code to find the definition; otherwise I had to guess the meaning of each value in the tuple.
>
> > > i.e. there are no actual docs explaining it
> >
> > It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.
>
> A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

The commit description covers the whole chain of changes, and there is also a new, dedicated rune script with comments inside it. All of it shows how to use the new feature.

> > > maybe we can use a mapping for this configuration, and not a tuple? I think it's a bit less readable to users
> >
> > Why a mapping?
>
> (min=1, max=1, msg="response should have exactly one row")
>
> is clearer and more readable.

If I set aside the knowledge I have, I would not see "min=1" and "max=1" as something clear; it is quite ambiguous.

> > So, the direction is OK and only the docs are a TODO?
>
> the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

I didn't understand the "how it would exact for when we'll want to pass into more expectations" statement. Please rephrase.

> nitpick: if we could pass in just a number that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.

The Rust language doesn't support default values for function parameters.

vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 16, 2025
Provision a CI job configuration example using the unmerged latte feature
for checking the number of returned rows in select queries [1].

The scenario does the following:
- Populate data
- Run 2 commands in parallel
  - One deletes 100 rows 40 seconds after start
  - The second reads rows and checks that deleted rows don't come back in
    select queries.
[1] scylladb/latte#60
vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 19, 2025
The flow of the main stress commands can be represented as follows:
+-------+--------------------------------------+------------------------------------+
| Time  | Stress 1 (Write/Delete)              | Stress 2 (Read)                    |
+-------+--------------------------------------+------------------------------------+
| 00:00 | Writing rows 97k+ indexes, rate 50   | Read rows on 0+ indexes, rate 1000 |
| 00:20 | Start deleting rows, indexes: 98k+   | Reading rows on 20k+ indexes       |
| 00:22 | Finish deleting 100 rows (98k–98100) | Reading rows on 22k+ indexes       |
| 00:42 | Finish writing                       | Reading rows on 42k+ indexes       |
| 01:37 |                                      | Reached deleted rows (98k–98100)...|
|       |                                      | ...checking that 98k–98100 absent  |
| 01:40 |                                      | Starts 2nd read loop over 100k     |
| 03:17 |                                      | Reached deleted rows (98k–98100)...|
|       |                                      | ...checking that 98k–98100 absent  |
| 03:20 |                                      | Finished reading                   |
+-------+--------------------------------------+------------------------------------+

[1] scylladb/latte#60
@vponomaryov vponomaryov force-pushed the rows-number-validation branch from 55b1d0d to fb1402a Compare May 23, 2025 16:02
@vponomaryov vponomaryov requested a review from soyacz May 23, 2025 16:08
@vponomaryov (Collaborator, Author) commented May 23, 2025

@fruch, @soyacz

Updated the PR.
List of changes:

  • Improved the interface for the new functions to support multiple combinations of input elements. See PR description.
  • Added info to the README.md file
  • Renamed get_partition to get_partition_info
  • Renamed the partition_size struct field to n_rows_per_partition to be less ambiguous and to follow the existing naming structure.
  • Updated the new rune script to support latest changes

The Docker image with the latest changes:

  • vponomarovatscylladb/hydra-loaders:latte-0.28.5-scylladb-data-validation-v2

Or it can be built manually:

  • make docker-build

@fruch (Collaborator) left a comment:
LGTM

small nitpick on the docs/examples, but it can be amended later as needed

@soyacz left a comment:

LGTM

@vponomaryov vponomaryov merged commit 3118d6e into main May 26, 2025
4 checks passed
vponomaryov added a commit to vponomaryov/scylla-cluster-tests that referenced this pull request May 27, 2025
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request Jun 5, 2025
yarongilor pushed a commit to yarongilor/scylla-cluster-tests that referenced this pull request Jun 12, 2025