Add possibility to validate number of returned rows for read queries #60

Open

vponomaryov wants to merge 1 commit into main

Conversation

vponomaryov commented Apr 9, 2025

With this change it becomes possible to validate the number of rows returned by common select and select count(...) queries.

The existing execute and execute_prepared context methods stay as-is, and the following new methods are added:

  • execute_with_validation
  • execute_prepared_with_validation

These two new methods differ from the existing ones by taking an additional required parameter of tuple type.
That tuple consists of 3 elements:

  • The minimum expected number of rows (inclusive).
  • The maximum expected number of rows (inclusive).
  • A custom error message, which may be an empty string if it is not needed.

Example:

  pub async fn some_select_rune_function(db, i) {
    ...
    let elapsed = db.elapsed_secs();
    // Before 100s the row must still exist; after 150s it must be gone;
    // in between (the [0, 1] range) either outcome is accepted.
    let rows_min = if elapsed > 100.0 { 0 } else { 1 };
    let rows_max = if elapsed < 150.0 { 1 } else { 0 };
    let custom_err = "rows must have been deleted by TTL after 100s-200s";
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      (rows_min, rows_max, custom_err),
    ).await?
  }

The example above shows how we can make sure that some of our rows get deleted by TTL.
The 50-second window with the [0, 1] range shows how we can mitigate possible time measurement fluctuations.
Another possible approach is to rely on retries.
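
For completeness, here is a minimal sketch of the non-prepared variant; it assumes execute_with_validation takes the same arguments as execute plus the trailing validation tuple, and the keyspace/table names below are placeholders:

  pub async fn some_plain_select_rune_function(db) {
    // Placeholder CQL; for a single partition key we expect at most one row.
    db.execute_with_validation(
      "SELECT * FROM my_keyspace.my_table WHERE pk = 1",
      (0, 1, "at most one row expected for a single partition key"),
    ).await?
  }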

One more context method added in this scope is get_partition.
It returns an object with 2 attributes: idx (the partition index) and size.

Example:

  pub async fn prepare(db) {
    db.init_partition_row_distribution_preset(
      "main", ROW_COUNT, ROWS_PER_PARTITION, PARTITION_SIZES).await?;
    ...
  }

  pub async fn some_select_rune_function(db, i) {
    let idx = i % ROW_COUNT + OFFSET;
    let partition = db.get_partition("main", idx).await;
    partition.idx += OFFSET;
    db.execute_prepared_with_validation(
      PREPARED_STATEMENT_NAME,
      [pk],
      (partition.size, partition.size, ""), // precise matching to calculated partition size
    ).await?
  }

Also, an example rune script is added at workloads/validation.rn
so the new feature can be tried out with minimal effort.

The latte run command was extended with the new --validation-strategy option.
Examples:

  - latte run ... --validation-strategy=retry      // default: retry on validation errors
  - latte run ... --validation-strategy=fail-fast  // stop the stress run on the very first validation error
  - latte run ... --validation-strategy=ignore     // print the error and go on
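
Putting it together, a hypothetical full invocation against the bundled script could look like this (the node address is a placeholder):

  latte run workloads/validation.rn --validation-strategy=fail-fast 10.0.0.1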


vponomaryov commented Apr 9, 2025

Build the latte image locally:

make docker-build

Or use the existing image: https://hub.docker.com/r/vponomarovatscylladb/hydra-loaders/tags?name=0.28.5-scylladb

The rune script for trying it out is part of the PR: workloads/validation.rn

vponomaryov commented:

@fee-mendes, @tarzanek

In case you have some time, it would be great to get feedback on this feature from real latte users.

vponomaryov requested a review from Copilot on April 23, 2025 12:00

Copilot AI left a comment


Pull Request Overview

This PR adds the ability to validate the number of rows returned for read queries by introducing new context methods (execute_with_validation and execute_prepared_with_validation) and a new get_partition function, as well as updating configuration to support a validation strategy.

  • Registers new functions in the scripting module to support row count validation.
  • Updates the query execution logic to handle validation errors based on configurable strategies (retry, fail-fast, ignore).
  • Updates partition handling and error reporting to support the new validation approach.

Reviewed Changes

Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.

Summary per file:

  File                          Description
  src/scripting/mod.rs          Registers new validation and partition functions in the module.
  src/scripting/functions.rs    Implements new query execution functions with row count validation.
  src/scripting/context.rs      Integrates validation strategy into query execution and partition logic.
  src/scripting/connect.rs      Passes the validation_strategy config to Context.
  src/scripting/cass_error.rs   Adds error constructors and formats for validation errors.
  src/config.rs                 Introduces a new configuration option for validation strategy.
  Cargo.toml                    Bumps version to reflect new changes.
Files not reviewed (1)
  • workloads/validation.rn: Language not supported
Comments suppressed due to low confidence (1)

src/scripting/functions.rs:586

  • The error message when expected_rows_num_min > expected_rows_num_max is misleading; update it to state that the minimum expected rows number cannot be greater than the maximum.
if expected_rows_num_min > expected_rows_num_max {

vponomaryov requested a review from fruch on April 23, 2025 12:03
vponomaryov commented:

@fruch
Can you please review it from the user point of view?
The feature works, but we need to understand whether anyone will use it or not...


fruch commented Apr 23, 2025

@fruch Can you please review it from the user point of view? The feature works, but we need to understand whether anyone will use it or not...

The (min_rows_number, max_rows_number) part is a bit unclear for whoever might want to use it.

i.e. there are no actual docs explaining it

Maybe we can use a mapping for this configuration, and not a tuple? I think a tuple is a bit less readable to users.

Also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?


fruch commented Apr 23, 2025

Also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?


fruch commented Apr 23, 2025

we should document in the README the two new functions exposed to the rune script

vponomaryov commented:

The (min_rows_number, max_rows_number) part is a bit unclear for whoever might want to use it.

What exactly is not clear? Typical range.

i.e. there are no actual docs explaining it

It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

Maybe we can use a mapping for this configuration, and not a tuple? I think a tuple is a bit less readable to users.

Why mapping?

Also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

Yes, the (1, 1, "") means we expect exactly 1 row to be returned as part of a query.

Also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

Yes, the main retry configuration gets applied here as-is.

we should document in the README the two new functions exposed to the rune script

So, the direction is ok and only the docs are a TODO?


fruch commented Apr 24, 2025

The (min_rows_number, max_rows_number) part is a bit unclear for whoever might want to use it.

What exactly is not clear? Typical range.

I had to read through all the code to find the definition; otherwise I would need to guess the meaning of each value in the tuple.

i.e. there are no actual docs explaining it

It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

Maybe we can use a mapping for this configuration, and not a tuple? I think a tuple is a bit less readable to users.

Why mapping?

(min=1, max=1, msg="response should have exactly one")

is clearer and more readable

Also, I would assume that if I put (1, 1, "") and the row is deleted, it would fail validation?

Yes, the (1, 1, "") means we expect exactly 1 row to be returned as part of a query.

Also, for --validation-strategy=retry, how many times would it retry? Using the default retry options?

Yes, the main retry configuration gets applied here as-is.

we should document in the README the two new functions exposed to the rune script

So, the direction is ok and only the docs are a TODO?

the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

nitpick: if we could pass in just a number, that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.
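
A minimal Rune sketch of the kind of helper suggested here (the exactly name and its shape are hypothetical, not part of the PR):

  // Hypothetical helper: expand a single expected row count into the
  // (min, max, message) tuple the validation methods accept.
  fn exactly(n) {
    (n, n, "")
  }

  // Usage: db.execute_prepared_with_validation(STMT, [pk], exactly(1)).await?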

fruch requested a review from soyacz on April 24, 2025 06:40

fruch commented Apr 24, 2025

let's get one more opinion from @soyacz about it.

soyacz left a comment


small remarks, but overall LGTM

@@ -93,8 +100,8 @@ impl RowDistributionPreset {
        }
    }

-   pub async fn get_partition_idx(&self, idx: u64) -> u64 {
-       self._get_partition_idx(
+   pub async fn get_partition(&self, idx: u64) -> (u64, u64) {
soyacz:

It would be helpful to add a small docstring describing what is returned in the tuple (previously it could be guessed from the fn name).

vponomaryov (Collaborator, Author) replied:

I thought about renaming it to get_partition_info, because we get only some info about the partition and not the data, which might be assumed from a general get_partition...

Will update
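
For illustration, a hedged sketch of the kind of docstring being asked for (the wording here is an assumption, not taken from the PR):

  /// Returns a `(partition_idx, partition_size)` pair for the given row index:
  /// the index of the partition the row maps to and the number of rows
  /// calculated for that partition by the distribution preset.
  pub async fn get_partition(&self, idx: u64) -> (u64, u64) {
      // ... existing implementation ...
  }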

#[derive(Clone, Debug, PartialEq)]
pub struct PartitionGroup {
    pub n_rows_per_group: u64,
    pub n_partitions: u64,
    pub partition_size: u64,
soyacz:

If partition size means partition rows count, then it would be better to name it partition_rows_count.

vponomaryov (Collaborator, Author) replied:

Will update

vponomaryov commented:

The (min_rows_number, max_rows_number) part is a bit unclear for whoever might want to use it.

What exactly is not clear? Typical range.

I had to read through all the code to find the definition; otherwise I would need to guess the meaning of each value in the tuple.

i.e. there are no actual docs explaining it

It is not a problem to add the docs; my main open item is to find out whether the direction is correct and useful.

A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.

The commit description covers the whole chain of changes, and there is also a new rune script, dedicated to it, with comments inside.
All of it shows how to use the new feature.

Maybe we can use a mapping for this configuration, and not a tuple? I think a tuple is a bit less readable to users.

Why mapping?

(min=1, max=1, msg="response should have exactly one")

is clearer and more readable

If I set aside the knowledge I already have, I would not see "min=1" and "max=1" as something clear; it is very ambiguous.

So, the direction is ok and only the docs are a TODO?

the direction seems o.k., even though I don't see how it would exact for when we'll want to pass into more expectations

I didn't understand the "how it would exact for when we'll want to pass into more expectations" statement.
Please re-phrase.

nitpick: if we could pass in just a number, that would map like x -> (x, x, ""), or a helper macro/function. I prefer named arguments as APIs; tuples are problematic to extend and don't have defaults.

Rust lang doesn't support defaults for function parameters.
