Add possibility to validate number of returned rows for read queries #60
Build latte image locally:
Or use an existing image: https://hub.docker.com/r/vponomarovatscylladb/hydra-loaders/tags?name=0.28.5-scylladb
The Rune script for trying it out is part of the PR.
If you have some time, it would be great to get some feedback on this feature from real latte users.
Pull Request Overview
This PR adds the ability to validate the number of rows returned for read queries by introducing new context methods (execute_with_validation and execute_prepared_with_validation) and a new get_partition function, as well as updating configuration to support a validation strategy.
- Registers new functions in the scripting module to support row count validation.
- Updates the query execution logic to handle validation errors based on configurable strategies (retry, fail-fast, ignore).
- Updates partition handling and error reporting to support the new validation approach.
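The three configurable strategies (retry, fail-fast, ignore) can be modelled as a small enum parsed from the `--validation-strategy` value. This is a hypothetical sketch, not latte's actual code: `ValidationStrategy`, `Action` and `on_validation_error` are assumed names, and only the option values and `retry` being the default come from the PR. What happens after retries are exhausted is not described in the PR; aborting there is an assumption of this sketch.

```rust
use std::str::FromStr;

/// Hypothetical model of the `--validation-strategy` option values.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum ValidationStrategy {
    /// Retry validation errors using the general retry settings (default).
    #[default]
    Retry,
    /// Stop the stress run on the very first validation error.
    FailFast,
    /// Print the error and go on.
    Ignore,
}

impl FromStr for ValidationStrategy {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "retry" => Ok(Self::Retry),
            "fail-fast" => Ok(Self::FailFast),
            "ignore" => Ok(Self::Ignore),
            other => Err(format!("unknown validation strategy: {other}")),
        }
    }
}

/// Hypothetical reaction to a failed row-count check.
#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Retry,
    Abort,
    Continue,
}

/// `attempt` is the 1-based number of the attempt that just failed.
/// Assumption: once `max_retries` is exhausted, `Retry` behaves like `FailFast`.
pub fn on_validation_error(strategy: ValidationStrategy, attempt: u32, max_retries: u32) -> Action {
    match strategy {
        ValidationStrategy::Retry if attempt <= max_retries => Action::Retry,
        ValidationStrategy::Retry | ValidationStrategy::FailFast => Action::Abort,
        ValidationStrategy::Ignore => Action::Continue,
    }
}
```

For example, `"fail-fast".parse::<ValidationStrategy>()` yields `FailFast`, and an unknown value is rejected with an error instead of silently falling back to the default.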
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.
File | Description |
---|---|
src/scripting/mod.rs | Registers new validation and partition functions in the module. |
src/scripting/functions.rs | Implements new query execution functions with row count validation. |
src/scripting/context.rs | Integrates validation strategy into query execution and partition logic. |
src/scripting/connect.rs | Passes the validation_strategy config to Context. |
src/scripting/cass_error.rs | Adds error constructors and formats for validation errors. |
src/config.rs | Introduces a new configuration option for validation strategy. |
Cargo.toml | Bumps version to reflect new changes. |
Files not reviewed (1)
- workloads/validation.rn: Language not supported
Comments suppressed due to low confidence (1)
src/scripting/functions.rs:586
- The error message when expected_rows_num_min > expected_rows_num_max is misleading; update it to state that the minimum expected rows number cannot be greater than the maximum.
```rust
if expected_rows_num_min > expected_rows_num_max {
```
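Following that suggestion, the check could name the actual problem in its message. A minimal sketch, assuming a standalone helper: `check_rows_range` is a hypothetical name, and only the two variable names come from the quoted line.

```rust
/// Validates an inclusive `[min, max]` range of expected rows.
/// Hypothetical helper; the real code in functions.rs may be structured differently.
pub fn check_rows_range(expected_rows_num_min: u64, expected_rows_num_max: u64) -> Result<(), String> {
    if expected_rows_num_min > expected_rows_num_max {
        // State the real constraint instead of a misleading message.
        return Err(format!(
            "Minimum expected rows number ({expected_rows_num_min}) cannot be greater than \
             the maximum ({expected_rows_num_max})"
        ));
    }
    Ok(())
}
```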
@fruch
The part with (min_rows_number, max_rows_number) is a bit unclear for whoever might want to use it, i.e. there are no actual docs explaining it. Maybe we can use a mapping for this configuration instead of a tuple? I think a tuple is a bit less readable for users. Also, I would assume that if I put
also
We should document the two new functions exposed to the Rune script in the README.
What exactly is not clear? Typical range.
It is not a problem to add the docs, my main open-item is to get to know that the direction is correct and useful.
Why mapping?
Yes, the
Yes, the main retry configuration is applied here as-is.
So the direction is OK, and only the docs are a TODO?
I had to read through all the code to find the definition; otherwise I would have had to guess the meaning of each value in the tuple.
A big part of usefulness is how clear it is, and in this case, without docs, it's not completely clear.
is clearer and more readable
The direction seems OK, even though I don't see exactly how it would work when we want to pass in more expectations. Nitpick: if we could pass in just a number, that would map like
Let's get one more opinion from @soyacz about it.
small remarks, but overall LGTM
The commit description covers the whole chain of changes, and there is also a new dedicated Rune script with comments inside it.
If I set aside the knowledge I have, I would not see "min=1" and "max=1" as something clear; it is very ambiguous.
I didn't understand the
Rust doesn't support default values for function parameters.
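Without default parameter values, a Rust API either adds separate functions (the route this PR takes with `execute` vs. `execute_with_validation`) or takes an `Option` argument. The sketch below shows the `Option` variant in plain Rust; `run_query` and its arguments are hypothetical and unrelated to latte's real signatures.

```rust
/// Hypothetical query runner: `expected_rows` emulates an optional,
/// "defaulted" parameter because Rust functions cannot declare defaults.
pub fn run_query(returned_rows: u64, expected_rows: Option<(u64, u64)>) -> Result<u64, String> {
    // `None` plays the role of the default: skip validation entirely.
    let Some((min, max)) = expected_rows else {
        return Ok(returned_rows);
    };
    // Inclusive range check, mirroring the `[min, max]` form.
    if (min..=max).contains(&returned_rows) {
        Ok(returned_rows)
    } else {
        Err(format!("expected {min}..={max} rows, got {returned_rows}"))
    }
}
```

Callers then write `run_query(n, None)` where other languages would simply omit the argument, which is why separate, explicitly named methods can read more naturally from a script.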
Provision CI job configuration example using the unmerged latte feature for checking the number of rows returned by select queries [1]. In this scenario, do the following:
- Populate data
- Run 2 commands in parallel:
  - One deletes 100 rows within 40 seconds after start
  - The second reads rows and checks that deleted rows are not returned by select queries

The flow of the main stress commands can be represented as follows:

| Time | Stress 1 (Write/Delete) | Stress 2 (Read) |
|---|---|---|
| 00:00 | Writing rows on 97k+ indexes, rate 50 | Reading rows on 0+ indexes, rate 1000 |
| 00:20 | Start deleting rows, indexes: 98k+ | Reading rows on 20k+ indexes |
| 00:22 | Finish deleting 100 rows (98k–98100) | Reading rows on 22k+ indexes |
| 00:42 | Finish writing | Reading rows on 42k+ indexes |
| 01:37 | - | Reached deleted rows (98k–98100), checking that 98k–98100 are absent |
| 01:40 | - | Start 2nd read loop over 100k |
| 03:17 | - | Reached deleted rows (98k–98100), checking that 98k–98100 are absent |
| 03:20 | - | Finished reading |

[1] scylladb/latte#60
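On the read side, this scenario boils down to choosing the expected row range per index. A minimal sketch, assuming one row per index and that the 100 deleted rows are the half-open range 98000..98100 (i.e. 98000 through 98099); both assumptions, as well as the names `DELETED` and `expected_rows_for`, are illustrative only.

```rust
use std::ops::Range;

/// Assumed range of deleted indexes: 100 rows starting at 98_000.
pub const DELETED: Range<u64> = 98_000..98_100;

/// Hypothetical helper returning the inclusive `(min, max)` number of rows a
/// read of `idx` should see. It assumes one row per index and that the reader
/// only reaches the deleted range after deletion has finished (01:37 vs 00:22).
pub fn expected_rows_for(idx: u64) -> (u64, u64) {
    if DELETED.contains(&idx) {
        (0, 0) // deleted rows must not be returned
    } else {
        (1, 1) // every other row is expected to exist
    }
}
```

The returned pair maps directly onto the `[min, max]` form of the validation parameter.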
With this change it is now possible to validate the number of rows returned by common `select` and `select count(...)` queries.

The old `execute` and `execute_prepared` context methods stay as is, and the following new methods are added:
- `execute_with_validation`
- `execute_prepared_with_validation`

These 2 new methods differ from the existing 2 by having an additional required parameter of `vector` type. That `vector` parameter may have the following element combinations:
- `[Integer]` -> Exact number of expected rows.
- `[Integer, Integer]` -> Range of expected rows; both values are inclusive.
- `[Integer, String]` -> Exact number of expected rows and a custom error message.
- `[Integer, Integer, String]` -> Range of expected rows and a custom error message.

Example:
```
pub async fn some_select_rune_function(db, i) {
    ...
    let elapsed = db.elapsed_secs();
    let rows_min = if elapsed > 100.0 { 0 } else { 1 };
    let rows_max = if elapsed < 150.0 { 1 } else { 0 };
    let custom_err = "rows must have been deleted by TTL after 100s-200s";
    db.execute_prepared_with_validation(
        PREPARED_STATEMENT_NAME,
        [pk],
        [rows_min, rows_max, custom_err],
    ).await?
}
```

The above example shows how we can make sure that some of our rows get deleted by TTL. The 50-second window with the [0, 1] range shows how to mitigate possible time-measurement fluctuations. Another possible approach is to rely on retries.

One more new context method added in this scope is `get_partition_info`. It returns an object with 2 attributes: `idx` (index) and `rows_num`. Example:
```
pub async fn prepare(db) {
    db.init_partition_row_distribution_preset(
        "main", ROW_COUNT, ROWS_PER_PARTITION, PARTITION_SIZES).await?;
    ...
}

pub async fn some_select_rune_function(db, i) {
    let idx = i % ROW_COUNT + OFFSET;
    let partition = db.get_partition_info("main", idx).await;
    partition.idx += OFFSET;
    db.execute_prepared_with_validation(
        PREPARED_STATEMENT_NAME,
        [pk],
        [partition.rows_num], // precise matching to calculated partition rows number
    ).await?
}
```

Also, an example Rune script is added at `workloads/validation.rn` so you can play with the new feature with minimal effort.

The latte `run` command was extended with the new `--validation-strategy` option. Examples:
- `latte run ... --validation-strategy=retry`: default, retry validation errors
- `latte run ... --validation-strategy=fail-fast`: stop the stress run on the very first validation error
- `latte run ... --validation-strategy=ignore`: print the error and go on
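To make the four accepted element combinations concrete, here is a plain-Rust sketch of how such a vector could be parsed and checked. It is not latte's implementation: `Arg` stands in for Rune's dynamic values, and `RowsExpectation`, `parse`, `validate` and the error wording are assumptions. Only the four combinations and the inclusive-range semantics come from the description above.

```rust
/// Stand-in for the Rune values a script can pass (hypothetical, not Rune's `Value`).
#[derive(Debug, Clone, PartialEq)]
pub enum Arg {
    Int(i64),
    Str(String),
}

/// Parsed row-count expectation: inclusive range plus optional custom message.
#[derive(Debug, PartialEq)]
pub struct RowsExpectation {
    pub min: u64,
    pub max: u64,
    pub custom_err: Option<String>,
}

/// Rune integers are signed; reject negative row counts.
fn to_rows(n: i64) -> Result<u64, String> {
    u64::try_from(n).map_err(|_| format!("expected rows number must be non-negative, got {n}"))
}

impl RowsExpectation {
    /// Accepts [Int], [Int, Int], [Int, Str] and [Int, Int, Str].
    pub fn parse(args: &[Arg]) -> Result<Self, String> {
        let (min, max, custom_err) = match args {
            [Arg::Int(n)] => (*n, *n, None),
            [Arg::Int(lo), Arg::Int(hi)] => (*lo, *hi, None),
            [Arg::Int(n), Arg::Str(msg)] => (*n, *n, Some(msg.clone())),
            [Arg::Int(lo), Arg::Int(hi), Arg::Str(msg)] => (*lo, *hi, Some(msg.clone())),
            _ => return Err(format!("unsupported validation parameter: {args:?}")),
        };
        let (min, max) = (to_rows(min)?, to_rows(max)?);
        if min > max {
            return Err(format!("minimum expected rows ({min}) is greater than maximum ({max})"));
        }
        Ok(Self { min, max, custom_err })
    }

    /// Checks the number of rows a query actually returned.
    pub fn validate(&self, returned: u64) -> Result<(), String> {
        if (self.min..=self.max).contains(&returned) {
            return Ok(());
        }
        let expected = if self.min == self.max {
            format!("{}", self.min)
        } else {
            format!("{}..={}", self.min, self.max)
        };
        let mut msg = format!("expected {expected} rows, got {returned}");
        // Append the script-provided message, if any.
        if let Some(custom) = &self.custom_err {
            msg.push_str(": ");
            msg.push_str(custom);
        }
        Err(msg)
    }
}
```

With this shape, the TTL example above corresponds to `parse(&[Arg::Int(rows_min), Arg::Int(rows_max), Arg::Str(custom_err)])` followed by `validate(returned_rows)`.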
Force-pushed from 55b1d0d to fb1402a.
Updated the PR.
The Docker image with the latest changes is:
Or it can be built manually:
LGTM
small nitpick on the docs/examples, but it can be amended later as needed
LGTM