-
Notifications
You must be signed in to change notification settings - Fork 4
Add posibility to validate number of returned rows for read queries #60
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
With this change it now becomes possible to validate number of rows returned for common 'select' and 'select count(...)' queries. Old 'execute' and `execute_prepared' context methods stay as is and following new methods are added: - execute_with_validation - execute_prepared_with_validation These 2 new methods differ from existing 2 by having additional required parameter of 'tuple' type. That 'tuple' parameter consists of 3 elements: - First is minimum expected rows number (inclusive). - Second is maximum expected rows number (inclusive). - Custom error message, may be just empty string if it is not needed. Example: pub async fn some_select_rune_function(db, i) { ... let elapsed = db.elapsed_secs(); let rows_min = if elapsed > 100.0 { 0 } else { 1 }; let rows_max = if elapsed < 150.0 { 1 } else { 0 }; let custom_err = "rows must have been deleted by TTL after 100s-200s"; db.execute_prepared_with_validation( PREPARED_STATEMENT_NAME, [pk], (rows_min, rows_max, custom_err), ).await? } Above example shows how can we make sure that some our rows get deleted by TTL. The 50 seconds of [0, 1] range shows how can we mitigate possible time measurement fluctuations. Another possible approach is to depend on retries. One more new context method, that is added in this scope, is 'get_partition'. It returns an object with 2 attributes - 'idx' (index) and 'size'. Example: pub async fn prepare(db) { db.init_partition_row_distribution_preset( "main", ROW_COUNT, ROWS_PER_PARTITION, PARTITION_SIZES).await?; ... } pub async fn some_select_rune_function(db, i) { let idx = i % ROW_COUNT + OFFSET; let partition = db.get_partition("main", idx).await; partition.idx += OFFSET; db.execute_prepared_with_validation( PREPARED_STATEMENT_NAME, [pk], (partition.size, partition.size, ""), // precise matching to calculated partition size ).await? } Also, an example rune script is added at 'workloads/validation.rn' to be able to play with the new feature with minimal efforts. Latte 'run' command was extended with the new '--validation-strategy' option. Examples: - latte run ... --validation-strategy=retry // default, retry validation errors - latte run ... --validation-strategy=fail-fast // stop stress on the very first validation error - latte run ... --validation-strategy=ignore // Print the error and go on
Build latte image locally:
Or use existing image: https://hub.docker.com/r/vponomarovatscylladb/hydra-loaders/tags?name=0.28.5-scylladb Rune script for trying it out is part of the PR: |
In case you have some time it would be great to have some real latte users feedback on this feature. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR adds the ability to validate the number of rows returned for read queries by introducing new context methods (execute_with_validation and execute_prepared_with_validation) and a new get_partition function, as well as updating configuration to support a validation strategy.
- Registers new functions in the scripting module to support row count validation.
- Updates the query execution logic to handle validation errors based on configurable strategies (retry, fail-fast, ignore).
- Updates partition handling and error reporting to support the new validation approach.
Reviewed Changes
Copilot reviewed 8 out of 9 changed files in this pull request and generated no comments.
Show a summary per file
File | Description |
---|---|
src/scripting/mod.rs | Registers new validation and partition functions in the module. |
src/scripting/functions.rs | Implements new query execution functions with row count validation. |
src/scripting/context.rs | Integrates validation strategy into query execution and partition logic. |
src/scripting/connect.rs | Passes the validation_strategy config to Context. |
src/scripting/cass_error.rs | Adds error constructors and formats for validation errors. |
src/config.rs | Introduces a new configuration option for validation strategy. |
Cargo.toml | Bumps version to reflect new changes. |
Files not reviewed (1)
- workloads/validation.rn: Language not supported
Comments suppressed due to low confidence (1)
src/scripting/functions.rs:586
- The error message when expected_rows_num_min > expected_rows_num_max is misleading; update it to state that the minimum expected rows number cannot be greater than the maximum.
if expected_rows_num_min > expected_rows_num_max {
@fruch |
the part of (min_rows_number, max_rows_number), is a bit not clear for whom might want to use it. i.e. there's not actual docs explaining it maybe we can use a mapping for this configuration ?, and not a tuple, I think it's a bit less readable to users also I would assume that if I put |
also |
we should document in the README the two new functions exposed to the rune script |
What exactly is not clear? Typical range.
It is not a problem to add the docs, my main open-item is to get to know that the direction is correct and useful.
Why mapping?
Yes, the
Yes, main retry configuration gets applied here as-is.
So, direction is ok, only docs is a TODO? |
I had to read via all the code to find the definition, other was I need to guess the meaning of each value in tuple
bit part of usefulness, is how clear it is, and in this case without docs it's not completely clear
is more clear an readable
the direction seem o.k., even that I don't see how it would exact for when we'll want to pass into more expectations nitpick: if we could pass in just a number, that would map like |
let's get one more opinion from @soyacz about it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
small remarks, but overall LGTM
@@ -93,8 +100,8 @@ impl RowDistributionPreset { | |||
} | |||
} | |||
|
|||
pub async fn get_partition_idx(&self, idx: u64) -> u64 { | |||
self._get_partition_idx( | |||
pub async fn get_partition(&self, idx: u64) -> (u64, u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
helpful to add small docstring what is returned in tuple (previously could guess it from fn name)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought about renaming it to the get_partition_info
, because we get only some info about it and not data which may be assumed by general get_partition
...
Will update
#[derive(Clone, Debug, PartialEq)] | ||
pub struct PartitionGroup { | ||
pub n_rows_per_group: u64, | ||
pub n_partitions: u64, | ||
pub partition_size: u64, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if partition size means partition rows count, then better to name it partition_rows_count
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will update
The commit description covers whole chain of changes, also new, dedicated for it, rune script with comments inside of it.
If I abstract from the knowledge I have I would not see "min=1" and "max=1" and something clear, it is very ambiguous.
I didn't understand the
Rust lang doesn't support defaults for function parameters. |
With this change it now becomes possible to validate number of rows returned for common
select
andselect count(...)
queries.Old
execute
andexecute_prepared
context methods stay as is and following new methods are added:execute_with_validation
execute_prepared_with_validation
These 2 new methods differ from existing 2 by having additional required parameter of 'tuple' type.
That 'tuple' parameter consists of 3 elements:
Example:
Above example shows how can we make sure that some our rows get deleted by TTL.
The 50 seconds of [0, 1] range shows how can we mitigate possible time measurement fluctuations.
Another possible approach is to depend on retries.
One more new context method, that is added in this scope, is
get_partition
.It returns an object with 2 attributes -
idx
(index) andsize
.Example:
Also, an example rune script is added at
workloads/validation.rn
to be able to play with the new feature with minimal efforts.
Latte
run
command was extended with the new--validation-strategy
option.Examples: