feature(perf): support latte in gradual grow throughput test #10901

Open: wants to merge 1 commit into base: master

vponomaryov
Contributor

@vponomaryov vponomaryov commented May 15, 2025

The gradual throughput test uses several substitution variables, and two of them,
$threads and $throttle, are not compatible as-is with the latte benchmarking tool.

First, latte has separate --threads=N and --concurrency=M parameters,
whereas cassandra-stress (CS) has only threads=N.
So latte's two values must be multiplied to get the equivalent CS value.
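
The thread arithmetic can be sketched as follows. This is only an illustration, not the code from this PR: the function name and the round-up policy are assumptions.

```python
def latte_concurrency_for(cs_threads: int, latte_threads: int) -> int:
    """Pick a latte --concurrency value so that
    latte_threads * concurrency >= cs_threads (hypothetical helper)."""
    if cs_threads < 1 or latte_threads < 1:
        raise ValueError("thread counts must be positive")
    # Ceiling division: rounding up is an assumption made for this sketch,
    # so the product never drops below the requested CS thread count.
    return -(-cs_threads // latte_threads)


# 700 CS threads spread over 7 latte worker threads
print(latte_concurrency_for(700, 7))  # 100
```

Rounding down instead would slightly under-provision parallelism; which direction is preferable depends on the workload.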

Second, latte uses the --rate=100 format, not fixed=100/s or throttle=100/s as CS does.
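
A minimal sketch of rendering $throttle per tool is shown below. The helper name, the "starts with latte" detection, and the empty string for unthrottled steps are assumptions of this sketch, not taken from the PR.

```python
from typing import Optional


def format_throttle(stress_cmd: str, ops_per_sec: Optional[int]) -> str:
    """Render the $throttle value for a stress command (hypothetical helper)."""
    if ops_per_sec is None:
        # Assumption: an unthrottled step substitutes an empty string.
        return ""
    if stress_cmd.lstrip().startswith("latte"):
        return f"--rate={ops_per_sec}"  # latte style
    return f"fixed={ops_per_sec}/s"  # cassandra-stress style


print(format_throttle("latte run workload.rn", 100))   # --rate=100
print(format_throttle("cassandra-stress mixed", 100))  # fixed=100/s
```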

Given the above, add appropriate parsing of latte commands
to support the existing substitution values.

Example:

The following CS command:

  cassandra-stress mixed cl=QUORUM duration=$duration -mode cql3 native \
    -rate 'threads=$threads $throttle' \
    -col 'size=FIXED(1024) n=FIXED(1)' -pop seq=15000001..20000000

can be simulated by the following latte command:

  latte run --function=write,read --tag=mixed --sampling=10s --duration=$duration \
  $throttle --threads=7 --concurrency=$threads --consistency=QUORUM \
  --start-cycle 15000001 --end-cycle 20000000 \
  -P key_size=10 -P column_size=1024 -P column_count=1 -P row_count=20000000 \
  data_dir/latte/latte_cs_alike.rn

For latte, we set --threads explicitly to N-1, where N is the number of CPU cores on the loader nodes.
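
To make the substitution concrete, here is a sketch that fills the three variables in the latte command above. How the test actually performs the substitution is not shown in this description; string.Template and the sample values ("30m", 100000 ops/s, concurrency 100) are assumptions chosen for illustration.

```python
from string import Template

# The latte command from the example, with its $-placeholders kept as-is.
LATTE_CMD = Template(
    "latte run --function=write,read --tag=mixed --sampling=10s "
    "--duration=$duration $throttle --threads=7 --concurrency=$threads "
    "--consistency=QUORUM --start-cycle 15000001 --end-cycle 20000000 "
    "-P key_size=10 -P column_size=1024 -P column_count=1 "
    "-P row_count=20000000 data_dir/latte/latte_cs_alike.rn"
)


def render(duration: str, throttle: str, threads: int) -> str:
    """Substitute the variables and collapse any doubled spaces
    (an empty $throttle would otherwise leave a gap)."""
    cmd = LATTE_CMD.substitute(duration=duration, throttle=throttle, threads=threads)
    return " ".join(cmd.split())


print(render("30m", "--rate=100000", 100))
```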

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@Copilot Copilot AI left a comment

Pull Request Overview

This PR introduces support for the Latte benchmarking tool in the gradual throughput tests by parsing its unique command line options for threads and rate.

  • Added a regex-based function to extract latte thread parameters.
  • Introduced a helper function to adjust the CS thread count based on latte threads.
  • Updated throttle formatting logic to distinguish between Latte and legacy CS commands.
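
A regex-based extraction of the latte thread count could look roughly like the sketch below. The pattern and function name are assumptions and are not copied from sdcm/stress/latte_thread.py.

```python
import re
from typing import Optional

# Matches "--threads=7" as well as "--threads 7" as a standalone option.
_LATTE_THREADS_RE = re.compile(r"(?:^|\s)--threads(?:=|\s+)(\d+)(?=\s|$)")


def find_latte_threads(stress_cmd: str) -> Optional[int]:
    """Return the numeric --threads value of a latte command,
    or None when the option is absent or not a plain number."""
    match = _LATTE_THREADS_RE.search(stress_cmd)
    return int(match.group(1)) if match else None


print(find_latte_threads("latte run --threads=7 --concurrency=$threads x.rn"))  # 7
```

Returning None when the option is missing leaves the fallback decision to the caller.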

Reviewed Changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| sdcm/stress/latte_thread.py | Added regex and extraction function to determine Latte's thread count. |
| performance_regression_gradual_grow_throughput.py | Integrated Latte thread count logic and updated throttle handling in test steps. |

@vponomaryov force-pushed the support-latte-for-gradual-perf-tests branch from d121a79 to 8802a35 on May 15, 2025 10:55
@vponomaryov
Contributor Author

Contributor

@soyacz soyacz left a comment

IIUC, the user currently still needs to watch how many CPUs the loaders have.

For me, the ultimate solution would be to introduce a 'latte_cs_thread' that could parse a c-s command prefixed by latte, like: latte <full cassandra stress command>

This way, the thread could take the proper rune scripts, translate the c-s command to a latte one, and we wouldn't have to learn a new syntax for running tests with latte (when we need a c-s-like workload). WDYT?

@vponomaryov
Contributor Author

> IIUC, the user currently still needs to watch how many CPUs the loaders have.

It is only about efficiency, and it is very easy: the instance type is known, so the CPU core count is known; use (n - 1) for high-rate commands.
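
As a tiny sketch of the (n - 1) rule (the helper name and the floor of one thread are assumptions):

```python
def latte_worker_threads(cpu_cores: int) -> int:
    """Leave one core free on the loader; never go below one worker thread."""
    return max(1, cpu_cores - 1)


# A loader with 8 cores would give --threads=7, as in the example command.
print(latte_worker_threads(8))  # 7
```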

> For me, the ultimate solution would be to introduce a 'latte_cs_thread' that could parse a c-s command prefixed by latte, like: latte <full cassandra stress command>

It will limit all the flexibility latte provides.

> This way, the thread could take the proper rune scripts,

Each separate rune script has a unique set of parameters for schema creation and for queries.

> translate the c-s command to a latte one, and we wouldn't have to learn a new syntax for running tests with latte (when we need a c-s-like workload). WDYT?

Parsing a CS command into a latte one is not really a good idea for performance scenarios.
It would be OK only for some load in longevities.

And, anyway, a potential feature for parsing CS into latte should be a separate PR.
It should not be a blocker for this one.

@vponomaryov force-pushed the support-latte-for-gradual-perf-tests branch from 8802a35 to 76f3804 on May 16, 2025 12:56
@vponomaryov force-pushed the support-latte-for-gradual-perf-tests branch from 76f3804 to f3ff8bd on May 26, 2025 13:15
@vponomaryov
Contributor Author

@fruch , @soyacz
So, what do we do with this?

@fruch
Contributor

fruch commented May 27, 2025

> @fruch , @soyacz
> So, what do we do with this?

I'm not sure we want this computation.

We will need to experiment with those parameters and set them optimally for each workload; a 1:1 mapping to the thread numbers in c-s isn't exactly what is needed.

@soyacz
Contributor

soyacz commented May 28, 2025

Sorry for the late reply,

> > For me, the ultimate solution would be to introduce a 'latte_cs_thread' that could parse a c-s command prefixed by latte, like: latte <full cassandra stress command>
>
> It will limit all the flexibility latte provides.

I'm not saying to replace LatteStressThread, only to add another wrapper just for c-s compatibility at the SCT level. We could still use the current approach for complex cases.

> > This way, the thread could take the proper rune scripts,
>
> Each separate rune script has a unique set of parameters for schema creation and for queries.

That's the hard part (at the beginning), and this could alleviate it. People don't know how things work in latte and need some learning to use it properly.

> > translate the c-s command to a latte one, and we wouldn't have to learn a new syntax for running tests with latte (when we need a c-s-like workload). WDYT?
>
> Parsing a CS command into a latte one is not really a good idea for performance scenarios. It would be OK only for some load in longevities.

Some load is actually what we do in most cases.

@vponomaryov
Contributor Author

> > @fruch , @soyacz
> > So, what do we do with this?
>
> I'm not sure we want this computation.
>
> We will need to experiment with those parameters and set them optimally for each workload; a 1:1 mapping to the thread numbers in c-s isn't exactly what is needed.

This computation exists just to keep compatibility with the existing configuration interfaces.
Also, the number of latte worker threads is configured explicitly, which is already a big difference.

Then, we can update the number of CS threads that is taken into account in the calculation for latte configurations.
So, I see only gains here:

  • The interface compatibility is kept
  • We can still configure any number of latte workers/threads and any concurrency for it

> > > For me, the ultimate solution would be to introduce a 'latte_cs_thread' that could parse a c-s command prefixed by latte, like: latte <full cassandra stress command>
> >
> > It will limit all the flexibility latte provides.
>
> I'm not saying to replace LatteStressThread, only to add another wrapper just for c-s compatibility at the SCT level. We could still use the current approach for complex cases.

I don't mind having such a wrapper, just not in the scope of this PR.

> > > This way, the thread could take the proper rune scripts,
> >
> > Each separate rune script has a unique set of parameters for schema creation and for queries.
>
> That's the hard part (at the beginning), and this could alleviate it. People don't know how things work in latte and need some learning to use it properly.

> > > translate the c-s command to a latte one, and we wouldn't have to learn a new syntax for running tests with latte (when we need a c-s-like workload). WDYT?
> >
> > Parsing a CS command into a latte one is not really a good idea for performance scenarios. It would be OK only for some load in longevities.
>
> Some load is actually what we do in most cases.

As said above, I am OK with having it, just in a separate task/PR.

3 participants