
refactor(elastic): remove all elasticsearch related code #10444

Draft: fruch wants to merge 5 commits into master from remove_es_from_code
Conversation

@fruch (Contributor) commented Mar 19, 2025

Since we are now sending performance stats into Argus, we don't need anything sending data to ElasticSearch anymore.

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under the unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@fruch force-pushed the remove_es_from_code branch from 1b4c62e to 333cdc8 on March 19, 2025 20:50
@fruch added the labels New Hydra Version (PR# introduces new Hydra version) and backport/none (Backport is not required) on Mar 19, 2025
@fruch (Contributor, Author) commented Mar 19, 2025

Seems like I've forgotten about the adaptive_timeouts usage of ES.

@soyacz, do you think we can switch it to save those in Argus?

@soyacz (Contributor) commented Mar 20, 2025

> Seems like I've forgotten about the adaptive_timeouts usage of ES.
>
> @soyacz, do you think we can switch it to save those in Argus?

I propose to send these as 'results':

  1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
  2. Each operation execution will get a new row in that table

For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
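
A minimal editorial sketch of that proposed layout, in Python. The class and field names here (AdaptiveTimeoutRow, AdaptiveTimeoutTable) are hypothetical illustrations of the table-per-operation / row-per-execution idea, not the Argus client API:

# Hypothetical sketch: one results table per operation, one row per execution.
from dataclasses import dataclass, field


@dataclass
class AdaptiveTimeoutRow:
    """One row per operation execution."""
    duration_sec: float   # measured duration of the operation
    timeout_sec: float    # timeout that was applied to it
    node: str             # node the operation ran on


@dataclass
class AdaptiveTimeoutTable:
    """One table per operation, e.g. 'Adaptive Timeout - decommission'."""
    name: str
    rows: list[AdaptiveTimeoutRow] = field(default_factory=list)

    def add_execution(self, duration_sec: float, timeout_sec: float, node: str) -> None:
        self.rows.append(AdaptiveTimeoutRow(duration_sec, timeout_sec, node))


# Usage with made-up values:
decommission = AdaptiveTimeoutTable(name="Adaptive Timeout - decommission")
decommission.add_execution(duration_sec=420.0, timeout_sec=7200.0, node="node-1")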

For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:

def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
                     stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):

It's a no-op, and ArgusStore can be developed separately.
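
A minimal sketch of that suggested swap, assuming the base AdaptiveTimeoutStore is the no-op mentioned above and the rest of the signature stays unchanged:

# Sketch: default to the no-op base store so nothing is written to Elasticsearch;
# an Argus-backed store can be plugged in later via the stats_storage argument.
def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
                     stats_storage: AdaptiveTimeoutStore = AdaptiveTimeoutStore(), **kwargs):
    ...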

@fruch (Contributor, Author) commented Mar 20, 2025

> > Seems like I've forgotten about the adaptive_timeouts usage of ES.
> > @soyacz, do you think we can switch it to save those in Argus?
>
> I propose to send these as 'results':
>
>   1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
>   2. Each operation execution will get a new row in that table
>
> For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
>
> For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:
>
> def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
>                      stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):
>
> It's a no-op, and ArgusStore can be developed separately.

We would also need the ability to read the results back at some point.

@soyacz (Contributor) commented Mar 20, 2025

> > > Seems like I've forgotten about the adaptive_timeouts usage of ES.
> > > @soyacz, do you think we can switch it to save those in Argus?
> >
> > I propose to send these as 'results':
> >
> >   1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
> >   2. Each operation execution will get a new row in that table
> >
> > For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
> > For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:
> >
> > def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
> >                      stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):
> >
> > It's a no-op, and ArgusStore can be developed separately.
>
> We would also need the ability to read the results back at some point.

Maybe. Currently, for raising a red flag if we exceed some timeout, validation rules in Argus will do the job (but first let's see how stable these durations are in Argus).

@fruch force-pushed the remove_es_from_code branch 3 times, most recently from 83eb036 to 7964a32 on March 23, 2025 18:57
@fruch added the test-integration label (Enable running the integration tests suite) on Mar 24, 2025
@fruch force-pushed the remove_es_from_code branch 4 times, most recently from e7446fb to 4eb6fb5 on March 24, 2025 23:11
fruch and others added 3 commits on March 25, 2025 22:46:

  • Since we are now sending performance stats into Argus, we don't need anything sending data to ElasticSearch anymore.
  • Since we are ditching ElasticSearch, this commit introduces a way to report the duration and timeouts used in operations. This only implements the option to send out the data; we don't yet have a client API to retrieve it from Argus.
@fruch force-pushed the remove_es_from_code branch from a6bdd6c to cbd8b97 on March 25, 2025 21:59
@fruch force-pushed the remove_es_from_code branch from 3bf15d4 to 3e34526 on March 26, 2025 07:35
@scylladbbot

@fruch new branch manager-3.5 was added, please add backport label if needed
