
refactor(elastic): remove all elasticsearch related code #10444

Draft: fruch wants to merge 5 commits into master from remove_es_from_code
Conversation

@fruch (Contributor) commented Mar 19, 2025

Since we are now sending performance stats into Argus, we don't need anything sending data to ElasticSearch anymore.

Testing

PR pre-checks (self review)

  • I added the relevant backport labels
  • I didn't leave commented-out/debugging code

Reminders

  • Add new configuration options and document them (in sdcm/sct_config.py)
  • Add unit tests to cover my changes (under the unit-test/ folder)
  • Update the Readme/doc folder relevant to this change (if needed)

@fruch force-pushed the remove_es_from_code branch from 1b4c62e to 333cdc8 on March 19, 2025 20:50
@fruch added the labels New Hydra Version (PR# introduces new Hydra version) and backport/none (Backport is not required) on Mar 19, 2025
@fruch (Contributor, Author) commented Mar 19, 2025

Seems like I've forgotten about the adaptive_timeouts usage of ES.

@soyacz, do you think we can switch it to save those in Argus?

@soyacz (Contributor) commented Mar 20, 2025

> Seems like I've forgotten about the adaptive_timeouts usage of ES.
>
> @soyacz, do you think we can switch it to save those in Argus?

I propose to send these as 'results':

  1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
  2. Each operation execution will get a new row in that table

For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
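
A minimal editorial sketch of that proposed layout, in Python. The class and field names here (AdaptiveTimeoutRow, AdaptiveTimeoutTable) are hypothetical illustrations of the table-per-operation / row-per-execution idea, not the Argus client API:

# Hypothetical sketch: one results table per operation, one row per execution.
from dataclasses import dataclass, field


@dataclass
class AdaptiveTimeoutRow:
    """One row per operation execution."""
    duration_sec: float   # measured duration of the operation
    timeout_sec: float    # timeout that was applied to it
    node: str             # node the operation ran on


@dataclass
class AdaptiveTimeoutTable:
    """One table per operation, e.g. 'Adaptive Timeout - decommission'."""
    name: str
    rows: list[AdaptiveTimeoutRow] = field(default_factory=list)

    def add_execution(self, duration_sec: float, timeout_sec: float, node: str) -> None:
        self.rows.append(AdaptiveTimeoutRow(duration_sec, timeout_sec, node))


# Usage with made-up values:
decommission = AdaptiveTimeoutTable(name="Adaptive Timeout - decommission")
decommission.add_execution(duration_sec=420.0, timeout_sec=7200.0, node="node-1")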

For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:

def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
                     stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):

It's a no-op, and ArgusStore can be developed separately.
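
A minimal sketch of that suggested swap, assuming the base AdaptiveTimeoutStore is the no-op mentioned above and the rest of the signature stays unchanged:

# Sketch: default to the no-op base store so nothing is written to Elasticsearch;
# an Argus-backed store can be plugged in later via the stats_storage argument.
def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
                     stats_storage: AdaptiveTimeoutStore = AdaptiveTimeoutStore(), **kwargs):
    ...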

@fruch (Contributor, Author) commented Mar 20, 2025

> > Seems like I've forgotten about the adaptive_timeouts usage of ES.
> > @soyacz, do you think we can switch it to save those in Argus?
>
> I propose to send these as 'results':
>
>   1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
>   2. Each operation execution will get a new row in that table
>
> For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
>
> For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:
>
> def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
>                      stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):
>
> It's a no-op, and ArgusStore can be developed separately.

We would also need the ability to read the results back at some point.

@soyacz (Contributor) commented Mar 20, 2025

> > > Seems like I've forgotten about the adaptive_timeouts usage of ES.
> > > @soyacz, do you think we can switch it to save those in Argus?
> >
> > I propose to send these as 'results':
> >
> >   1. Each operation will have its own table (the name could be, e.g., Adaptive Timeout - decommission)
> >   2. Each operation execution will get a new row in that table
> >
> > For some tests we could even introduce validation rules based on 'best', so it would really serve its initial purpose. (This should play well in individual nemesis tests.)
> > For now, I think you can just change the default store to the AdaptiveTimeoutStore class here:
> >
> > def adaptive_timeout(operation: Operations, node: "BaseNode",  # noqa: F821
> >                      stats_storage: AdaptiveTimeoutStore = ESAdaptiveTimeoutStore(), **kwargs):
> >
> > It's a no-op, and ArgusStore can be developed separately.
>
> We would also need the ability to read the results back at some point.

Maybe. Currently, for raising a red flag if we exceed some timeout, validation rules in Argus will do the job (but first let's see how stable these durations are in Argus).

@fruch force-pushed the remove_es_from_code branch 3 times, most recently from 83eb036 to 7964a32 on March 23, 2025 18:57
@fruch added the test-integration label (Enable running the integration tests suite) on Mar 24, 2025
@fruch force-pushed the remove_es_from_code branch 4 times, most recently from e7446fb to 4eb6fb5 on March 24, 2025 23:11
fruch and others added 3 commits on March 25, 2025 22:46:

  • Since we are now sending performance stats into Argus, we don't need anything sending data to ElasticSearch anymore.
  • Since we are ditching ElasticSearch, this commit introduces a way to report the duration and timeouts used in operations. This only implements the option to send out the data; we don't yet have a client API to retrieve it from Argus.
@fruch force-pushed the remove_es_from_code branch from a6bdd6c to cbd8b97 on March 25, 2025 21:59
@fruch force-pushed the remove_es_from_code branch from 3bf15d4 to 3e34526 on March 26, 2025 07:35
@scylladbbot

@fruch new branch manager-3.5 was added, please add backport label if needed
