Commit 157c141
[Security Solution] Fix flaky rule telemetry tests (elastic#265100)
**Resolves: elastic#264580**
**Resolves: elastic#264491**
**Resolves: elastic#263901**
**Resolves: elastic#261273**
🟢 **Flaky test runner**: 3x200 runs
([1](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11827),
[2](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11828),
[3](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11829))
## Summary
This PR fixes flaky tests for detection rules telemetry collection.
Tests were failing because they expected "indexing_duration" to be above
1.
```
Error: expected 0 to be above 1
```
## What's happening in tests
How it all works:
These tests seed some source events, then create a rule that runs on
them and generates "execution-metrics" events. The test then calls the
telemetry API endpoint, which responds with telemetry data built from
these metrics (computed min/max/avg values). Finally, the test asserts on
values from the response.
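
The min/max/avg computation mentioned above can be sketched roughly as follows. This is a minimal illustration, not the collector's actual code: the `DurationStats` type and the `summarizeDurations` helper are hypothetical names, and the real telemetry payload may be shaped differently.

```typescript
// Hypothetical summary shape; the real telemetry payload may differ.
interface DurationStats {
  min: number;
  max: number;
  avg: number;
}

// Assumed helper: reduces per-execution durations (in ms) to min/max/avg.
// Returns undefined when no metrics have been collected yet.
function summarizeDurations(durationsMs: number[]): DurationStats | undefined {
  if (durationsMs.length === 0) {
    return undefined;
  }
  const sum = durationsMs.reduce((acc, d) => acc + d, 0);
  return {
    min: Math.min(...durationsMs),
    max: Math.max(...durationsMs),
    avg: sum / durationsMs.length,
  };
}
```

With a single rule execution, min, max, and avg all equal the one stored duration, so a stored value of 0 makes every assertion on these fields fail in the same way.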
## Why tests were flaky
Tests are flaky because one of two things happens:
- either metrics data from rule execution is not fully available in ES
yet when the collector reads it (hard to reproduce, but looking at the
code it's very possible)
- or indexing duration for a rule is <1.5ms, which `Math.round` rounds
to either 1 or 0; neither value is above 1, so the test fails (easy to
reproduce locally)
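
To illustrate the second cause, here is a small sketch of how `Math.round` interacts with the old "above 1" assertion. The `passesOldAssertion` helper is hypothetical and only mirrors the check described above; it is not code from the test suite.

```typescript
// Hypothetical stand-in for the old test expectation: stored duration must be above 1.
const passesOldAssertion = (rawDurationMs: number): boolean =>
  Math.round(rawDurationMs) > 1;

// Sample raw durations in milliseconds (made-up values for illustration).
const samples = [0.4, 1.2, 1.5];

// Math.round rounds 0.4 down to 0, 1.2 down to 1, and 1.5 up to 2.
const stored = samples.map((ms) => Math.round(ms));

// Only the 1.5ms sample survives the "above 1" check.
const results = samples.map(passesOldAssertion);
```

Any indexing duration below 1.5ms therefore ends up as 0 or 1 and trips the assertion, which matches the `expected 0 to be above 1` error.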
## Changes
- Updated tests to expect duration values >0ms instead of >1ms, which
matches the implementation. There is no reason to treat 1ms as a
special case.
- Also updated tests to wait for "execution-metrics" events to appear
before calling the API endpoint to collect data for assertions.
- Removed unnecessary duplicated assertions from tests.
- Updated rule intervals in tests to "1d" to avoid scheduling rule runs
a bunch of times per test – we need only one.
- Switched to `Math.ceil` rounding instead of `Math.round` when writing
metrics to the event log. This means 0ms stays 0ms, 0.1ms becomes 1ms,
and 1ms stays 1ms. This prevents a situation where a metric's duration
was >0 but <0.5ms, 0 was written to the event log, and it was then
impossible to tell whether the metric was collected at all.
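
The rounding change in the last item can be sketched as below. The function names are hypothetical and chosen for illustration only; the actual change lives in the rule types code listed in the file tree.

```typescript
// Assumed old behavior: round to nearest integer, so sub-0.5ms durations collapse to 0.
function toEventLogDurationOld(rawDurationMs: number): number {
  return Math.round(rawDurationMs);
}

// Assumed new behavior: round up, so any non-zero duration is stored as at least 1ms.
function toEventLogDurationNew(rawDurationMs: number): number {
  return Math.ceil(rawDurationMs);
}

// Made-up sample durations covering the cases called out in the description.
const rawDurations = [0, 0.1, 0.4, 1];
const oldValues = rawDurations.map(toEventLogDurationOld); // 0, 0, 0, 1
const newValues = rawDurations.map(toEventLogDurationNew); // 0, 1, 1, 1
```

With `Math.ceil`, a stored 0 can only mean that the measured duration was exactly 0, so a >0 assertion in the tests distinguishes "metric collected" from "metric missing".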
---------
Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>

1 parent 0f2b67d commit 157c141
2 files changed
Lines changed: 82 additions & 159 deletions
File tree
- x-pack/solutions/security
- plugins/security_solution/server/lib/detection_engine/rule_types
- test/security_solution_api_integration/test_suites/detections_response/telemetry/trial_license_complete_tier/usage_collector
Lines changed: 4 additions & 3 deletions
(Diff content was not captured. The surviving line numbers show one line added at new line 485, and original lines 488, 492, and 496 each replaced, at new lines 489, 493, and 497.)