Commit 157c141

[Security Solution] Fix flaky rule telemetry tests (elastic#265100)
**Resolves: elastic#264580**
**Resolves: elastic#264491**
**Resolves: elastic#263901**
**Resolves: elastic#261273**

🟢 **Flaky test runner**: 3x200 runs ([1](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11827), [2](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11828), [3](https://buildkite.com/elastic/kibana-flaky-test-suite-runner/builds/11829))

## Summary

This PR fixes flaky tests for detection rules telemetry collection. Tests were failing because they expected "indexing_duration" to be above 1.

```
Error: expected 0 to be above 1
```

## What's happening in tests

These tests seed some source events, then create a rule that runs on them and generates "execution-metrics" events. The test then calls the telemetry API endpoint, which responds with telemetry data built from these metrics (it computes min/max/avg values). Finally, the test asserts on values from the response.

## Why tests were flaky

The tests are flaky because one of two things happens:

- either the metrics data from the rule execution is not yet fully available in ES when the collector reads it (hard to reproduce, but plausible from reading the code)
- or the indexing duration for a rule is <1.5ms, which `Math.round` rounds to either 1 or 0, and either value fails the "above 1" assertion (easy to reproduce locally)

## Changes

- Updated tests to expect duration values >0ms instead of >1ms, which matches the implementation. There really shouldn't be a special case for 1ms.
- Updated tests to wait for "execution-metrics" events to appear before calling the API endpoint to collect data for assertions.
- Removed unnecessary duplicated assertions from tests.
- Updated rule intervals in tests to "1d" to avoid scheduling many rule runs per test; we need only one.
- Switched from `Math.round` to `Math.ceil` when writing metrics to the event log. This means 0ms stays 0ms, 0.1ms becomes 1ms, and 1ms stays 1ms. This prevents a situation where the duration for a metric was >0 but <0.5ms, yet we wrote 0 to the event log and then couldn't tell whether the metric was collected at all.

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
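
The "wait for events before calling the API" change can be pictured with a generic polling helper. This is only a minimal sketch: `waitFor`, its options, and the `probe` callback are hypothetical names chosen for illustration and are not the helpers used in the Kibana test suite.

```typescript
// Hypothetical polling helper (not from the Kibana codebase).
// Calls `probe` repeatedly until it yields a value other than undefined,
// or throws once the timeout has elapsed.
async function waitFor<T>(
  probe: () => Promise<T | undefined>,
  { timeoutMs = 5000, intervalMs = 100 }: { timeoutMs?: number; intervalMs?: number } = {}
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  for (;;) {
    const value = await probe();
    if (value !== undefined) {
      return value;
    }
    if (Date.now() >= deadline) {
      throw new Error(`Condition not met within ${timeoutMs}ms`);
    }
    // Sleep before the next attempt so the backing store is not hammered.
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

A test would pass a `probe` that searches for the "execution-metrics" events and returns them once they are visible, and only then call the telemetry endpoint. Polling on the actual precondition avoids both fixed sleeps and reading data that has not been indexed yet.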
1 parent 0f2b67d commit 157c141

2 files changed

Lines changed: 82 additions & 159 deletions

x-pack/solutions/security/plugins/security_solution/server/lib/detection_engine/rule_types/create_security_rule_type_wrapper.ts

Lines changed: 4 additions & 3 deletions
@@ -482,18 +482,19 @@ export const createSecurityRuleTypeWrapper: CreateSecurityRuleTypeWrapper =
   );
   const suppressedAlertsCount = result.suppressedAlertsCount ?? 0;

+  // Using Math.ceil() to prevent the event log from showing 0ms for sub-millisecond durations.
   ruleExecutionLogger.logMetrics({
     total_search_duration_ms:
       result.searchAfterTimes.length > 0
-        ? Math.round(sum(result.searchAfterTimes.map(Number)))
+        ? Math.ceil(sum(result.searchAfterTimes.map(Number)))
         : undefined,
     total_indexing_duration_ms:
       result.bulkCreateTimes.length > 0
-        ? Math.round(sum(result.bulkCreateTimes.map(Number)))
+        ? Math.ceil(sum(result.bulkCreateTimes.map(Number)))
         : undefined,
     total_enrichment_duration_ms:
       result.enrichmentTimes.length > 0
-        ? Math.round(sum(result.enrichmentTimes.map(Number)))
+        ? Math.ceil(sum(result.enrichmentTimes.map(Number)))
         : undefined,
     frozen_indices_queried_count: frozenIndicesQueriedCount,
     alerts_candidate_count: result.alertsCandidateCount,
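
The effect of swapping the rounding function can be checked in isolation. The sketch below mirrors the shape of the expressions in the diff, but `totalDurationMs` is a hypothetical helper written for illustration, and it assumes the timing arrays hold numeric strings (suggested by the `.map(Number)` call, not confirmed here).

```typescript
// Hypothetical helper (not from the Kibana codebase) that mirrors the
// "sum, then round, or undefined when empty" pattern shown in the diff.
function totalDurationMs(
  times: string[],
  round: (value: number) => number
): number | undefined {
  if (times.length === 0) {
    return undefined; // nothing was measured, so no metric is written
  }
  const total = times.map(Number).reduce((acc, t) => acc + t, 0);
  return round(total);
}

// 0.125 + 0.25 = 0.375 ms of real work.
const subMillisecond = ['0.125', '0.25'];

console.log(totalDurationMs(subMillisecond, Math.round)); // 0
console.log(totalDurationMs(subMillisecond, Math.ceil)); // 1
console.log(totalDurationMs(['0'], Math.ceil)); // 0
console.log(totalDurationMs([], Math.ceil)); // undefined
```

With `Math.ceil`, a written value of 0 can only mean the measured total was exactly 0, so any non-zero measurement stays distinguishable from "no time spent" in the event log.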