[Security Solution][Alerting] Add rulesClient.bulkCreate(), with feedback from ResponseOps#269340

Draft
sdesalas wants to merge 7 commits into elastic:main from sdesalas:bulk-create-enable-alert-rules-feedback

@sdesalas sdesalas commented May 14, 2026

Resolves: #264893
Relates to: #264890 (epic)
Relates to: #271722 (similar draft PR with security-solution wiring)

Summary

Adds rulesClient.bulkCreateRules() to the Alerting framework.

The method natively handles both disabled and enabled rules in a single call (validation, per-rule API-key minting, taskManager.bulkSchedule, demotion-on-failure, audit, SO writes, post-failure cleanup). It is optimized for code simplicity, speed of rule creation, and memory footprint.

Feedback from ResponseOps

This round (6 out of 6) incorporates the following feedback from @elastic/response-ops team relayed by @cnasikas:

  1. Batching to be done INSIDE rulesClient.bulkCreateRules()
  2. Supports exit-early on error.
  3. Ordering: Create the tasks as enabled, and then the rules as enabled. No extra step to enable at the end.
  4. Do not transform and return whole rule (memory footgun). Just return the rule id.
  5. Rule prep (validation, api key generation) is done up-front before ES writes.
  6. Best effort clean up of resources (TM deletes, API key invalidation)
  7. Configurable batchSize and maxLimit, with internal hard limit for number of rules passed in.
  8. Deliberately "light" on testing to make sure further feedback can be incorporated easily.
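
Items 1 and 7 can be pictured with a minimal sketch. The helper names `assertWithinHardLimit` and `toBatches` are hypothetical and not taken from this PR; only the constant name `MAX_RULES_NUMBER_FOR_BULK_OPERATION` and its 10K value come from the description, and the real implementation may differ.

```typescript
// Value as stated in this PR description (10K); redeclared here only to keep the sketch self-contained.
const MAX_RULES_NUMBER_FOR_BULK_OPERATION = 10000;

// Hypothetical guard: reject oversized inputs before any work is done.
function assertWithinHardLimit(count: number): void {
  if (count > MAX_RULES_NUMBER_FOR_BULK_OPERATION) {
    throw new RangeError(
      `Cannot bulk create ${count} rules; the limit is ${MAX_RULES_NUMBER_FOR_BULK_OPERATION}`
    );
  }
}

// Hypothetical helper: split the input into consecutive batches of at most batchSize items.
function toBatches<T>(items: readonly T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new RangeError(`batchSize must be a positive integer, got ${batchSize}`);
  }
  assertWithinHardLimit(items.length);
  const batches: T[][] = [];
  for (let start = 0; start < items.length; start += batchSize) {
    batches.push(items.slice(start, start + batchSize));
  }
  return batches;
}
```

Under these assumptions, 1000 rules with a batch size of 50 yield 20 batches, and the caller never has to chunk the input itself.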

CLARIFICATION ON 5. Rule prep

before doing any API calls to ES (like API generation, rule creation, etc), let's do first the operations that can fail fast like schema validation. Then, when all checks have passed, we can go and start doing the ES calls. This will help us with doing as few reverts as possible.

Additional feedback - implemented on a separate PR:

  1. TaskManager.bulkSchedule() should implement the same logic as bulkEnable() to avoid overlapping tasks (randomly assigning them over the next 5 minutes).

Feedback from Response Ops not implemented:

  • Configurable maxLimit: Implemented hard limit of 10K rules (with pre-existing const MAX_RULES_NUMBER_FOR_BULK_OPERATION). It doesn't make sense for consumers to provide a maxLimit, since they are passing an array whose size is known so they can simply check it and return 403/404.

How the code works

The flow is split into two phases — A (in-memory and whole-batch pre checks, fast) and B (batched ES writes, slow), each instrumented with an APM span.

Bulk insertion is "best-effort", unless exitEarlyOnError is used.

  • exitEarlyOnError: true: short-circuits after A (zero ES writes) and after any B4 failure.
  • exitEarlyOnError: false/undefined: drops invalid rules from A, demotes invalid rules in B (inserts as disabled)

| Phase | APM Span | What it does |
|---|---|---|
| A1 | preValidate.checkInMemory | Per-rule schema / rule-type / params / interval checks. Sequential, fail-fast, no ES. |
| A2 | preValidate.ensureAuthorized | Deduped authz per (ruleTypeId, consumer) pair. Failures are audited and dropped pre-batch. |
| B1 | runBatch.pMap.prepareRule | Per-rule prepare: action validation + uuid, API-key minting. Concurrent (API_KEY_GENERATE_CONCURRENCY). |
| B2 | runBatch.validateScheduleLimit | Circuit-breaker check on the enabled subset; demote-to-disabled on overflow. |
| B3 | runBatch.bulkSchedule | taskManager.bulkSchedule for survivors; demote on whole-call throw or silent per-task drop. |
| B4 | runBatch.bulkCreateRulesSo | Bulk SO create. Per-row 409s surface as errors; failures trigger TM + API-key cleanup. |
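
Phase A and the exitEarlyOnError switch can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `RuleInput`, `checkInMemory`, `preValidate` and the `proceed` flag are hypothetical names, the authorization callback is synchronous for brevity (the real check is presumably async), and the sketch collects every A1 error instead of stopping at the first one.

```typescript
interface RuleInput {
  name: string;
  ruleTypeId: string;
  consumer: string;
  interval: string;
}

interface RuleError {
  index: number;
  message: string;
}

interface PreValidateResult {
  survivors: Array<{ index: number; rule: RuleInput }>;
  errors: RuleError[];
  proceed: boolean; // false means: skip phase B entirely (zero ES writes)
}

// Hypothetical stand-in for the A1 in-memory checks.
function checkInMemory(rule: RuleInput): string | undefined {
  if (rule.name.trim() === '') return 'name is required';
  if (!/^\d+[smhd]$/.test(rule.interval)) return `invalid interval "${rule.interval}"`;
  return undefined;
}

function preValidate(
  rules: readonly RuleInput[],
  isAuthorized: (ruleTypeId: string, consumer: string) => boolean,
  exitEarlyOnError = false
): PreValidateResult {
  const errors: RuleError[] = [];
  const passedA1: Array<{ index: number; rule: RuleInput }> = [];

  // A1: cheap per-rule checks, no I/O.
  rules.forEach((rule, index) => {
    const problem = checkInMemory(rule);
    if (problem !== undefined) errors.push({ index, message: problem });
    else passedA1.push({ index, rule });
  });

  // A2: one authorization check per distinct (ruleTypeId, consumer) pair.
  const authzCache = new Map<string, boolean>();
  const survivors = passedA1.filter(({ index, rule }) => {
    const key = JSON.stringify([rule.ruleTypeId, rule.consumer]);
    let allowed = authzCache.get(key);
    if (allowed === undefined) {
      allowed = isAuthorized(rule.ruleTypeId, rule.consumer);
      authzCache.set(key, allowed);
    }
    if (!allowed) {
      errors.push({ index, message: `unauthorized for ${rule.ruleTypeId}/${rule.consumer}` });
    }
    return allowed;
  });

  // With exitEarlyOnError, any phase A error stops the run before phase B.
  return { survivors, errors, proceed: !(exitEarlyOnError && errors.length > 0) };
}
```

In best-effort mode the caller would hand `survivors` to phase B and report `errors` alongside the created ids; with exitEarlyOnError it would return as soon as `proceed` is false.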

How to test this code

The changes in this PR do not have any actual wiring that allows them to be tested.

However, you can check out the branch in this pull request 👉 #271722. It contains the exact same changes as this branch, plus an additional commit 9c80944 that provides wiring through Security > Detection Rules.

gh pr checkout 271722

Note that the wiring is also available as a patch you can apply locally.

gh pr checkout 269340 && git apply wiring-for-bulk-create-and-changes-history.patch

Enabling security solution wiring via kibana.dev.yml

Make sure that the feature flag is turned on so we can route security solution requests through the new rules client bulkCreateRules() method.

# kibana.dev.yml 
xpack.securitySolution.enableExperimental:
- 'bulkCreateRulesEnabled'

Prebuilt Rule Installation (disabled rules)

For testing creation of disabled rules, the best approach is to install the whole set (1850) of "prebuilt" detection rules.

Security > Detection Rules (SIEM) > Add Elastic Rules > Install All

Make sure you add some debug breakpoints if you want to walk through the code.

If you try it normally on your local machine, you should notice a significant speed improvement (8-9x) vs main.

Bulk deletion (so you can run another batch)

Select all 1850 rules > Bulk actions > Delete

Rule Import (enabled rules)

Enabled rules are trickier to test, but can be done via rule import process.

Security > Detection Rules (SIEM) > Import Rules

👉 Some 📁 SAMPLE DATA.

Note: there is currently a 10MB upload threshold for rule imports, which equates roughly to 1000 rules.

Verify task creation

You can verify the creation of tasks locally with check-tasks.sh

$ ./check-tasks.sh

Or if you are using ports different to the standard 5601 and 9200

$ KIBANA_DEV_PORT=5606 ES_DEV_PORT=9205 ./check-tasks.sh

If you just imported 1000 enabled rules, the counts should match:

starting..
KIBANA_URL=http://localhost:5601/kbn
ES_URL=http://localhost:9200
1. 2. 3. 4. 5. 6.
rules:           1000
rules_enabled:   1000
tasks:           1000
tasks_enabled:   1000
api_key_owner:   1000
apiKey present:  1000

Memory usage

No significant difference in memory usage (within the margin of error). If anything, the new changes appear to reduce usage slightly, even though we're inserting in batches of 100 at a time.

Identify risks

Low to none at present. Mid to high later when wired into Security solution.

Worth mentioning:

  1. New method. No production impact. bulkCreateRules() is not called by anything yet, so existing flows are unaffected.
  2. "Best-effort" rule demotion (to enabled: false) changes the caller contract. If we can't schedule a rule submitted with enabled: true, it is flipped to enabled: false and an entry is returned in errors. This is unexpected and differs from existing behavior.
  3. Cleanup is best-effort, not transactional. Cleanup relies on later calls actually making it to ES. On a B4 failure, for example, TM bulkRemove and API-key invalidation require an active connection; without one, we could leave dangling tasks and API keys.
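
To make risk 3 concrete, here is a minimal sketch of what "best-effort, not transactional" means. The names `CleanupDeps`, `CleanupReport` and `bestEffortCleanup` are hypothetical; the two callbacks are stand-ins for the Task Manager removal and API-key invalidation calls and do not reflect their real signatures.

```typescript
interface CleanupDeps {
  // Hypothetical stand-ins for the Task Manager and API-key clients.
  removeTasks: (taskIds: string[]) => Promise<void>;
  invalidateApiKeys: (apiKeyIds: string[]) => Promise<void>;
}

interface CleanupReport {
  danglingTaskIds: string[];
  danglingApiKeyIds: string[];
}

// Attempts both cleanups independently; a failure in one never blocks the other,
// and nothing is retried or rolled back.
async function bestEffortCleanup(
  deps: CleanupDeps,
  taskIds: string[],
  apiKeyIds: string[]
): Promise<CleanupReport> {
  const [tasks, keys] = await Promise.allSettled([
    deps.removeTasks(taskIds),
    deps.invalidateApiKeys(apiKeyIds),
  ]);
  return {
    danglingTaskIds: tasks.status === 'rejected' ? taskIds : [],
    danglingApiKeyIds: keys.status === 'rejected' ? apiKeyIds : [],
  };
}
```

If the connection drops between the failed SO write and the cleanup, the report would list the resources left behind, which is the dangling state described above.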

Mitigating circumstances:

This functionality will be gated behind a feature flag (bulkCreateRulesEnabled, added in commit 9c80944) when released into Security solution, to ensure proper testing takes place before it is made available to users.

@sdesalas sdesalas force-pushed the bulk-create-enable-alert-rules-feedback branch from 2ba963a to 183d47c Compare May 14, 2026 15:24
@sdesalas sdesalas changed the title [Security Solution] Bulk create enable alert rules feedback [Security Solution] Add rulesClient.bulkCreate(), with feedback from ResponseOps May 14, 2026
sdesalas added a commit that referenced this pull request May 21, 2026
…69991)

**Resolves: #195136**
**Related to: #264893**
**Related to: #269340**

## Summary

`TaskScheduling.bulkSchedule` previously sent every task to the store
with no `runAt`, which caused the store to default them all to "now".
When a caller bulk-scheduled many recurring tasks at once, the polling
queue was flooded with [simultaneous
claims](https://en.wikipedia.org/wiki/Thundering_herd_problem).

This PR brings `bulkSchedule` in line with `bulkEnable` (see #172742):
the first task in the batch still runs immediately, but subsequent
enabled recurring tasks are scheduled with a randomized `runAt`, evenly
distributed up to 5 minutes in the future. Ad-hoc tasks (no
`schedule.interval`) and disabled tasks are left untouched and run
immediately as before.

This also helps unblock upcoming work on `RulesClient.bulkCreate`
(#264893), where a single API call may schedule a large number of
detection-rule tasks at once.

## What's included

- The existing `randomlyOffsetRunTimestamp()` helper is replaced by a
smaller pure helper `addJitter()` that returns `{ runAt, scheduledAt }`
(or `undefined` when no interval is supplied), so callers control the
spread.
- `bulkSchedule` map callback now receives the index `i` and uses
`addJitter()` when `enabled && i > 0`.
- `bulkEnable`'s behavior continues exactly the same with the `i > 0`
branch using the shared `addJitter()` helper.
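
As an illustration of the helper described above, here is a minimal sketch under stated assumptions. It is not the `addJitter()` from this commit: the name `addJitterSketch`, its parameters, the interval parser, and the injectable `random` source are all hypothetical. Only the return shape and the `min(rule interval, 5m)` window come from this description.

```typescript
const MAX_JITTER_MS = 5 * 60 * 1000; // 5 minute cap, as described in the summary

const UNIT_MS: Record<string, number> = { s: 1000, m: 60000, h: 3600000, d: 86400000 };

// Hypothetical parser for intervals such as "30s", "5m", "1h".
function parseIntervalMs(interval: string): number {
  const match = /^(\d+)([smhd])$/.exec(interval);
  if (match === null) throw new Error(`Unsupported interval: ${interval}`);
  return Number(match[1]) * UNIT_MS[match[2]];
}

// Pure: all inputs are passed in, so the result is reproducible in tests.
function addJitterSketch(
  interval: string | undefined,
  now: Date = new Date(),
  random: () => number = Math.random
): { runAt: Date; scheduledAt: Date } | undefined {
  if (interval === undefined) return undefined; // ad-hoc task: the caller keeps "run now"
  const windowMs = Math.min(parseIntervalMs(interval), MAX_JITTER_MS);
  const runAt = new Date(now.getTime() + Math.floor(random() * windowMs));
  return { runAt, scheduledAt: runAt };
}
```

Because the helper returns `undefined` instead of mutating the task, the caller decides which tasks get jitter (for example, only enabled recurring ones) and can spread the result into the task it sends to the store.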

## How to test

> [!IMPORTANT]
> There is no easy way to test TM `bulkSchedule()` randomizing without
`v2` alerting. Because of this, the test below only covers task
randomizing under `bulkEnable()`. If you apply debugging here, you will
notice that enabling detection rules in bulk uses `bulkSchedule()` under
the hood, but it does so in `enabled: false` state. In other words, no
jitter will get applied until the following pass. This is expected: the
test below primarily verifies that existing behavior is
unaffected. The changes to behavior in `bulkSchedule(enabled:true)` will
become more meaningful with upcoming work on alerting `v2` and
`RulesClient.bulkCreate`.

1. Start ES + Kibana from this branch. Make sure you have a clean ES
with no rules.

2. In Kibana, navigate to **Security → Rules → Detection rules (SIEM)**
and click **Add Elastic Rules** to install the prebuilt detection rule
set (~1850 rules). Leave them disabled.

> Note: This is a good time to place some breakpoints if you're
debugging locally.

3. Go back to the rules management screen. Under "Installed Rules" click
the checkbox to select first 20 rules then `Bulk actions` > `Enable`.
You should see a message saying "Successfully enabled 20 rules"

4. Verify the `runAt` / `scheduledAt` distribution using
[`check-task-runtime.sh`](https://github.com/sdesalas/kibana-knowledge/blob/main/scripts/check-task-runtime.sh):

```bash
$ ./check-task-runtime.sh
```
Or if you are using ports different to the standard `5601` and `9200`

```bash
$ KIBANA_DEV_PORT=5606 ES_DEV_PORT=9205 ./check-task-runtime.sh
```

5. Expected output: counts match, and the first-20 task timestamps are
**spread across several minutes** rather than all stamped with the same
"now":

```
starting..
KIBANA_URL=http://localhost:5601/kbn
ES_URL=http://localhost:9200
1. 2. 3. 4. 5. 6.
rules:           1850
rules_enabled:   20
tasks:           20
tasks_enabled:   20
api_key_owner:   20
apiKey present:  20

first 20 tasks:
taskType                  status  enabled  runAt                     scheduledAt
alerting:siem.queryRule   idle    true     2026-05-19T16:20:08.323Z  2026-05-19T16:20:08.323Z
alerting:siem.queryRule   idle    true     2026-05-19T16:21:28.689Z  2026-05-19T16:21:28.689Z
alerting:siem.eqlRule     idle    true     2026-05-19T16:21:05.927Z  2026-05-19T16:21:05.927Z
alerting:siem.queryRule   idle    true     2026-05-19T16:20:53.163Z  2026-05-19T16:20:53.163Z
alerting:siem.queryRule   idle    true     2026-05-19T16:23:30.562Z  2026-05-19T16:23:30.562Z
alerting:siem.esqlRule    idle    true     2026-05-19T16:23:45.295Z  2026-05-19T16:23:45.295Z
...
```

For every task, `runAt` should equal its matching `scheduledAt`. The
timestamps should be distributed across the configured jitter window
(`min(rule interval, 5m)`) - confirming jitter is applied per-task. On
`main` without this PR, every task's `runAt` collapses to the same
value.

## Callers of `bulkSchedule`

For reference, the production callers of `TaskScheduling.bulkSchedule`
and whether they exercise the new jitter:

| Caller | What it schedules | Hits the new jitter? |
|---|---|---|
|
[`alerting_v2/.../rules_client.bulkEnableRules`](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting_v2/server/lib/rules_client/rules_client.ts)
| enabled, recurring (`schedule.interval`, `enabled: true`) | **Yes** —
primary path exercised by the manual test above |
|
[`alerting/.../bulk_enable_rules.ts`](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/application/rule/methods/bulk_enable/bulk_enable_rules.ts)
(legacy) | recurring but `enabled: false` — the comment in that file
explicitly says "we create the task as disabled, taskManager.bulkEnable
will enable them by randomising their schedule datetime" | No (jitter
applied later by `bulkEnable`) |
|
[`workflows_execution_engine/server/plugin.ts`](https://github.com/elastic/kibana/blob/main/src/platform/plugins/shared/workflows_execution_engine/server/plugin.ts)
| enabled but ad-hoc (no `schedule`) | No (ad-hoc — runs immediately,
correct) |
|
[`actions/create_execute_function.ts`](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/actions/server/create_execute_function.ts)
| ad-hoc action tasks | No |
|
[`actions/create_unsecured_execute_function.ts`](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/actions/server/create_unsecured_execute_function.ts)
| ad-hoc action tasks | No |
|
[`alerting/.../backfill_client.ts`](https://github.com/elastic/kibana/blob/main/x-pack/platform/plugins/shared/alerting/server/backfill_client/backfill_client.ts)
| variable is literally `adHocTasksToSchedule` | No |

Of these, only the alerting_v2 `bulkEnableRules` path schedules enabled
recurring tasks in bulk, so it is the only caller whose runtime behavior
changes with this PR.

## Release note

skip

## Checklist

- [x] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

### Identify risks

- Behavior change is scoped to recurring tasks at `i > 0`; single-task
`bulkSchedule` calls and ad-hoc tasks retain the existing "run now"
semantics.
- The `bulkEnable` path is unchanged in semantics; only the helper
signature changed.

---------

Co-authored-by: Cursor <cursoragent@cursor.com>
paulinashakirova pushed a commit to paulinashakirova/kibana that referenced this pull request May 22, 2026
…astic#269991)

jcger pushed a commit that referenced this pull request May 26, 2026
…69991)

@sdesalas sdesalas force-pushed the bulk-create-enable-alert-rules-feedback branch from 9c22341 to 15d0646 Compare May 26, 2026 11:19
@sdesalas sdesalas force-pushed the bulk-create-enable-alert-rules-feedback branch from 15d0646 to 3a9fd53 Compare May 28, 2026 11:09
sdesalas commented May 28, 2026

Applying clarification from @elastic/response-ops team @cnasikas on point 5 "Rule prep" in commit 1fca692

Yes, what I mean is that before doing any API calls to ES (like API generation, rule creation, etc), let's do first the operations that can fail fast like schema validation. Then, when all checks have passed, we can go and start doing the ES calls. This will help us with doing as few reverts as possible.

Also incorporating changes from:

@sdesalas sdesalas self-assigned this May 28, 2026
@sdesalas sdesalas force-pushed the bulk-create-enable-alert-rules-feedback branch 2 times, most recently from 39c6b56 to f916c50 Compare May 28, 2026 11:45
@sdesalas sdesalas force-pushed the bulk-create-enable-alert-rules-feedback branch from dc1d6a6 to 2caf941 Compare May 28, 2026 13:00

// Otherwise add jitter to avoid them firing together.
scheduling =
i === 0
arr.length === 1

@sdesalas sdesalas May 28, 2026

☝️ This is a follow-up to work done on this PR:

#269991

More "correct" interpretation of ticket #195136

Definition of Done

  • When bulkSchedule API is called with one task, we make it run now (as it does today)
  • When bulkSchedule API is called with more than one task, we schedule the tasks with a randomized runAt

I.e. instead of skipping jitter for the first task in a batch (which bulkEnable used to do before PR #269991, confusing the intent), we skip it only when the batch contains a single task.

What does this mean in practical terms?

If we have 20 batches of 50 rules, we avoid bulk-scheduling 20 tasks "now" (the first task in each batch), which would start firing right away and block further batches inserted during bulkCreateRules().

I.e. a ~8% speed improvement (26s -> 24s on localhost) when bulk creating 1000 enabled rules in batches of 50. Note that this has not been verified with a large set of tests, so it could be within the margin of error.
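
The difference between the two rules can be shown with a small sketch. The function names and the `TaskToSchedule` shape are hypothetical; the conditions mirror the `i === 0` and `arr.length === 1` checks in the snippet under review and the Definition of Done above.

```typescript
interface TaskToSchedule {
  id: string;
  enabled: boolean;
  interval?: string; // absent for ad-hoc tasks
}

// Previous rule: the first task of every call runs "now", the rest get jitter.
function jitteredByIndex(tasks: readonly TaskToSchedule[]): string[] {
  return tasks
    .filter((task, i) => i > 0 && task.enabled && task.interval !== undefined)
    .map((task) => task.id);
}

// Proposed rule: only a single-task call runs "now"; larger calls jitter every
// enabled recurring task, including the first one.
function jitteredByBatchSize(tasks: readonly TaskToSchedule[]): string[] {
  return tasks
    .filter((task) => tasks.length > 1 && task.enabled && task.interval !== undefined)
    .map((task) => task.id);
}
```

With 20 batches of 50 rules, the index-based rule leaves 20 tasks running immediately (one per batch), while the batch-size rule leaves none.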

@sdesalas sdesalas changed the title [Security Solution] Add rulesClient.bulkCreate(), with feedback from ResponseOps [Security Solution][Alerting] Add rulesClient.bulkCreate(), with feedback from ResponseOps May 28, 2026
kibanamachine commented May 29, 2026

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] FTR Configs #221 / Entity Analytics - Watchlists @ess @serverless @skipInServerlessMKI Watchlist Lifecycle should remove all entities from the watchlist index when the watchlist is deleted
  • [job] [logs] FTR Configs #53 / integrations For each artifact list under management When on the Event Filters entries list should be able to update an existing Event Filters entry
  • [job] [logs] FTR Configs #53 / integrations When in the Fleet application and on the Endpoint Integration details page should display the endpoint custom content
  • [job] [logs] Scout Lane #12 - stateful-classic / default / local-stateful-classic - Saved query menu — CRUD (Discover) - save, load, update, save-as-new, delete via the popover

Metrics [docs]

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

| id | before | after | diff |
|---|---|---|---|
| alerting | 20.4KB | 20.6KB | +209.0B |

History

cc @sdesalas

Development

Successfully merging this pull request may close these issues.

[Security Solution] Implement RulesClient.bulkCreate method in the Alerting Framework
