[Alerting V2] Register "assign episode" workflow trigger #268915

Open
adcoelho wants to merge 26 commits into elastic:main from adcoelho:alerting-v2-workflow-trigger-assign-episode

@adcoelho
Contributor

@adcoelho adcoelho commented May 12, 2026

Closes #264977

Summary

  • This PR registers a new workflow trigger type for alerting episode assignment.
  • This specific event is emitted by the (bulk) create action APIs when:
    • The action type is assign.
    • The assignee is not null.
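
The emission rule in the summary can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `AlertActionSketch` interface, its `action_type` and `episode_id` fields, and the `shouldEmitEpisodeAssigned` helper are hypothetical names. Only the `assign` action type and the `assignee_uid` field are taken from the PR itself.

```typescript
// Hypothetical shape of a persisted alert action; field names are assumptions.
interface AlertActionSketch {
  action_type: string;
  assignee_uid: string | null;
  episode_id: string;
}

// True only for the case the summary describes:
// an "assign" action whose assignee is not null (null means unassign).
function shouldEmitEpisodeAssigned(action: AlertActionSketch): boolean {
  return action.action_type === 'assign' && action.assignee_uid !== null;
}
```

With this guard, an unassign (an `assign` action with a null assignee) and any other action type would not produce the event.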

Testing

Create a workflow that uses the new trigger:

name: New workflow
description: This is a new workflow
enabled: true

triggers:
  - type: alertingV2.episodeAssigned
    on:
      condition: 'event.ruleId: *'

steps:
  - name: hello_world_step
    type: cases.addComment
    with:
      case_id: "2be77e8a-d26a-4290-b192-b7c9854c92f8"
      comment: "It works!"
  1. Assign an episode to any user.
  2. Confirm in the workflow's execution tab that it ran successfully.
  3. Confirm that the event data was populated properly.
  4. Test different conditions, steps, and comments.

@adcoelho adcoelho requested a review from umbopepato May 12, 2026 14:11
@adcoelho adcoelho self-assigned this May 12, 2026
@adcoelho adcoelho requested review from a team as code owners May 12, 2026 14:11
@adcoelho adcoelho added release_note:skip Skip the PR/issue when compiling release notes backport:skip This PR does not require backporting Team:ResponseOps Platform ResponseOps team (formerly the Cases and Alerting teams) t// labels May 12, 2026
@infra-vault-gh-plugin-prod

Pinging @elastic/response-ops (Team:ResponseOps)

@botelastic botelastic Bot added the Team:One Workflow Team label for One Workflow (Workflow automation) label May 12, 2026
Comment thread x-pack/platform/plugins/shared/alerting_v2/server/setup/bind_services.ts Outdated
@cnasikas cnasikas self-requested a review May 14, 2026 11:18
@cnasikas cnasikas marked this pull request as draft May 20, 2026 08:22
@cnasikas cnasikas force-pushed the alerting-v2-workflow-trigger-assign-episode branch from 96980ed to 2714f6f May 21, 2026 12:03
@adcoelho adcoelho marked this pull request as ready for review May 21, 2026 14:41
@cnasikas cnasikas self-assigned this May 21, 2026
@cnasikas
Member

@macroscope-app review

Member

@cnasikas cnasikas left a comment

Tested and is working as expected!

* Publishes an `episode.assigned` domain event for a persisted assign action.
* No-op when `assignee_uid` is null (unassign).
*/
emitEpisodeAssigned(request: KibanaRequest, action: AlertAction): void;
Member

Is there any schema we can use to type the action for the assigned action?

@adcoelho
Contributor Author

Just a note that this PR also includes some changes from @cnasikas for the event bus, which I reviewed and tested.

@adcoelho adcoelho enabled auto-merge (squash) May 22, 2026 13:24
@adcoelho adcoelho requested a review from a team as a code owner May 22, 2026 18:29
import { z } from '@kbn/zod/v4';
import type { CommonTriggerDefinition } from '@kbn/workflows-extensions/common';

export const EPISODE_ASSIGNED_TRIGGER_ID = 'alertingV2.episodeAssigned' as const;
Contributor

Do we really need to specify V2 here? Isn't that an implementation detail?

Member

We want to distinguish between the existing alerting and the new one (v2) in case we want to introduce triggers for the v1 alerting in the future. Any suggestions?

Contributor

I don't have any suggestions at the moment; maybe @tinnytintin10 could help.

Contributor

Discussed this one briefly with @tinnytintin10. The suggestion is to remove V2 from the name, as it also felt a bit like an implementation detail. The next step would be for us to offer extra metadata for supporting V1 and V2 if you decide to ship something for alerting V1. You could also state explicitly in the description that this one is related to V2, if that would help users further. WDYT?

Member

@tiamliu based on our recent conversations around how to cater to "mixed experiences" effectively, please check my thinking here?

I think we should call these kinds of events/triggers "alerting.[event_name]" and as much as possible make them version-agnostic. For events that apply to both, we can send them through the same flow, and include "version" as a payload discriminator for people who only want to handle one or the other for some reason (likely suggested by a consultant, support, etc).

Very rough examples:

for alerting.ruleSnoozed
payload might be

{
  "rule.id":  "some-id-for-rule",
  "alerting.version": 2,
  "timestamp": //...
    // ...etc
}

for alerting.alertAssigned and alerting.alertAcknowledged, maybe only v2 alerts have these features, but that's still okay.

Thoughts?

Member

@cnasikas cnasikas May 28, 2026

Thank you both for the suggestions. I really appreciate the thoroughness here, since the trigger ID is not an internal name; it is part of the public contract users put in their workflow YAML. For events that genuinely exist in both V1 and V2 with the same schemas, a common payload makes a lot of sense, and it would save users from having to configure the same workflow twice, once for V1 and once for V2. Where I would like to push back is that V2 has a totally different philosophy, for example, episodes. V1 has alert instances and recoveries with a different lifecycle, so for a lot of V2 events there is no conceptually equivalent V1 event to merge with. For these V2 events, the version is always going to be 2, never anything else, and presenting a V2-only trigger as if it were version-agnostic is exactly the confusing mixed experience we're trying to prevent. We would only be moving the version from the trigger ID to the payload, which is still user-facing.

Some technical concerns when/if the payload diverges between versions of the same event: when we register a trigger, we need to provide a zod schema. We would need to construct that schema as a discriminated union on alerting.version, with one schema per version. Workflow authors would have to know which fields to use and how to branch between versions, and we would need to document which field exists in which version. Also, if in the future we need to introduce versioning on the schema of the v2 payload (I hope we will never have to do this 🙂), we would need a second discriminator (alerting.v2.version) for the new schema. Separating the events by their top-level discriminator (the trigger ID) eliminates these issues, especially if the schemas need to evolve differently. Lastly, if we namespace on the trigger ID, it will be easier to deprecate v1 triggers in the future when we deprecate alerting v1; telemetry and auditing become a bit easier (no scanning the payload to figure out the version); and discoverability in the workflows UI is unambiguous (a user browsing the trigger catalog knows immediately which alerting world a trigger belongs to).
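
To make the branching cost of a shared trigger concrete, here is a rough sketch in plain TypeScript types rather than the actual zod schema. Everything in it is hypothetical: the `version` field stands in for `alerting.version`, and `alertInstanceId`, `V1Payload`, `V2Payload`, and `describeTarget` are illustrative names only.

```typescript
// Hypothetical payloads for a single shared trigger. Field names are
// illustrative assumptions, not the schemas from this PR.
interface V1Payload {
  version: 1; // stands in for "alerting.version"
  ruleId: string;
  alertInstanceId: string; // assumed V1-only field
}

interface V2Payload {
  version: 2;
  ruleId: string;
  episodeId: string; // episodes are a V2-only concept
}

type SharedPayload = V1Payload | V2Payload;

// Every consumer has to branch on the discriminator before it can touch
// a version-specific field.
function describeTarget(payload: SharedPayload): string {
  if (payload.version === 1) {
    return `alert instance ${payload.alertInstanceId}`;
  }
  return `episode ${payload.episodeId}`;
}
```

A workflow condition or step that reads `episodeId` would need the same kind of version check, which is the documentation and authoring burden described above.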

On leaking an implementation detail in the trigger ID: I hear the concern, but all of our APIs already have V2 in their path and are exposed to users. I would suggest keeping the v1/v2 distinction at the trigger ID level (maybe renaming it to alerting.v2.episodeAssigned or something very different) when the events are different, and keeping the common event framework agnostic when the events are the same with the same payload.
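
As a rough illustration of why a namespaced trigger ID helps telemetry and auditing, the version can be read from the ID alone without inspecting the payload. The helper below is a hypothetical sketch, assuming an `alerting.<version>.<event>` convention for version-specific triggers and `alerting.<event>` for shared ones; neither the helper nor the convention is confirmed by this PR.

```typescript
type AlertingGeneration = 'v1' | 'v2' | 'agnostic';

// Hypothetical helper: classify a trigger ID by its namespace only.
// Returns null for IDs that do not follow the assumed convention.
function alertingGenerationOf(triggerId: string): AlertingGeneration | null {
  const parts = triggerId.split('.');
  const [root, second] = parts;
  if (root !== 'alerting' || second === undefined || second === '') {
    return null;
  }
  if (second === 'v1' || second === 'v2') {
    // Version-specific triggers need exactly one event segment after the version.
    return parts.length === 3 ? second : null;
  }
  // Two segments, e.g. "alerting.ruleSnoozed": a shared, version-agnostic event.
  return parts.length === 2 ? 'agnostic' : null;
}
```

Under these assumptions, `alerting.v2.episodeAssigned` classifies as `v2` and `alerting.ruleSnoozed` as `agnostic`, while the current `alertingV2.episodeAssigned` ID falls outside the convention and yields `null`.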

Very happy to be wrong about specific cases, though, and if you have an event in mind where the shared-ID pattern works even for V2-only features, I would love to talk it through.

@TamerlanG TamerlanG removed the request for review from a team May 25, 2026 11:33
@cnasikas
Member

@elasticmachine merge upstream


export const episodeAssignedPayloadSchema = z
.object({
occurredAt: z.string().describe(
defaultMessage: 'ISO timestamp of when the assignment occurred.',
})
),
groupHash: z.string().describe(
defaultMessage: 'Stable hash of the alert grouping the episode belongs to.',
})
),
episodeId: z.string().describe(
defaultMessage: 'Identifier of the alerting episode whose assignee changed.',
})
),
ruleId: z.string().describe(
defaultMessage: 'Identifier of the alerting rule the episode belongs to.',
})
),
spaceId: z.string().describe(
Comment on lines +41 to +42
actorUid: z
.string()
Comment on lines +50 to +51
assigneeUid: z
.string()
@kibanamachine
Contributor

💛 Build succeeded, but was flaky

Failed CI Steps

Test Failures

  • [job] [logs] Scout Lane #5 - stateful-classic / default / local-stateful-classic - Entity analytics management page - Privileges - should show privileges callout and disable switch for user without risk engine privileges
  • [job] [logs] Scout Lane #5 - stateful-classic / default / local-stateful-classic - Entity analytics management page - Risk Score tab - should discard changes when clicking discard button
  • [job] [logs] Scout Lane #5 - stateful-classic / default / local-stateful-classic - Entity analytics management page - Risk Score tab - should show save bar when toggling closed alerts switch
  • [job] [logs] Scout Lane #5 - stateful-classic / default / local-stateful-classic - Entity analytics management page - Risk Score tab - should show save bar when toggling retain checkbox

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

id before after diff
alertingVTwo 1199 1203 +4

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
alertingVTwo 879.8KB 881.4KB +1.5KB

Page load bundle

Size of the bundles that are downloaded on every page load. Target size is below 100kb

id before after diff
alertingVTwo 47.9KB 50.4KB +2.5KB
Unknown metric groups

async chunk count

id before after diff
alertingVTwo 17 18 +1

cc @adcoelho @cnasikas

Development

Successfully merging this pull request may close these issues.

[Alerting V2] Add workflow event triggers for when an episode is assigned to a user

7 participants