[Sig Events] Add significant events management skill + tools #271430
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| Search, create, and update significant events for Streams, with guidance to avoid duplicates and keep event lifecycle state accurate. |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,28 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| import { defineSkillType } from '@kbn/agent-builder-server/skills/type_definition'; | ||
| import { | ||
| STREAMS_CREATE_EVENT_TOOL_ID, | ||
| STREAMS_EVENT_STATUS_UPDATE_TOOL_ID, | ||
| STREAMS_SEARCH_EVENTS_TOOL_ID, | ||
| } from '../../tools/register_tools'; | ||
| import description from './description.text'; | ||
| import content from './skill.md.text'; | ||
|
|
||
| export const sigEventsManagementSkill = defineSkillType({ | ||
| id: 'significant-events-management', | ||
| name: 'significant-events-management', | ||
| basePath: 'skills/platform/streams', | ||
| description, | ||
| content, | ||
| getRegistryTools: () => [ | ||
| STREAMS_SEARCH_EVENTS_TOOL_ID, | ||
| STREAMS_CREATE_EVENT_TOOL_ID, | ||
| STREAMS_EVENT_STATUS_UPDATE_TOOL_ID, | ||
| ], | ||
| }); |
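The three tool IDs registered above line up with the search-before-create flow described in the skill text. A minimal sketch of that decision step, assuming the agent has already judged which search hits are semantically similar. `ExistingEvent`, `NextStep`, and `decideNextStep` are hypothetical names for illustration and are not part of this PR.

```typescript
// Hypothetical shape of an event returned by a search; field names are
// assumptions based on the examples in the skill text.
interface ExistingEvent {
  event_id: string;
  title: string;
  status: 'promoted' | 'acknowledged' | 'demoted';
}

type NextStep =
  | { action: 'reference_existing'; event_id: string }
  | { action: 'investigate_then_create' };

// Takes the events the agent already considers similar (same issue class,
// root cause, or impact) and returns the next workflow step.
function decideNextStep(similar: ExistingEvent[]): NextStep {
  const [first] = similar;
  if (first !== undefined) {
    // A similar event exists: reference it instead of creating a duplicate.
    return { action: 'reference_existing', event_id: first.event_id };
  }
  // No similar event: gather KI context and ES|QL evidence, then create.
  return { action: 'investigate_then_create' };
}

const step = decideNextStep([]);
console.log(step.action);
```

The similarity judgment itself stays with the agent; the sketch only encodes the rule that a non-empty set of similar events blocks creation.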
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,163 @@ | ||
| You manage Significant Events (SigEvents) for Streams. | ||
|
|
||
| <significant_events_concept> | ||
| What a Significant Event is: | ||
| - A SigEvent is a durable incident-level summary of an important operational issue. | ||
| - It captures what happened, why it happened, impact, confidence/criticality, and supporting context. | ||
| - SigEvents should be actionable and non-duplicative. | ||
|
|
||
| Examples of conversation signals that may indicate a SigEvent: | ||
| - Recurring service degradation with clear customer/user impact. | ||
| - Security or compliance risk with confirmed evidence. | ||
| - Cross-stream failures that share a common root cause. | ||
| - Repeated alerting patterns that represent a known incident class. | ||
| </significant_events_concept> | ||
|
|
||
| <available_tools> | ||
| You have 3 SigEvents tools in this skill: | ||
|
|
||
| - `event_search` | ||
| Use this first to find existing similar significant events. | ||
| `query` is optional; if omitted (or empty), search uses optional stream and status filters only. | ||
| `stream_name` is optional; omit it for cross-stream searches. | ||
|
|
||
| - `event_create` | ||
| Use this only after confirming there is no similar existing event. | ||
| The system generates internal identifiers and timestamps automatically. | ||
|
|
||
| - `event_status_update` | ||
| Use this to update an existing event status to one of: `promoted`, `acknowledged`, `demoted`. | ||
| If an event already has the requested status or is missing, the update is ignored. | ||
|
|
||
| Additional globally-available helper guidance: | ||
| - `ki_search` can be used to gather existing Knowledge Indicators (KIs) for context. | ||
| - `execute_esql` can be used to confirm the event and collect evidence rows. | ||
| </available_tools> | ||
|
|
||
| <required_workflow> | ||
| Always follow this workflow: | ||
| 1. Detect potential significant-event signals from the conversation. | ||
| 2. Search existing events with `event_search` using optional stream scope and optional query. | ||
| 3. Compare returned events for semantic similarity (same issue class/root cause/impact). | ||
| 4. If a similar event exists, do not create a duplicate; reference the existing event. | ||
| 5. If no similar event exists, investigate and validate: | ||
| - Use `ki_search` to collect related Feature and Query KIs. | ||
| - Use `execute_esql` to confirm the issue and gather supporting facts. | ||
| 6. Create a new event with `event_create` using complete, high-quality event properties. | ||
|
|
||
| Status update workflow: | ||
| 1. Gather evidence supporting the target event state. | ||
| 2. Choose status based on intent: | ||
| - `promoted`: incident-worthy and active. | ||
| - `acknowledged`: known and actively tracked. | ||
| - `demoted`: no longer incident-worthy. | ||
| 3. Call `event_status_update` with the event id and target status. | ||
| 4. Treat `{ updated: 0, ignored: 1 }` as expected when the event is missing or already in the requested status. | ||
| </required_workflow> | ||
|
|
||
| <proactive_behavior> | ||
| Be proactive. | ||
|
|
||
| When conversation context and evidence strongly suggest a new significant event, suggest creating one. | ||
|
|
||
| Proactive pattern: | ||
| 1. State why current signals indicate a likely new significant event. | ||
| 2. Run `event_search` to check for similar existing events. | ||
| 3. If no similar event exists, recommend creating a new event and proceed with investigation/evidence collection. | ||
| 4. Use `event_create` with a complete payload. | ||
| </proactive_behavior> | ||
|
|
||
| <ki_and_evidence_guidance> | ||
| Using KIs for context: | ||
| - Prefer KI context before writing root cause and impact. | ||
| - Reuse KI terminology in `summary`, `root_cause`, and `impact` when relevant. | ||
|
|
||
| Using ES|QL for confirmation: | ||
| - Run focused ES|QL queries that directly test the suspected issue. | ||
| - Capture concise evidence in narrative form inside `summary`, `root_cause`, and `impact`. | ||
| - If detailed evidence structures are needed, ensure tool support exists before attempting to submit them. | ||
| </ki_and_evidence_guidance> | ||
|
|
||
| <sig_event_property_guidance> | ||
| Populate event properties with care: | ||
|
|
||
| - `status`: | ||
| - For new events, default to `promoted` unless user intent says otherwise. | ||
|
|
||
| - `title`: | ||
| - Short, specific, human-readable incident title. | ||
|
|
||
| - `summary`: | ||
| - Brief narrative: what happened, where, and key signal. | ||
|
|
||
| - `root_cause`: | ||
| - Best-known causal explanation; avoid vague language. | ||
|
|
||
| - `stream_names`: | ||
| - Include all affected streams. | ||
|
|
||
| - `criticality`: | ||
| - Number in range 0..100 indicating system criticality. | ||
| - Suggested scale: | ||
| - 0-30 low | ||
| - 31-60 medium | ||
| - 61-80 high | ||
| - 81-100 critical | ||
|
Comment on lines +99 to +105
Contributor: just FYI:
||
|
|
||
| - `confidence`: | ||
| - Required. Float in range 0..1 indicating confidence in the event assessment. | ||
|
|
||
| - `impact`: | ||
| - Always set to exactly one of: `critical`, `high`, `medium`, `low`. | ||
|
Comment on lines +110 to +111
Contributor: Making LLMs assign
Contributor (Author): Let's see, if we decide to get rid of it, I'll follow up with the skill change.
||
|
|
||
| - `recommendations`: | ||
| - Required. List of descriptive, detailed mitigation steps. | ||
| </sig_event_property_guidance> | ||
|
|
||
| <tool_examples> | ||
| Example: search existing events first | ||
| Tool: `event_search` | ||
| { | ||
| "query": "checkout latency spike with timeout burst", | ||
| "stream_name": "logs.checkout", | ||
| "status": ["promoted", "acknowledged"] | ||
| } | ||
|
|
||
| Example: filter-only search (no query text) | ||
| Tool: `event_search` | ||
| { | ||
| "stream_name": "logs.checkout", | ||
| "status": ["promoted", "acknowledged"] | ||
| } | ||
|
|
||
| Example: cross-stream search (no stream_name) | ||
| Tool: `event_search` | ||
| { | ||
| "query": "timeout burst", | ||
| "status": ["promoted", "acknowledged"] | ||
| } | ||
|
|
||
| Example: create a new significant event | ||
| Tool: `event_create` | ||
| { | ||
| "status": "promoted", | ||
| "title": "Checkout timeout spike under upstream latency", | ||
| "summary": "Checkout experienced a sustained timeout spike correlated with upstream latency increase.", | ||
| "root_cause": "Upstream dependency latency exceeded service timeout budget during peak load.", | ||
| "stream_names": ["logs.checkout", "logs.payment"], | ||
| "criticality": 86, | ||
| "confidence": 0.84, | ||
| "impact": "high", | ||
| "recommendations": [ | ||
| "Add alert on upstream latency threshold breach", | ||
| "Review timeout configuration for checkout service" | ||
| ] | ||
| } | ||
|
|
||
| Example: update an existing significant event status | ||
| Tool: `event_status_update` | ||
| { | ||
| "event_id": "9c4d04a1-86fe-45c2-b1dd-00891a8ba9f1", | ||
| "status": "acknowledged" | ||
| } | ||
| </tool_examples> | ||
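The property guidance and the status update result described above can be sketched as plain checks. This is an illustrative sketch only: the type and function names (`EventCreateInput`, `criticalityBand`, `validateEventCreateInput`, `describeStatusUpdate`) are hypothetical, and the non-empty requirements on `stream_names`, `recommendations`, and the text fields are assumptions rather than rules confirmed by the tool schemas in this PR.

```typescript
type Impact = 'critical' | 'high' | 'medium' | 'low';
type EventStatus = 'promoted' | 'acknowledged' | 'demoted';
type CriticalityBand = 'low' | 'medium' | 'high' | 'critical';

// Hypothetical input shape mirroring the fields named in the skill text.
interface EventCreateInput {
  status?: EventStatus; // the skill text defaults new events to `promoted`
  title: string;
  summary: string;
  root_cause: string;
  stream_names: string[];
  criticality: number; // 0..100
  confidence: number; // 0..1
  impact: Impact;
  recommendations: string[];
}

// Maps a score to the suggested scale: 0-30, 31-60, 61-80, 81-100.
function criticalityBand(score: number): CriticalityBand {
  if (score <= 30) return 'low';
  if (score <= 60) return 'medium';
  if (score <= 80) return 'high';
  return 'critical';
}

// Returns a list of problems; an empty list means the payload looks complete.
function validateEventCreateInput(input: EventCreateInput): string[] {
  const problems: string[] = [];
  const impacts: string[] = ['critical', 'high', 'medium', 'low'];
  if (!(input.criticality >= 0 && input.criticality <= 100)) {
    problems.push('criticality must be in 0..100');
  }
  if (!(input.confidence >= 0 && input.confidence <= 1)) {
    problems.push('confidence must be in 0..1');
  }
  if (impacts.indexOf(input.impact) === -1) {
    problems.push('impact must be one of critical, high, medium, low');
  }
  // Assumption: at least one stream and one recommendation are expected.
  if (input.stream_names.length === 0) problems.push('stream_names is empty');
  if (input.recommendations.length === 0) problems.push('recommendations is empty');
  for (const field of ['title', 'summary', 'root_cause'] as const) {
    if (input[field].trim() === '') problems.push(`${field} is empty`);
  }
  return problems;
}

// Shape taken from the `{ updated: 0, ignored: 1 }` result in the skill text.
interface StatusUpdateResult {
  updated: number;
  ignored: number;
}

// An ignored update (missing event or unchanged status) is a no-op, not an error.
function describeStatusUpdate(result: StatusUpdateResult): 'updated' | 'no-op' {
  return result.updated > 0 ? 'updated' : 'no-op';
}
```

Running a check like this before calling `event_create` would catch out-of-range scores early, under the assumptions stated above.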
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,39 @@ | ||
| /* | ||
| * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one | ||
| * or more contributor license agreements. Licensed under the Elastic License | ||
| * 2.0; you may not use this file except in compliance with the Elastic License | ||
| * 2.0. | ||
| */ | ||
|
|
||
| import { createEventToolHandler } from './handler'; | ||
|
|
||
| describe('createEventToolHandler', () => { | ||
| it('creates a single event', async () => { | ||
| const eventClient = { | ||
| bulkCreate: jest.fn().mockResolvedValue({}), | ||
| }; | ||
|
|
||
| const result = await createEventToolHandler({ | ||
| eventClient: eventClient as never, | ||
| eventInput: { | ||
| title: 'T', | ||
| summary: 'S', | ||
| root_cause: 'R', | ||
| stream_names: ['logs.a'], | ||
| criticality: 60, | ||
| impact: 'high', | ||
| confidence: 0.7, | ||
| recommendations: ['create incident'], | ||
| }, | ||
| }); | ||
|
|
||
| expect(eventClient.bulkCreate).toHaveBeenCalledTimes(1); | ||
| expect(eventClient.bulkCreate).toHaveBeenCalledWith([ | ||
| expect.objectContaining({ | ||
| discovery_slug: expect.stringMatching(/^agent-event-[a-f0-9]{8}$/), | ||
| }), | ||
| ]); | ||
| expect(result.acknowledged).toBe(true); | ||
| expect(result.event_id).toBeTruthy(); | ||
| }); | ||
| }); |
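The test above expects `discovery_slug` to match `/^agent-event-[a-f0-9]{8}$/`. The handler itself is not shown in this excerpt, so the following is only one way such a slug could be produced, assuming it is derived from the first eight hex characters of a UUID. `makeDiscoverySlug` is a hypothetical helper and not the PR's implementation.

```typescript
import { randomUUID } from 'node:crypto';

// Hypothetical helper: strips dashes, takes the first 8 hex characters of
// the id, lowercases them, and prefixes the result with `agent-event-`.
function makeDiscoverySlug(eventId: string): string {
  const prefix = eventId.replace(/-/g, '').slice(0, 8).toLowerCase();
  return `agent-event-${prefix}`;
}

const eventId = randomUUID();
const slug = makeDiscoverySlug(eventId);
console.log(slug, /^agent-event-[a-f0-9]{8}$/.test(slug));
```

Deriving the slug from the event id keeps the two values traceable to each other, though the real handler may generate them independently.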
Q: Was this based on the current investigator or judge agent instructions? It would be great to use this in the workflows.
Not really based on the agents; I just tried to tune it to the conversation flow, with the agent discovering and proactively suggesting sig events to the user. An LLM generated the first draft and I tweaked it after experimenting with the chat.