[sigevents][kis] Unify features and queries in the same data stream#270979

Closed
klacabane wants to merge 37 commits into
elastic:mainfrom
klacabane:sigevents_unified-ki-datastream-v2

@klacabane commented May 25, 2026

Summary

Ports features and queries from two separate mutable storage backends to a single unified, append-only Knowledge Indicators (KI) data stream, maintaining feature parity. The old FeatureClient and QueryClient are removed and replaced by a single KnowledgeIndicatorClient.

Model

  • One hidden data stream (.significant_events-knowledge_indicators) holds both feature and query documents, discriminated by type.
  • Identity is (stream.name, type, id). State is reconstructed by selecting the latest revision per group (two-stage INLINE STATS: MAX(@timestamp), then MAX(_id) as a tiebreak).
  • Writes are append-only: updates add a revision; deletes append a tombstone (deleted: true).
  • Reads filter on the latest revision (drop tombstoned / excluded / expired) after the per-group reduction.
  • Feature shape: uuid / status / last_seen / excluded_at (timestamp) are gone; identity is id, with updated_at (revision time) and excluded (boolean).
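
The read model above can be illustrated with a minimal in-memory sketch. It is not the ES|QL the PR uses: the `Revision` shape, the `stream_name` field (standing in for `stream.name`), and the function names are assumptions made for illustration only.

```typescript
// Hypothetical, simplified revision shape; the real stored document has more fields.
interface Revision {
  _id: string;
  '@timestamp': string; // ISO 8601
  stream_name: string; // stands in for `stream.name`
  type: 'feature' | 'query';
  id: string;
  deleted?: boolean;
  excluded?: boolean;
  expires_at?: string;
}

// True when `a` should replace `b`: newer @timestamp wins, greater _id breaks ties.
function isNewer(a: Revision, b: Revision): boolean {
  const ta = Date.parse(a['@timestamp']);
  const tb = Date.parse(b['@timestamp']);
  return ta !== tb ? ta > tb : a._id > b._id;
}

// Reduce all revisions to the latest one per (stream.name, type, id).
function latestPerGroup(revisions: Revision[]): Revision[] {
  const latest = new Map<string, Revision>();
  for (const rev of revisions) {
    const key = JSON.stringify([rev.stream_name, rev.type, rev.id]);
    const current = latest.get(key);
    if (!current || isNewer(rev, current)) latest.set(key, rev);
  }
  return Array.from(latest.values());
}

// Filter only after the reduction, so a tombstone hides the older live revisions.
function readActive(revisions: Revision[], now: Date): Revision[] {
  return latestPerGroup(revisions).filter(
    (rev) =>
      !rev.deleted &&
      !rev.excluded &&
      (rev.expires_at === undefined || Date.parse(rev.expires_at) > now.getTime())
  );
}
```

Filtering before the reduction would let an older live revision resurface once its tombstone was dropped, which is why the sketch filters last.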

Testing

  1. Enable Streams: significant events (Advanced Settings) in a space with an Enterprise license; confirm the Significant events tab loads.
  2. Identify features on a stream → features appear; exclude one (row action) → moves to the Excluded tab; restore it → it disappears (re-derived on the next extraction); delete one → gone.
  3. Generate/persist queries → they appear; high-severity non-STATS queries create backing alerting rules; promote/demote/delete behave accordingly.
  4. Multi-stream discovery view: with features sharing an id across streams, confirm they render as distinct rows, applying/removing filters produces no duplicate/ghost rows, and bulk exclude/restore/delete/promote target the correct stream.
  5. Search KIs (keyword + semantic) returns only latest revisions, no duplicates, and respects the active/excluded filter.

@github-actions

@klacabane, this PR increases one or more page-load bundle sizes by 15% or more:

| Plugin | Before (bytes) | After (bytes) | Change |
| --- | --- | --- | --- |
| agentBuilderPlatform | 8,737 | 15,544 | +77.9% |
| globalSearchBar | 26,122 | 31,212 | +19.5% |

Large bundle size increases can affect page load performance. Consider whether dependencies can be lazy-loaded or code split to reduce the bundle.

See the bundle optimization guide for tips.

@klacabane added the release_note:skip, backport:skip, Feature:SigEvents, Team:SigEvents, and v9.5.0 labels May 28, 2026
@klacabane marked this pull request as ready for review May 29, 2026 12:01
@klacabane requested review from a team as code owners May 29, 2026 12:01
excluded_at: z.string().optional(),
run_id: z.string().optional(),
excluded: z.boolean().optional(),
updated_at: z.string().optional(),
expires_at: z.string().optional(),

const featureBulkOperationSchema = z.union([
  z.object({ index: z.object({ feature: featureSchema }) }),
  z.object({ delete: z.object({ id: z.string() }) }),
  z.object({ exclude: z.object({ id: z.string() }) }),
  z.object({ restore: z.object({ id: z.string() }) }),

const featureBulkAcrossStreamsOperationSchema = z.union([
  z.object({ delete: z.object({ id: z.string(), stream_name: z.string() }) }),
  z.object({ exclude: z.object({ id: z.string(), stream_name: z.string() }) }),
  z.object({ restore: z.object({ id: z.string(), stream_name: z.string() }) }),
},
params: z.object({
path: z.object({ name: z.string(), uuid: z.string() }),
path: z.object({ name: z.string(), id: z.string() }),
kibanamachine commented May 29, 2026

💔 Build Failed

Failed CI Steps

Metrics [docs]

Module Count

Fewer modules leads to a faster build time

| id | before | after | diff |
| --- | --- | --- | --- |
| streamsApp | 1973 | 1974 | +1 |

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

| id | before | after | diff |
| --- | --- | --- | --- |
| datasetQuality | 541.1KB | 541.1KB | -60.0B |
| streamsApp | 2.1MB | 2.1MB | +415.0B |
| total | | | +355.0B |

History

const deletableOps: Array<Extract<KIBulkOperation, { delete: unknown }>> = [];
let deleteSkipped = 0;
for (const op of deleteOps) {
if (deleteLatest.find((doc) => doc.id === op.delete.id)) {

Suggested change
- if (deleteLatest.find((doc) => doc.id === op.delete.id)) {
+ if (deleteLatest.some((doc) => doc.id === op.delete.id)) {

const latest = docById.get(key);
if (
!latest ||
new Date(latest['@timestamp']).getTime() !== new Date(source['@timestamp']).getTime()

q: We don't need a tiebreaker here by _id, right?

query = withSort(query, sort);
// Cap at REVISION_SIZE_LIMIT regardless of the requested limit so a large
// caller-supplied value can't fetch an unbounded result set.
query = query.keep('_source').limit(Math.min(limit, REVISION_SIZE_LIMIT));

q: should we log a warning in case limit happens to be greater than REVISION_SIZE_LIMIT?
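
A minimal sketch of what such a warning could look like, assuming a logger with a `warn` method. The `capLimit` helper, the `WarnLogger` interface, and the value of `REVISION_SIZE_LIMIT` are hypothetical; the PR does not show the constant's actual value.

```typescript
// Hypothetical value; the real constant is defined elsewhere in the PR.
const REVISION_SIZE_LIMIT = 10000;

// Minimal logger surface assumed for this sketch.
interface WarnLogger {
  warn(message: string): void;
}

// Clamp a caller-supplied limit and warn when the clamp actually changes it.
function capLimit(limit: number, logger: WarnLogger): number {
  if (limit > REVISION_SIZE_LIMIT) {
    logger.warn(
      `Requested limit ${limit} exceeds REVISION_SIZE_LIMIT (${REVISION_SIZE_LIMIT}); capping.`
    );
  }
  return Math.min(limit, REVISION_SIZE_LIMIT);
}
```

With this shape the call site would read `query.keep('_source').limit(capLimit(limit, logger))`, again assuming a logger is available there.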

const docs: StoredKnowledgeIndicator[] = [];
for (const op of operations) {
if ('index' in op) {
if ('feature' in op.index) {

nit: we're doing this again a few lines above

private readonly ttlDays: number
) {}

async bulk(

this function seems to be doing a lot of things. I wonder if its content should be broken into smaller functions

query = withTimeRange(query, options);
if (where) query = withWhere(query, where);
query = pickLatestPerGroup(query, groupBy);
const sortArgs: ComposerSortShorthand[] = sort ?? [['@timestamp', 'DESC']];

is there a constant for @timestamp?

query = query.keep('_source');
const query = latestSourceFrom(index, space).where`${esql.col(idField)} == ${esql.str(idValue)}`
.sort(['@timestamp', 'ASC'])
.keep('_source');

Is there a constant for _source?

const wildcard = (field: string, boost?: number) => ({
wildcard: {
[field]: {
value: `*${escaped}*`,

I guess leading * will perform a full term scan, and could have performance implications
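
For context, a sketch of how a clause like the one above might be assembled. The `escapeWildcard` and `wildcardClause` helpers are hypothetical and not the PR's code; only `value` and `boost` are used from the Elasticsearch wildcard query. Escaping `*`, `?`, and `\` keeps user input literal, but it does not remove the cost of the leading `*` the comment points at.

```typescript
// Escape the characters that are special in a wildcard pattern: \ * ?
function escapeWildcard(input: string): string {
  return input.replace(/[\\*?]/g, '\\$&');
}

// Build a "contains" wildcard clause for one field, with an optional boost.
function wildcardClause(field: string, term: string, boost?: number) {
  return {
    wildcard: {
      [field]: {
        // The leading * forces the query to examine every term in the field.
        value: `*${escapeWildcard(term)}*`,
        ...(boost !== undefined ? { boost } : {}),
      },
    },
  };
}
```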

}

function computeExpiresAt(timestamp: string, ttlDays: number): string {
return new Date(new Date(timestamp).getTime() + ttlDays * 24 * 60 * 60 * 1000).toISOString();

nit: Could we extract this 24 * 60 * 60 * 1000 into a constant?
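
The extraction suggested here could look like the following sketch. The constant name `MS_PER_DAY` is an assumption; the function body is otherwise the one quoted above.

```typescript
// Hypothetical constant name for the number of milliseconds in one day.
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Add ttlDays whole days to an ISO timestamp and return the result as ISO 8601.
function computeExpiresAt(timestamp: string, ttlDays: number): string {
  return new Date(new Date(timestamp).getTime() + ttlDays * MS_PER_DAY).toISOString();
}
```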

Comment on lines +20 to +24
const parts: string[] = [`Stream: ${streamName}`];
if (feature.title) parts.push(`Title: ${feature.title}`);
if (feature.description) parts.push(`Description: ${feature.description}`);
if (feature.type) parts.push(`Type: ${feature.type}`);
if (feature.subtype) parts.push(`Subtype: ${feature.subtype}`);

nit: we could have constants for these texts ("Stream", "Title", etc.).
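
One possible shape for that, as a sketch only. The `LABELS` map, the `FeatureText` interface, and the `buildParts` function name are assumptions; the label strings and field names are the ones in the quoted lines.

```typescript
// Hypothetical label constants for the text assembled from a feature.
const LABELS = {
  stream: 'Stream',
  title: 'Title',
  description: 'Description',
  type: 'Type',
  subtype: 'Subtype',
} as const;

// Only the fields used by the quoted snippet; the real feature type has more.
interface FeatureText {
  title?: string;
  description?: string;
  type?: string;
  subtype?: string;
}

// Build the "Label: value" parts, skipping fields that are empty or missing.
function buildParts(streamName: string, feature: FeatureText): string[] {
  const parts: string[] = [`${LABELS.stream}: ${streamName}`];
  if (feature.title) parts.push(`${LABELS.title}: ${feature.title}`);
  if (feature.description) parts.push(`${LABELS.description}: ${feature.description}`);
  if (feature.type) parts.push(`${LABELS.type}: ${feature.type}`);
  if (feature.subtype) parts.push(`${LABELS.subtype}: ${feature.subtype}`);
  return parts;
}
```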

@klacabane

Closing, as this will be split into two smaller changes.

@klacabane closed this Jun 1, 2026
klacabane added a commit that referenced this pull request Jun 3, 2026
## Summary

Additive foundation for #270979 — introduces the **unified Knowledge
Indicators (KI) data stream** as a new storage backend for features and
queries, without touching any existing code. Existing `FeatureClient`
and `QueryClient` paths remain fully active.

**Model**

- One hidden data stream (`.significant_events-knowledge_indicators`)
will hold both `feature` and `query` documents, discriminated by `type`.
- Identity is `(stream.name, type, id)`. State is reconstructed by
selecting the **latest revision per group** (two-stage `INLINE STATS`:
`MAX(@timestamp)`, then `MAX(_id)` as a tiebreak).
- Writes are append-only: updates add a revision; deletes append a
tombstone (`deleted: true`).
- Reads filter on the latest revision (drop tombstoned / excluded /
expired) after the per-group reduction.
- Supports **keyword + semantic hybrid search** across indicators.

The index template is installed at Kibana startup so the data stream is
ready when callers are migrated over.

---

## Testing

This PR has **no behavior change**. All existing routes continue to use
`FeatureClient` and `QueryClient`.

- verify the index template was installed at startup:
     GET /_index_template/.significant_events-knowledge_indicators

---------

Co-authored-by: kibanamachine <42973632+kibanamachine@users.noreply.github.com>
tfcmarques pushed a commit to tfcmarques/kibana that referenced this pull request Jun 11, 2026
logeekal pushed a commit to logeekal/kibana that referenced this pull request Jun 25, 2026
Labels

  • backport:skip (This PR does not require backporting)
  • Feature:SigEvents (Significant events feature, related to streams and rules/alerts (RnA))
  • release_note:skip (Skip the PR/issue when compiling release notes)
  • Team:SigEvents (Project team working on Significant Events)
  • v9.5.0

4 participants