
[sigevents] Nightshift northstar — memory workflows, onboarding, and gap detection #271444

Draft
flash1293 wants to merge 71 commits into elastic:main from flash1293:jreuter/nightshift-northstar


@flash1293
Contributor

Summary

Combines three stacked sigevents memory PRs into a single draft for end-to-end review and testing:

Merge conflict resolutions

  • Kept the migration branch’s workflow-based memory tasks (synthesis / consolidate / scrape) instead of re-registering those as Agent Builder agents
  • Registered only the chat-driven agents: sigevents.memory.system-onboarding and sigevents.memory.gap-detector
  • Merged both onboarding and gap-detection skills into allow_lists.ts

Test plan

  • Enable observability:streamsEnableMemory and verify that the memory data stream and managed workflows are installed on Kibana start
  • Memory tab: synthesize, consolidate, scrape conversations, detect gaps (workflow triggers)
  • Significant Events discovery: "Tell us about your system" opens onboarding agent chat
  • Gap detection workflow runs manually and on the weekly schedule, and the _gaps/overview page is written
  • Run node scripts/check_changes.ts on the touched packages

Supersedes

Made with Cursor

flash1293 and others added 30 commits April 15, 2026 15:54
…ecture

Replaces the Elasticsearch StorageIndexAdapter-based memory system with
an append-only `.significant_events-memories` data stream, ES|QL read tools,
and Agent Builder skills/agents that delegate writes to a streams-program workflow.

## What changed

### Data layer
- New `memoriesDataStream` definition (`data_stream.ts`) with append-only semantics:
  latest doc per `page_name` by `@timestamp` is the current state; `is_deleted: true`
  is a tombstone
- New `MemoryClient` with `bulkCreate` / `findLatest` / `findLatestByName` backed by
  ES|QL queries
- Minimal `MemoryService` wrapping `DataStreamClient` + `MemoryClient`
- `memoriesDataStream` added to `SIGNIFICANT_EVENTS_DATA_STREAMS` so the template is
  installed on plugin start
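The append-only semantics above can be illustrated in memory. This is a minimal sketch, not the `MemoryClient` implementation (which resolves state with ES|QL): the `MemoryDoc` shape is an assumption that keeps only `page_name`, `@timestamp`, and `is_deleted`, plus a hypothetical `content` field.

```typescript
// Hypothetical document shape: page_name, @timestamp and is_deleted come from
// the description above; content is an illustrative extra field.
interface MemoryDoc {
  page_name: string;
  '@timestamp': string; // ISO-8601
  is_deleted?: boolean;
  content?: string;
}

// Resolves the current state of every page from an append-only log:
// keep the newest document per page_name, then drop pages whose newest
// document is a tombstone.
function resolveLatest(docs: MemoryDoc[]): Map<string, MemoryDoc> {
  const latest = new Map<string, MemoryDoc>();
  for (const doc of docs) {
    const prev = latest.get(doc.page_name);
    if (!prev || Date.parse(doc['@timestamp']) > Date.parse(prev['@timestamp'])) {
      latest.set(doc.page_name, doc);
    }
  }
  // Tombstones are removed only after the newest document is chosen;
  // filtering them out first would resurface the last live version.
  latest.forEach((doc, name) => {
    if (doc.is_deleted === true) {
      latest.delete(name);
    }
  });
  return latest;
}

// Usage: page "b" was deleted after it was written, so only "a" survives.
const current = resolveLatest([
  { page_name: 'a', '@timestamp': '2026-04-15T10:00:00Z', content: 'v1' },
  { page_name: 'a', '@timestamp': '2026-04-15T11:00:00Z', content: 'v2' },
  { page_name: 'b', '@timestamp': '2026-04-15T10:00:00Z', content: 'draft' },
  { page_name: 'b', '@timestamp': '2026-04-15T12:00:00Z', is_deleted: true },
]);
console.log(Array.from(current.keys()).join(',')); // prints "a"
```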

### Agent Builder
- Four ES|QL read tools (`get_page`, `search_pages`, `list_pages`, `get_insights`) and
  one workflow write tool (`write_page` → `.streams-write-memory-page`) registered via
  `agentBuilder.tools.register`
- Three skills with system prompts and `getRegistryTools`:
  `streams-memory-synthesis`, `streams-memory-consolidation`,
  `streams-conversation-scraper`
- Three thin agents each wired to their skill via `skill_ids`:
  `sigevents.memory.synthesizer`, `sigevents.memory.consolidator`,
  `sigevents.memory.conversation-scraper`

### HTTP routes
- All `/internal/streams/memory/*` routes rewritten to use `MemoryClient` directly
- UI contract (`MemoryEntry`, `MemoryCategoryNode`, `MemorySearchResult`) preserved
  through mapping functions — no UI changes required

### Dead code removed
- Old TS memory tools (`memory_read`, `write`, `search`, `list`, `patch`, `delete`,
  `recent_changes`), `MemoryServiceImpl`, `StorageIndexAdapter` (memory only),
  `history_storage`, `triggers/`, `sig_events_memory_skill`
- Nine task files: `memory_generation`, `memory_update`, `memory_consolidation`,
  `conversation_scraper` (+ prompts)
- `memory_generation.ts` and `memory_discovery_tools.ts` from `sig_events/`
- Inline `taskClient.schedule(MEMORY_GENERATION_TASK_TYPE)` calls removed from
  `insights_discovery.ts` and `onboarding.ts`
- All `createMemoryDiscoveryTools` call sites removed

### Constants
- `MEMORIES_DATA_STREAM`, `WRITE_MEMORY_PAGE_WORKFLOW_ID`,
  `SIGEVENTS_MEMORY_WORKFLOW_ID` added to `common/constants.ts`

`OBSERVABILITY_STREAMS_ENABLE_MEMORY` UI setting and all `MemoryTab` UI code are
unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
…ored TS tools

Refactors the sigevents memory system from a task-based architecture to
Kibana workflow-driven execution while keeping core memory tools in TypeScript.

Key changes:
- Replace old Kibana task scheduler with Agent Builder workflows (Memory Synthesis,
  Memory Consolidation, Conversation Scraper, Write Memory Page)
- Restore full TypeScript memory tools (read/write/patch/list/search/delete/recent_changes)
  and MemoryServiceImpl; remove ES|QL static tools
- Migrate memory history storage to a new append-only data stream
  (.significant_events-memory-history)
- Add HTTP routes to trigger workflows from UI:
  POST _scrape_conversations, POST _consolidate, POST _synthesize
- Convert memory skills (synthesis, consolidation, conversation-scraper) to
  factory functions using getInlineTools, wiring TS tools into Agent Builder skills
- Register platform.streams.memory.write_page as a workflow tool globally
- Add "Synthesize Memory" button to the memory UI alongside existing buttons
- Fix permission issue by using asCurrentUser instead of asInternalUser in
  memory route handlers

Co-authored-by: Cursor <cursoragent@cursor.com>
The memory synthesizer agent now has access to
platform_streams_sig_events_search_knowledge_indicators as an inline
tool, enabling it to fetch features and queries from the streams KI
indices before writing memory pages.

Also extend MemoryToolsOptions with optional getScopedClients, server,
and logger so the synthesis skill can construct the KI tool at runtime.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace StorageIndexAdapter (.chat-memory) with direct writes to the
.significant_events-memories data stream. Pages are append-only;
latest version is resolved via collapse on `name` sorted by @timestamp.
History continues to use .significant_events-memory-history.

- Update memories data stream mappings to include all MemoryEntry fields
  (id, name, version, created_at, updated_at, created_by, updated_by)
- Rewrite MemoryServiceImpl to use esClient directly against the data
  streams; remove StorageIndexAdapter dependency
- Delete storage.ts

Co-authored-by: Cursor <cursoragent@cursor.com>
Memory tool handlers were using asInternalUser (kibana_system) to
create MemoryServiceImpl, but kibana_system has no privileges on
.significant_events-* data streams. All tool calls from agent runs
were silently failing with security_exception.

Fix: getMemoryService now accepts an ElasticsearchClient argument.
All 7 memory tool handlers pass context.esClient.asCurrentUser, matching
how the KI search tool and other SigEvents clients are wired.

Also fixes collapse sort to use version desc as primary key (timestamp
tiebreaker was non-deterministic within the same second), and sets
is_deleted: false explicitly on every new document write so the
is_deleted: false filter in listAll/listByCategory works correctly.

Co-authored-by: Cursor <cursoragent@cursor.com>
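A search body with the ordering this commit describes might look like the sketch below. The `name`, `version`, and `@timestamp` field names come from these commit messages; the builder name, the default page size, and the omission of any `is_deleted` handling are assumptions. Elasticsearch field collapsing keeps the top-sorted hit per collapse key (the key must be a single-valued keyword or numeric field with doc values, so `name` is assumed to be a keyword), which is why sorting on `version` first makes the winner deterministic even when two writes share a timestamp.

```typescript
// Hypothetical shape of a "latest version of every page" search body.
interface LatestPagesSearchBody {
  size: number;
  query: { match_all: Record<string, never> };
  collapse: { field: string };
  sort: Array<Record<string, { order: 'asc' | 'desc' }>>;
}

function buildLatestPagesSearch(size = 100): LatestPagesSearchBody {
  return {
    size,
    query: { match_all: {} },
    // Keep one hit per page name; the hit that sorts first wins.
    collapse: { field: 'name' },
    sort: [
      // Primary key: version, so writes within the same second still resolve.
      { version: { order: 'desc' } },
      // Tiebreaker only.
      { '@timestamp': { order: 'desc' } },
    ],
  };
}

console.log(JSON.stringify(buildLatestPagesSearch().sort));
// prints [{"version":{"order":"desc"}},{"@timestamp":{"order":"desc"}}]
```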
Registers the three memory workflows (synthesis, consolidation, conversation
scraper) as Kibana-managed system workflows in kbn-workflows, mirroring the
pattern from PR elastic#270377 for KI workflows.

- Adds streams_memory managed workflow definitions in kbn-workflows/managed/
- Registers streams as a managed workflow owner (setup) and installs all three
  workflows globally on startup (start) via workflowsExtensions
- Memory trigger routes now look up workflows by managed ID instead of
  searching by name, removing the dependency on manual streams-program deployment
- Removes the write_memory_page Agent Builder tool (agents write directly
  via the memory_write TypeScript tool)
- Cleans up workflow name constants from constants.ts

Co-authored-by: Cursor <cursoragent@cursor.com>
…filter, boom errors, and UX

- Fix memory synthesis skill exceeding the 7-inline-tool limit by scoping it to
  the 4 tools it uses (search, read, write, list) plus the KI tool
- Export individual tool constructors from tools/memory index
- Fix consolidation and scraper skill prompts using dot-notation tool IDs
  instead of underscore-notation that getInlineTools registers
- Add is_deleted filter to getBacklinks (was returning deleted pages)
- Align listByCategory sort to version desc, @timestamp desc (was using
  @timestamp only, could return stale versions)
- Add is_deleted?: boolean to MemoryEntry type, removing unsafe casts
- Replace generic Error throws with boom typed errors (forbidden,
  serverUnavailable) in memory routes
- Delete dead MemoryClient (queried non-existent page_name/kibana.space_ids fields)
- Fix memory_patch error message to say "no changes persisted" and include
  success count before the failure
- Add patchOperationSchema .refine() to reject all-undefined operations
- Gate installMemoryWorkflows on OBSERVABILITY_STREAMS_ENABLE_MEMORY flag
- Disable workflow trigger buttons (Scrape/Consolidate/Synthesize) for
  users without manage privilege
- Update workflow success toast to say "queued" not "started successfully"
- Fix stale comments referencing removed write_memory_page tool

Co-authored-by: Cursor <cursoragent@cursor.com>
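The `.refine()` rule on `patchOperationSchema` can be shown as a dependency-free predicate. The operation fields below (`find`, `replace`, `append`) are hypothetical placeholders, because the commit does not list the schema's fields; the only point illustrated is that an operation whose fields are all `undefined` is rejected.

```typescript
// Hypothetical patch operation; the actual schema fields are not listed in the PR.
interface PatchOperation {
  find?: string;
  replace?: string;
  append?: string;
}

// The kind of predicate a .refine() callback could apply: an operation with
// every field undefined would change nothing, so it is rejected up front.
function isNonEmptyPatchOperation(op: PatchOperation): boolean {
  const keys = Object.keys(op) as Array<keyof PatchOperation>;
  return keys.some((key) => op[key] !== undefined);
}

console.log(isNonEmptyPatchOperation({})); // false
console.log(isNonEmptyPatchOperation({ find: undefined })); // false
console.log(isNonEmptyPatchOperation({ append: '- new fact' })); // true
```

With Zod, a function like this would be the callback passed to `.refine()`, optionally with an error message.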
All read methods in MemoryServiceImpl now catch index_not_found_exception
and return empty results instead of propagating a 404 from Elasticsearch.
This covers _searchLatest, search, listAll, listByCategory, getHistory,
getRecentChanges, and getVersion, so the memory UI shows empty state
rather than a stuck spinner before the data streams are seeded.

Co-authored-by: Cursor <cursoragent@cursor.com>
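The empty-result fallback can be sketched as a small wrapper around each read. The error shape checked here (`meta.body.error.type`) is an assumption modeled on how the Elasticsearch JavaScript client exposes response errors, and `withEmptyOnMissingIndex` is a hypothetical helper name rather than code from the PR.

```typescript
// Minimal view of an Elasticsearch client response error (assumed shape).
interface EsLikeError {
  meta?: { statusCode?: number; body?: { error?: { type?: string } } };
}

function isIndexNotFound(err: unknown): boolean {
  const e = err as EsLikeError | null | undefined;
  return e?.meta?.body?.error?.type === 'index_not_found_exception';
}

// Runs a read and returns `empty` when the backing data stream does not exist yet.
async function withEmptyOnMissingIndex<T>(read: () => Promise<T>, empty: T): Promise<T> {
  try {
    return await read();
  } catch (err) {
    if (isIndexNotFound(err)) {
      return empty;
    }
    throw err; // every other failure still propagates to the route
  }
}
```

Usage would look like `withEmptyOnMissingIndex(() => listAll(), [])`, where `listAll` stands in for any of the read methods listed above. Only the missing-index case is swallowed, so permission errors still surface instead of rendering as an empty memory.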
…n Workflows UI

The workflow trigger routes were passing GLOBAL_WORKFLOW_SPACE_ID ('*') as
the spaceId for both getWorkflow and runWorkflow, so executions were recorded
under the global space and invisible in the per-space Workflows UI.

Now use the user's current space ID (via server.spaces.getSpaceId(request),
falling back to DEFAULT_SPACE_ID) to match the pattern used by the standard
run_workflow route. This means managed global workflows are looked up with
includeGlobal: true but executed in the user's space, making the execution
visible on the Workflows page.

Also adds spaces to StreamsServer so route handlers can access it.

Co-authored-by: Cursor <cursoragent@cursor.com>
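The space-resolution rule reads roughly like the sketch below. `SpacesLike` is a hypothetical stand-in for the slice of the Spaces contract the route needs (the commit names only `getSpaceId(request)`), and `'default'` is assumed to be the value of `DEFAULT_SPACE_ID`.

```typescript
// Hypothetical minimal contract; the commit only names getSpaceId(request).
interface SpacesLike<Req> {
  getSpaceId(request: Req): string;
}

const DEFAULT_SPACE_ID = 'default'; // assumed value of Kibana's constant

// Executions are recorded in the caller's space so they show up in that
// space's Workflows UI, instead of under the global '*' space.
function resolveExecutionSpaceId<Req>(request: Req, spaces?: SpacesLike<Req>): string {
  return spaces?.getSpaceId(request) ?? DEFAULT_SPACE_ID;
}
```

The managed workflow itself is still looked up with `includeGlobal: true`; only the space used for the execution changes.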
The workflow was querying the AI Assistant index
(.kibana-observability-ai-assistant-conversations) but Agent Builder
stores conversations in .chat-conversations with a different schema.

Updated the get_recent_conversations step to:
- Use .chat-conversations index
- Filter by updated_at instead of @timestamp
- Fetch the correct _source fields (id, agent_id, title, rounds, etc.)
- Exclude the memory agents' own conversations to avoid circular scraping

Bumped workflow version to 2 so the managed workflow system picks up
the change on restart.

Co-authored-by: Cursor <cursoragent@cursor.com>
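The updated `get_recent_conversations` query can be expressed as an Elasticsearch request body, as in the sketch below. The index and the `updated_at`, `agent_id`, `id`, `title`, and `rounds` fields come from the commit; the builder name, the `now-1d` lookback, the page size, and returning `updated_at` in `_source` are assumptions, and the real step is defined in the workflow YAML rather than TypeScript.

```typescript
interface RecentConversationsRequest {
  index: string;
  size: number;
  _source: string[];
  query: {
    bool: {
      filter: Array<{ range: { updated_at: { gte: string } } }>;
      must_not: Array<{ terms: { agent_id: string[] } }>;
    };
  };
  sort: Array<{ updated_at: { order: 'desc' } }>;
}

function buildRecentConversationsRequest(
  excludedAgentIds: string[],
  since = 'now-1d', // assumed lookback window
  size = 50 // assumed page size
): RecentConversationsRequest {
  return {
    index: '.chat-conversations',
    size,
    _source: ['id', 'agent_id', 'title', 'rounds', 'updated_at'],
    query: {
      bool: {
        // Agent Builder conversations carry updated_at rather than @timestamp.
        filter: [{ range: { updated_at: { gte: since } } }],
        // Skip the memory agents' own conversations to avoid circular scraping.
        must_not: [{ terms: { agent_id: excludedAgentIds } }],
      },
    },
    sort: [{ updated_at: { order: 'desc' } }],
  };
}
```

Passing the excluded agent IDs in, rather than hard-coding them, keeps the filter correct if memory agents are added or renamed.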
…emory-workflow-migration

Co-authored-by: Cursor <cursoragent@cursor.com>

# Conflicts:
#	src/platform/packages/shared/kbn-workflows/managed/managed_workflow_definitions.test.ts
Tombstones now bump the version, and collapse sorts by updated_at so the latest
document wins. list/search/browse no longer pre-filter is_deleted before
collapse: the pre-filter dropped the tombstone, so the previous live version won
the collapse and deleted pages stayed visible in the category tree.

Co-authored-by: Cursor <cursoragent@cursor.com>
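Under these rules a delete appends a document rather than editing an old one. The sketch assumes a simplified document shape built from field names that appear in these commit messages (`id`, `name`, `version`, `updated_at`, `is_deleted`) plus an illustrative `content`; `buildTombstone` and `latestOf` are hypothetical helpers, the latter mimicking what collapse-with-sort does for a single page.

```typescript
// Simplified page document using field names from the commit messages.
interface PageDoc {
  id: string;
  name: string;
  version: number;
  updated_at: string; // ISO-8601
  is_deleted: boolean;
  content: string;
}

// A delete appends a copy of the latest live document with a bumped version
// and is_deleted set, instead of modifying or removing earlier documents.
function buildTombstone(latest: PageDoc, now: Date = new Date()): PageDoc {
  return {
    ...latest,
    version: latest.version + 1,
    updated_at: now.toISOString(),
    is_deleted: true,
  };
}

// What collapse-with-sort does per page: the most recently updated doc wins.
function latestOf(docs: PageDoc[]): PageDoc | undefined {
  let best: PageDoc | undefined;
  for (const doc of docs) {
    if (!best || Date.parse(doc.updated_at) > Date.parse(best.updated_at)) {
      best = doc;
    }
  }
  return best;
}
```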
Co-authored-by: Cursor <cursoragent@cursor.com>
When create is called for a name whose latest document is a delete tombstone,
append a new live document at tombstone.version + 1 reusing the same entry id
so collapse resolves to the restored page.

Co-authored-by: Cursor <cursoragent@cursor.com>
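Restore-by-name can be sketched as a decision made on the latest document for a name. The types and `planCreate` are hypothetical; the `version + 1` and id-reuse rules come from the commit above, the conflict branch corresponds to the duplicate-name rejection mentioned in the next commit, and starting new pages at version 1 is an assumption.

```typescript
// Simplified page document; field names follow the commit messages.
interface PageDoc {
  id: string;
  name: string;
  version: number;
  is_deleted: boolean;
  content: string;
}

type CreatePlan =
  | { kind: 'new'; doc: PageDoc }
  | { kind: 'restore'; doc: PageDoc }
  | { kind: 'conflict'; existingId: string };

// Decides what create(name) appends, given the latest document for that name.
function planCreate(
  name: string,
  content: string,
  latest: PageDoc | undefined,
  newId: string
): CreatePlan {
  if (latest === undefined) {
    // Starting at version 1 is an assumption of this sketch.
    return { kind: 'new', doc: { id: newId, name, version: 1, is_deleted: false, content } };
  }
  if (latest.is_deleted) {
    // Reuse the entry id and go one past the tombstone's version, so the
    // collapse resolves to the restored document rather than the tombstone.
    return {
      kind: 'restore',
      doc: { id: latest.id, name, version: latest.version + 1, is_deleted: false, content },
    };
  }
  // A live page already owns this name: reject the duplicate.
  return { kind: 'conflict', existingId: latest.id };
}
```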
Covers create, update, list, delete tombstone visibility, restore-by-name
undelete, duplicate name rejection, and search exclusion of deleted pages
using an in-memory Elasticsearch client mock.

Co-authored-by: Cursor <cursoragent@cursor.com>
Covers create, read, search, categories, soft-delete, and restore-by-name
against internal memory routes with test helpers to toggle the feature flag.

Co-authored-by: Cursor <cursoragent@cursor.com>
- Collapse listings on id so get-by-id and list stay correct after rename,
  and tombstone the old name on rename
- Trigger the memory synthesis workflow after insights discovery when memory
  is enabled
- Fix managed workflow template types for static YAML workflows

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflict in update_execution_context_on_route_change.tsx by keeping
the branch getPageName helper for discovery/management tab execution context.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve kbn-workflows managed workflow type/test conflicts using upstream
typing while keeping streams memory workflow definitions.

Co-authored-by: Cursor <cursoragent@cursor.com>
Adds a gap detection agent that audits the sigevents memory knowledge base
against 11 required knowledge dimensions (services, deployment, infrastructure,
observability coverage, change gates, health-checking, failure modes, MCP
integrations, code repos, data flows, and access points/connectors) and writes
a structured _gaps/overview memory page with findings, priority gaps, and
suggested next steps.

New additions:
- `streams-gap-detection` Agent Builder skill with 11-dimension audit prompt,
  memory tools + KI search as inline tools, and inspect_streams + get_connectors
  as registry tools
- `sigevents.memory.gap-detector` Agent Builder agent registered in allow_lists
- `POST /internal/streams/memory/_detect_gaps` route using the existing workflow
  trigger helper
- `Gap Detection` workflow YAML (manual + weekly schedule, drop concurrency)
- `useDetectGaps` hook and "Detect Gaps" button on the Memory tab UI

Co-authored-by: Cursor <cursoragent@cursor.com>
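The 11 dimensions form a fixed checklist, which the sketch below encodes as a typed constant. The coverage check is hypothetical: the skill performs the audit through its prompt, and tagging pages with a dimension and diffing against the list is only an illustration of what "priority gaps" means, not how the skill records coverage.

```typescript
// The 11 knowledge dimensions listed in the commit message.
const KNOWLEDGE_DIMENSIONS = [
  'services',
  'deployment',
  'infrastructure',
  'observability coverage',
  'change gates',
  'health-checking',
  'failure modes',
  'MCP integrations',
  'code repos',
  'data flows',
  'access points/connectors',
] as const;

type KnowledgeDimension = (typeof KNOWLEDGE_DIMENSIONS)[number];

// Returns the dimensions with no covering page, in checklist order.
function findGaps(covered: Iterable<KnowledgeDimension>): KnowledgeDimension[] {
  const seen = new Set<KnowledgeDimension>(covered);
  return KNOWLEDGE_DIMENSIONS.filter((dimension) => !seen.has(dimension));
}
```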
memory_synthesis_skill: add observability.get_services and
observability.get_trace_metrics as registry tools so the synthesizer
can discover service dependencies, call chains, and latency distributions
from APM data. Traces are treated as supplemental — synthesis continues
normally when APM data is absent.

use_memory: extract server-side error message from the IHttpFetchError
body so workflow-not-found 404s (e.g. "Detect Gaps" before the workflow
is deployed) display a human-readable message instead of "Not Found".

Co-authored-by: Cursor <cursoragent@cursor.com>
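Preferring the server's message over the generic status text might look like the sketch below. `FetchErrorLike` is a local stand-in for the parts of Kibana's `IHttpFetchError` used here (`message` and an optional `body`), and the `{ message }` body shape is an assumption based on Kibana's usual JSON error responses.

```typescript
// Local stand-in for the fields of IHttpFetchError this helper reads.
interface FetchErrorLike {
  message: string;
  body?: unknown;
}

// Returns the server-provided message when the response body carries one,
// otherwise falls back to the generic fetch error message (e.g. "Not Found").
function getErrorMessage(error: FetchErrorLike): string {
  const body = error.body;
  if (typeof body === 'object' && body !== null && 'message' in body) {
    const message = (body as { message?: unknown }).message;
    if (typeof message === 'string' && message.length > 0) {
      return message;
    }
  }
  return error.message;
}
```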
Await async registerAgentBuilderTools in unit tests to satisfy
@typescript-eslint/no-floating-promises.

Co-authored-by: Cursor <cursoragent@cursor.com>
Stack on workflow migration. Registers sigevents.memory.system-onboarding
agent with significant-events-onboarding skill (memory tools) and adds the
"Tell us about your system" AiButton on the discovery page.

Co-authored-by: Cursor <cursoragent@cursor.com>
…hot.

Sigevents memory work now runs via managed workflows instead of Task Manager task definitions.

Co-authored-by: Cursor <cursoragent@cursor.com>
flash1293 and others added 30 commits May 28, 2026 17:27
Replace Task Manager memory consolidation, conversation scraping, and
synthesis with managed workflows and Agent Builder skills. Keeps the
existing memory storage layer unchanged for independent review.

Co-authored-by: Cursor <cursoragent@cursor.com>
Replace the legacy .chat-memory index with dedicated memory and history
data streams, request-scoped Elasticsearch access for memory tools, and
Scout API coverage. Task Manager memory jobs are unchanged.

Co-authored-by: Cursor <cursoragent@cursor.com>
Register memory skills after server.core and licensing are available so
the synthesis skill can attach its KI search tool. Restore scoped client
options, report per-skill registration failures, and remove a duplicate
memory route export that broke Kibana startup.

Co-authored-by: Cursor <cursoragent@cursor.com>
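Reporting per-skill failures instead of aborting on the first one fits `Promise.allSettled`. This is a hypothetical sketch: `SkillRegistration` and `registerSkills` are illustrative names, and the PR's actual registration code and logging are not shown here.

```typescript
interface SkillRegistration {
  id: string;
  register: () => Promise<void>;
}

interface RegistrationReport {
  registered: string[];
  failed: Array<{ id: string; reason: string }>;
}

// Registers every skill and collects failures, so one broken skill (for
// example one whose KI tool cannot be built) does not block the others.
async function registerSkills(skills: SkillRegistration[]): Promise<RegistrationReport> {
  const results = await Promise.allSettled(skills.map((skill) => skill.register()));
  const report: RegistrationReport = { registered: [], failed: [] };
  results.forEach((result, i) => {
    const id = skills[i].id;
    if (result.status === 'fulfilled') {
      report.registered.push(id);
    } else {
      report.failed.push({ id, reason: String(result.reason) });
    }
  });
  return report;
}
```

The caller can then log each entry in `failed` individually, which is what makes a single misconfigured skill easy to spot at startup.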
MemoryServiceImpl and the memories data stream definition import this name;
without it, index calls fail at runtime with an undefined index.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflicts in agent builder registration (keep split memory
skills registration, adopt streamsKIsOnboardingClient from main).

Co-authored-by: Cursor <cursoragent@cursor.com>
Memory data streams must be read and written as the user who scheduled
the task, not as kibana_system. Aligns with insights discovery and other
sig-events paths that use scopedClusterClient.asCurrentUser.

Co-authored-by: Cursor <cursoragent@cursor.com>
… flag

Integrate streams.significantEventsMemoryEnabled feature flag with the
workflow migration branch. Keep workflow-based memory triggers, subscribe
to feature flag changes for runtime workflow install and skill registration.

Co-authored-by: Cursor <cursoragent@cursor.com>
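Reacting to runtime flag changes typically means running setup once, the first time the flag is seen as enabled. The branch subscribes to the flag with rxjs; to stay dependency-free, the sketch uses a hypothetical listener-based `FlagSource` instead and borrows the `onMemoryEnabled` name from a later commit purely for illustration. Tearing down when the flag turns off again is out of scope here.

```typescript
// Hypothetical flag source: invokes the listener with the current value and on
// every change, and returns an unsubscribe function.
type FlagSource = (listener: (enabled: boolean) => void) => () => void;

// Runs each setup callback (e.g. workflow install, skill registration) once,
// the first time the flag is observed as true. Later toggles do nothing.
function onMemoryEnabled(flag: FlagSource, setups: Array<() => void>): () => void {
  let done = false;
  return flag((enabled) => {
    if (!enabled || done) {
      return;
    }
    done = true;
    for (const setup of setups) {
      setup();
    }
  });
}
```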
Pass the feature-flag-based assertMemoryEnabled check to the workflow trigger
routes, restore memoryTools in generateSignificantEventDefinitions, and fix the
rxjs import order.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve register.ts conflict: keep async tool registration and
memory skill wiring; adopt upstream streamsKIsOnboardingClient for skills.

Co-authored-by: Cursor <cursoragent@cursor.com>
Resolve conflicts by keeping workflow-based memory triggers and
registerStreamsMemoryAgentBuilder in start(), while adopting main's
feature flag changes elsewhere.

Co-authored-by: Cursor <cursoragent@cursor.com>
Integrates elastic#271671 (significant events memory feature flag) while keeping
scoped Elasticsearch clients for memory routes and Task Manager tasks.

Co-authored-by: Cursor <cursoragent@cursor.com>
Integrate data-stream memory storage, feature-flag gating, scoped ES
clients, and memory discovery tools while keeping northstar workflow
routes and managed memory workflows.

Co-authored-by: Cursor <cursoragent@cursor.com>
Bring workflow trigger routes, synthesis skills, gap detection, and
feature-flag-driven managed workflow installation into the combined
northstar + datastream memory stack.

Co-authored-by: Cursor <cursoragent@cursor.com>
Remove duplicate memory workflow exports, unused uiSettings destructuring,
and duplicate workflowsExtensions plugin dependency.

Co-authored-by: Cursor <cursoragent@cursor.com>
Drop the unused server-side memory trigger registry and orphaned memory_update
task, collapse duplicate skill registration callbacks into onMemoryEnabled, and
install KI and memory managed workflows through a single initManagedWorkflowsClient.

Co-authored-by: Cursor <cursoragent@cursor.com>
Clarify that the most important thing to store in sigevents memory is
where information comes from and how to verify it again (connectors,
repos, dashboards, etc.).

Co-authored-by: Cursor <cursoragent@cursor.com>
Keep northstar chat-driven agents and memory triggers; adopt workflow
installation and drop workflow skills from register_skills.

Co-authored-by: Cursor <cursoragent@cursor.com>

2 participants