[FEATURE] emit OTel spans for memory operations (save / load / prefetch / extractor) #1909

@henrikrexed

📝 Feature Summary

Add dedicated OTel spans for the memory subsystem (memory.write / memory.read / memory.embed / memory.consolidate / memory.evict), alongside the existing gen_ai.* spans on invoke_agent. Purely additive instrumentation — no behavior change, no new runtime dependencies.

❓ Problem Statement / Motivation

kagent's platform-level OTel pipeline is already excellent: A2A metadata propagates as span attributes (v0.9.3), the controller's invoke_agent span carries gen_ai.agent.* + gen_ai.provider.name per the OTel GenAI semconv (verified live against a v0.9.4 deployment, Go OTel SDK 1.43.0), and helm/kagent/values.yaml exposes a clean otel.tracing block.

What's missing is dedicated spans for the memory subsystem: save_memory, load_memory, prefetch_memory, and the auto-extractor that fires every 5th user message.

Without dedicated spans, memory operations are visible only as opaque HTTP POSTs against /api/sessions/{ctx-id}/events — which is fine for HTTP-level latency but makes it impossible to compare kagent against other agent-memory backends on retrieval latency, embedding cost, or write amplification.

Verified baseline (Dynatrace, 2026-05-20, kagent v0.9.4) — 48 h DQL scan in a live tenant:

| Surface | Emitted today |
| --- | --- |
| Agent invocation | invoke_agent span with gen_ai.agent.id, gen_ai.agent.name, gen_ai.operation.name=invoke_agent, gen_ai.provider.name=ollama. Service: kagent-controller v0.9.4, OTel Go SDK 1.43.0. |
| HTTP session/task surface | POST /api/sessions/{ctx-id}/events, POST /api/tasks, GET /api/agents via go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.68.0. HTTP-standard attrs only. |
| A2A worker (Python runtime) | a2a.server.events.in_memory_queue_manager.* via Python A2A SDK auto-instrumentation (in-process queue, not user-facing memory). |

Confirmed gap: zero spans named memory.* across the window; the session-events HTTP span carries no memory.operation / memory.store.kind; no memory_* metric names.

Who benefits: kagent operators (retrieval-latency + embedding-cost visibility), the agent-memory benchmarking workstream (apples-to-apples comparison across OSS backends), and the OTel GenAI semconv WG (a reference implementation to point at when memory.* is proposed upstream).

💡 Proposed Solution

Spans (new)

| Span name | Kind | Required attributes | Optional |
| --- | --- | --- | --- |
| memory.write | INTERNAL | memory.operation, memory.store.kind | memory.tenant, memory.input.size_bytes, memory.extracted.facts_count |
| memory.read | INTERNAL | memory.operation, memory.store.kind, memory.query.k | memory.tenant, memory.results.count, memory.top_similarity |
| memory.embed | INTERNAL | memory.embedder.model | memory.embed.token_count |
| memory.consolidate | INTERNAL | memory.consolidate.kind | memory.consolidate.input_items |
| memory.evict | INTERNAL | memory.evict.reason | memory.evict.count |

On memory.read spans, the additional attribute memory.read.kind=prefetch distinguishes recall-before-LLM-dispatch reads from explicit load_memory tool calls.
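To make the required-attribute column concrete, here is a minimal, stdlib-only sketch of how the span names and their required keys could be encoded and checked, for example in a unit test around the instrumentation. The constant names and the `MissingAttrs` helper are hypothetical and not part of kagent or the OTel SDK; a real implementation would set these keys through the OTel Go attribute API rather than a plain map.

```go
package main

import "sort"

// Span names proposed in the table above. The Go identifiers are illustrative.
const (
	SpanMemoryWrite       = "memory.write"
	SpanMemoryRead        = "memory.read"
	SpanMemoryEmbed       = "memory.embed"
	SpanMemoryConsolidate = "memory.consolidate"
	SpanMemoryEvict       = "memory.evict"
)

// requiredAttrs mirrors the "Required attributes" column of the table.
var requiredAttrs = map[string][]string{
	SpanMemoryWrite:       {"memory.operation", "memory.store.kind"},
	SpanMemoryRead:        {"memory.operation", "memory.store.kind", "memory.query.k"},
	SpanMemoryEmbed:       {"memory.embedder.model"},
	SpanMemoryConsolidate: {"memory.consolidate.kind"},
	SpanMemoryEvict:       {"memory.evict.reason"},
}

// MissingAttrs (hypothetical helper) returns the required attribute keys
// that are absent from attrs for the given span name, sorted for stable
// output. The boolean is false when spanName is not one of the five
// proposed span names.
func MissingAttrs(spanName string, attrs map[string]any) ([]string, bool) {
	required, known := requiredAttrs[spanName]
	if !known {
		return nil, false
	}
	var missing []string
	for _, key := range required {
		if _, ok := attrs[key]; !ok {
			missing = append(missing, key)
		}
	}
	sort.Strings(missing)
	return missing, true
}
```

A memory.read span recorded without memory.query.k would be reported as missing exactly that key, which gives a cheap regression check if the attribute set changes later.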

Resource attributes (set once per process)

  • memory.sut.name=kagent
  • memory.sut.architecture=vector
  • memory.sut.store_backend=pgvector
  • memory.sut.version (git SHA or release tag)
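One low-touch way to try these resource attributes before any code lands is the standard OTEL_RESOURCE_ATTRIBUTES environment variable, which takes comma-separated key=value pairs. The sketch below renders the four proposed attributes in that form. It assumes the controller's resource detection reads that variable; `ResourceAttributesEnv` and `MemoryResourceAttrs` are hypothetical helper names, not existing kagent functions.

```go
package main

import (
	"sort"
	"strings"
)

// escaper percent-encodes the characters that would otherwise break the
// key=value,key=value format.
var escaper = strings.NewReplacer("%", "%25", ",", "%2C", "=", "%3D")

// ResourceAttributesEnv (hypothetical helper) renders attrs in the
// comma-separated key=value form used by OTEL_RESOURCE_ATTRIBUTES.
// Keys are sorted so the output is deterministic.
func ResourceAttributesEnv(attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	pairs := make([]string, 0, len(keys))
	for _, k := range keys {
		pairs = append(pairs, escaper.Replace(k)+"="+escaper.Replace(attrs[k]))
	}
	return strings.Join(pairs, ",")
}

// MemoryResourceAttrs (hypothetical helper) returns the four proposed
// resource attributes. version is a git SHA or release tag supplied by
// the caller.
func MemoryResourceAttrs(version string) map[string]string {
	return map[string]string{
		"memory.sut.name":          "kagent",
		"memory.sut.architecture":  "vector",
		"memory.sut.store_backend": "pgvector",
		"memory.sut.version":       version,
	}
}
```

Calling `ResourceAttributesEnv(MemoryResourceAttrs(tag))` yields a single string that could be placed in the controller's environment, so the attributes are set once per process as described above.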

Reuse of existing gen_ai.*

For embedding calls inside memory.embed and any LLM dispatch inside the auto-extractor we reuse the GenAI semconv kagent already emits on invoke_agent:

  • gen_ai.provider.name (which supersedes the deprecated gen_ai.system), gen_ai.request.model
  • gen_ai.operation.name (extended with memory.write.extract, memory.read.rerank)
  • gen_ai.usage.input_tokens, gen_ai.usage.output_tokens

No new GenAI conventions — we slot in alongside what's already there.

Parent-span hygiene

Memory-read spans emitted during a request must be children of the existing invoke_agent span when the recall happens before LLM dispatch. This keeps the trace tree connected to what users already see in Dynatrace / Honeycomb / Tempo and avoids orphan trees.
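The parent-child requirement comes down to threading the request context into the memory call. The sketch below models that with a tiny stand-in span type built only on the standard library; it is not the OTel API, and `Span`, `StartSpan`, `PrefetchMemory`, and `TraceDemo` are hypothetical names. It mirrors how the OTel Go `Tracer.Start(ctx, name)` picks the parent from whatever span the context carries.

```go
package main

import (
	"context"
	"sync/atomic"
)

// Span is a hypothetical, minimal stand-in for a tracing span.
type Span struct {
	Name     string
	TraceID  uint64
	ID       uint64
	ParentID uint64 // 0 means the span is a trace root
}

type spanKey struct{}

var nextID uint64

// StartSpan creates a span whose parent is the span carried by ctx, if
// any, and returns a derived context carrying the new span.
func StartSpan(ctx context.Context, name string) (context.Context, *Span) {
	s := &Span{Name: name, ID: atomic.AddUint64(&nextID, 1)}
	if parent, ok := ctx.Value(spanKey{}).(*Span); ok {
		s.TraceID, s.ParentID = parent.TraceID, parent.ID
	} else {
		s.TraceID = s.ID // no parent in ctx: this span starts a new trace
	}
	return context.WithValue(ctx, spanKey{}, s), s
}

// PrefetchMemory stands in for the recall step before LLM dispatch.
// It only joins the caller's trace if it receives the caller's ctx.
func PrefetchMemory(ctx context.Context) *Span {
	_, span := StartSpan(ctx, "memory.read")
	return span
}

// TraceDemo contrasts a correctly parented read with an orphaned one.
func TraceDemo() (agent, child, orphan *Span) {
	ctx, agent := StartSpan(context.Background(), "invoke_agent")
	child = PrefetchMemory(ctx)                   // request ctx passed through
	orphan = PrefetchMemory(context.Background()) // ctx dropped: orphan tree
	return agent, child, orphan
}
```

In `TraceDemo`, the read that receives the request context shares the trace of invoke_agent, while the one started from a fresh background context becomes its own root, which is the orphan-tree case the proposal wants to avoid.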

Files

  • go/internal/memory/store.go — wrap save_memory, load_memory, prefetch_memory.
  • go/internal/memory/extractor.go: memory.consolidate on the auto-extractor (fires every 5th message).
  • Session-events handler behind POST /api/sessions/{ctx-id}/events — wrap with a child memory.write span when the payload is a memory-bearing event. Highest-leverage single change — the HTTP span already exists; we just attach business-level semantics.
  • go/pkg/telemetry/spans.go — add MemoryOperationName constants reusing the existing tracer (same provider that wires invoke_agent in v0.9.4).
  • docs/observability/memory.md (NEW).
  • helm/kagent/values.yaml — document new memory span names under the existing otel.tracing block.

Targets the declarative runtime. The BYO ADK runtime documents Memory API as unsupported and is out of scope.

Questions:

  1. Naming sanity check. Are memory.read/write/embed/consolidate/evict reasonable next to the existing gen_ai.* envelope on invoke_agent?
  2. PR shape. Single PR with all six files, or phased (handler-only first, then extractor + tools)?
  3. Runtime targeting. Confirm declarative runtime is the right target (BYO ADK out of scope per docs).
  4. Helm doc placement. OK to extend the existing otel.tracing block in values.yaml, or do you want a new otel.memory sub-block?

I'll wait for a signal here before creating any PR.

🔄 Alternatives Considered

  1. Wait for upstream OTel GenAI semconv to land memory.* natively. Viable but slow: the GenAI WG cadence has been ~quarterly. We'd rather ship a reference implementation kagent operators can use today and migrate when upstream solidifies. We're tracking memory-semconv v0.1.0 as the interim contract.
  2. HTTP-attribute overload on POST /api/sessions/{ctx-id}/events. Add memory.operation / memory.store.kind as attributes on the existing HTTP span instead of creating new spans. Rejected: doesn't model prefetch_memory (no HTTP boundary) or the auto-extractor (background tick); also conflates HTTP timing with retrieval timing in dashboards.
  3. Metrics-first (counters + histograms) instead of spans. Useful but insufficient — metrics can't show parent→child causality (which invoke_agent triggered which memory.read with which memory.query.k). Spans first; metrics derivable from span attributes later via OTel Collector connectors.
  4. Custom kagent-specific attribute namespace (kagent.memory.*). Rejected: locks operators into kagent-specific dashboard queries instead of OTel-portable ones. Following the gen_ai.* precedent kagent already adopts.

🎯 Affected Service(s)

Controller Service

📚 Additional Context

  • Why now / context. These conventions are being drafted as memory-semconv v0.1.0 across six OSS agent-memory projects (MemPalace, kagent, sympozium, Graphiti, Mem0, Letta) so a benchmark harness can compare retrieval latency, write amplification, and bi-temporal invalidation churn apples-to-apples. The plan is to ship the convention in two implementations first, then propose it upstream to the OpenTelemetry Semantic Conventions GenAI WG — same pattern gen_ai.* followed.
  • Why kagent is the strongest CNCF-context candidate to land it first:
    1. The OTel scaffolding is already empirically verified in production.
    2. The gen_ai.* envelope already in place proves the maintainers accept OTel semconv as the canonical naming source.
    3. The contribution is purely additive — no behavior change, no new runtime dependencies, OTel Go SDK 1.43.0 already in go.mod.
  • Talk credibility note (informational). This contribution is referenced in an upcoming KubeCon + OSS Summit talk benchmarking OSS agent-memory solutions on Kubernetes (Cognee / MemOS / Honcho / kagent / sympozium / MemPalace). Whatever maintainers decide here is what gets cited — acceptance is not a precondition for the talk.
  • Suggested labels: area/observability, kind/proposal. (Add good first issue if maintainers feel that fits — totally optional.)

🙋 Are you willing to contribute?

  • I am willing to submit a PR for this feature
