[FEATURE] emit OTel spans for memory operations (save / load / prefetch / extractor) #1909

### 📋 Prerequisites

- [x] I have searched the [existing issues](./issues) to avoid creating a duplicate
- [x] By submitting this issue, you agree to follow our [Code of Conduct](https://github.com/kagent-dev/kagent/blob/main/CODE_OF_CONDUCT.md)

### 📝 Feature Summary

Add dedicated OTel spans for the memory subsystem (`memory.write`, `memory.read`, `memory.embed`, `memory.consolidate`, `memory.evict`) alongside the `gen_ai.*` attributes already emitted on the `invoke_agent` span. This is purely additive instrumentation: no behavior change and no new runtime dependencies.

### ❓ Problem Statement / Motivation

kagent's platform-level OTel pipeline is already excellent: A2A metadata propagates as span attributes (v0.9.3), the controller's `invoke_agent` span carries `gen_ai.agent.*` + `gen_ai.provider.name` per the OTel GenAI semconv (verified live against a v0.9.4 deployment, Go OTel SDK 1.43.0), and `helm/kagent/values.yaml` exposes a clean `otel.tracing` block.

What's missing is **dedicated spans for the memory subsystem**: `save_memory`, `load_memory`, `prefetch_memory`, and the auto-extractor that fires every 5th user message.

Without dedicated spans, memory operations are visible only as opaque HTTP POSTs against `/api/sessions/{ctx-id}/events` — which is fine for HTTP-level latency but makes it impossible to compare kagent against other agent-memory backends on retrieval latency, embedding cost, or write amplification.

**Verified baseline (Dynatrace, 2026-05-20, kagent v0.9.4)** — 48 h DQL scan in a live tenant:

| Surface                     | Emitted today |
| --------------------------- | --- |
| Agent invocation            | `invoke_agent` span with `gen_ai.agent.id`, `gen_ai.agent.name`, `gen_ai.operation.name=invoke_agent`, `gen_ai.provider.name=ollama`. Service: `kagent-controller` v0.9.4, OTel Go SDK 1.43.0. |
| HTTP session/task surface   | `POST /api/sessions/{ctx-id}/events`, `POST /api/tasks`, `GET /api/agents` via `go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp` v0.68.0. HTTP-standard attrs only. |
| A2A worker (Python runtime) | `a2a.server.events.in_memory_queue_manager.*` via Python A2A SDK auto-instrumentation (in-process queue, not user-facing memory). |

**Confirmed gap:** zero spans named `memory.*` across the window; the session-events HTTP span carries no `memory.operation` / `memory.store.kind`; no `memory_*` metric names.

**Who benefits:** kagent operators (retrieval-latency + embedding-cost visibility), the agent-memory benchmarking workstream (apples-to-apples comparison across OSS backends), and the OTel GenAI semconv WG (a reference implementation to point at when `memory.*` is proposed upstream).

### 💡 Proposed Solution

### Spans (new)

| Span name            | Kind     | Required attributes                                       | Optional |
| -------------------- | -------- | --------------------------------------------------------- | --- |
| `memory.write`       | INTERNAL | `memory.operation`, `memory.store.kind`                   | `memory.tenant`, `memory.input.size_bytes`, `memory.extracted.facts_count` |
| `memory.read`        | INTERNAL | `memory.operation`, `memory.store.kind`, `memory.query.k` | `memory.tenant`, `memory.results.count`, `memory.top_similarity` |
| `memory.embed`       | INTERNAL | `memory.embedder.model`                                   | `memory.embed.token_count` |
| `memory.consolidate` | INTERNAL | `memory.consolidate.kind`                                 | `memory.consolidate.input_items` |
| `memory.evict`       | INTERNAL | `memory.evict.reason`                                     | `memory.evict.count` |

`memory.read.kind=prefetch` distinguishes recall-before-LLM-dispatch reads from explicit `load_memory` tool calls.

### Resource attributes (set once per process)

- `memory.sut.name=kagent`
- `memory.sut.architecture=vector`
- `memory.sut.store_backend=pgvector`
- `memory.sut.version` (git SHA or release tag)
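These attributes could be computed once at startup. A minimal sketch, assuming a release tag is injected at build time (for example via `-ldflags`) and otherwise falling back to the VCS revision the Go toolchain embeds in the binary; `memorySUTAttributes` is a hypothetical helper. In the controller, the map would presumably be converted to `attribute.String` values and merged into the tracer provider's resource (for instance with the OTel Go SDK's `resource.NewWithAttributes` and `sdktrace.WithResource`).

```go
package main

import "runtime/debug"

// memorySUTAttributes returns the process-level memory.sut.* resource
// attributes. A non-empty releaseTag (hypothetically injected via -ldflags)
// wins; otherwise the vcs.revision embedded by the Go toolchain is used,
// falling back to "unknown" when the binary carries no VCS info.
func memorySUTAttributes(releaseTag string) map[string]string {
	version := releaseTag
	if version == "" {
		version = "unknown"
		if info, ok := debug.ReadBuildInfo(); ok {
			for _, s := range info.Settings {
				if s.Key == "vcs.revision" && s.Value != "" {
					version = s.Value
					break
				}
			}
		}
	}
	return map[string]string{
		"memory.sut.name":          "kagent",
		"memory.sut.architecture":  "vector",
		"memory.sut.store_backend": "pgvector",
		"memory.sut.version":       version,
	}
}
```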

### Reuse of existing `gen_ai.*`

For embedding calls inside `memory.embed` and any LLM dispatch inside the auto-extractor, we reuse the GenAI semconv kagent already follows on `invoke_agent`:

- `gen_ai.provider.name` (the key already present on `invoke_agent`; it supersedes the deprecated `gen_ai.system`), `gen_ai.request.model`
- `gen_ai.operation.name` (extended with `memory.write.extract`, `memory.read.rerank`)
- `gen_ai.usage.input_tokens`, `gen_ai.usage.output_tokens`

No new `gen_ai.*` attribute keys: we only add `gen_ai.operation.name` values and slot in alongside what's already there.
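A minimal sketch of building the reused `gen_ai.*` attributes for a model call made inside a memory span. It assumes `gen_ai.provider.name` (the key the baseline table shows on `invoke_agent`) and the semconv's well-known `embeddings` operation value for the embedding call itself; `GenAICall` and `genAIAttrs` are hypothetical names, and attributes are again a plain map instead of OTel `attribute.KeyValue` values.

```go
package main

import "fmt"

// gen_ai.operation.name values: "embeddings" is a well-known GenAI semconv
// value; the memory.* values are the extensions proposed in this issue.
const (
	OpEmbeddings   = "embeddings"
	OpWriteExtract = "memory.write.extract"
	OpReadRerank   = "memory.read.rerank"
)

// GenAICall (hypothetical) describes one model call made inside a memory span.
type GenAICall struct {
	Operation    string
	Provider     string // e.g. "ollama", matching invoke_agent's gen_ai.provider.name
	Model        string
	InputTokens  int
	OutputTokens int
}

// genAIAttrs returns the gen_ai.* attributes for the span wrapping the call,
// rejecting operation names that are not expected inside memory spans.
func genAIAttrs(c GenAICall) (map[string]any, error) {
	switch c.Operation {
	case OpEmbeddings, OpWriteExtract, OpReadRerank:
	default:
		return nil, fmt.Errorf("unexpected gen_ai.operation.name %q", c.Operation)
	}
	attrs := map[string]any{
		"gen_ai.operation.name":     c.Operation,
		"gen_ai.provider.name":      c.Provider,
		"gen_ai.request.model":      c.Model,
		"gen_ai.usage.input_tokens": c.InputTokens,
	}
	// Embedding calls return vectors rather than tokens, so output tokens are
	// only recorded when the call actually reports them.
	if c.OutputTokens > 0 {
		attrs["gen_ai.usage.output_tokens"] = c.OutputTokens
	}
	return attrs, nil
}
```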

### Parent-span hygiene

Memory-read spans emitted during a request must be **children of the existing `invoke_agent` span** when the recall happens before LLM dispatch. This keeps the trace tree connected to what users already see in Dynatrace / Honeycomb / Tempo and avoids orphaned trace trees.
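In OTel Go, the parent link comes from the context: calling `tracer.Start(ctx, "memory.read", ...)` with a `ctx` that already carries the `invoke_agent` span makes it the parent, whereas starting from `context.Background()` produces a new root. A check like the one below, run over spans captured in a test (for instance with the SDK's `tracetest.SpanRecorder`), could guard against regressions. `SpanRecord` and `orphanPrefetchReads` are hypothetical stand-ins for the SDK's read-only span type.

```go
package main

// SpanRecord is a minimal, hypothetical view of an exported span; in a real
// test these fields would be read from the SDK's recorded spans.
type SpanRecord struct {
	Name     string
	SpanID   string
	ParentID string // empty for root spans
	Prefetch bool   // true when memory.read.kind == "prefetch"
}

// orphanPrefetchReads returns the prefetch memory.read spans whose direct
// parent is not an invoke_agent span within the same batch.
func orphanPrefetchReads(spans []SpanRecord) []SpanRecord {
	byID := make(map[string]SpanRecord, len(spans))
	for _, s := range spans {
		byID[s.SpanID] = s
	}
	var orphans []SpanRecord
	for _, s := range spans {
		if s.Name != "memory.read" || !s.Prefetch {
			continue
		}
		if p, ok := byID[s.ParentID]; !ok || p.Name != "invoke_agent" {
			orphans = append(orphans, s)
		}
	}
	return orphans
}
```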

### Files 

- `go/internal/memory/store.go` — wrap `save_memory`, `load_memory`, `prefetch_memory`.
- `go/internal/memory/extractor.go` — `memory.consolidate` on the auto-extractor (fires every 5th message).
- **Session-events handler** behind `POST /api/sessions/{ctx-id}/events` — wrap with a child `memory.write` span when the payload is a memory-bearing event. **Highest-leverage single change** — the HTTP span already exists; we just attach business-level semantics.
- `go/pkg/telemetry/spans.go` — add `MemoryOperationName` constants reusing the existing tracer (same provider that wires `invoke_agent` in v0.9.4).
- `docs/observability/memory.md` (NEW).
- `helm/kagent/values.yaml` — document new memory span names under the existing `otel.tracing` block.
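For the session-events handler and the extractor, the core decision is which `memory.*` child span, if any, a given event should open. A hypothetical sketch: it assumes a memory-bearing event is one recording a `save_memory` tool call and that the every-5th-message cadence is counted per session context ID. `SessionEvent` and `memorySpanPlanner` are illustrative and the real event schema may differ; in practice the extractor in `extractor.go` would own the `memory.consolidate` span, and the planner only models when it fires.

```go
package main

import "sync"

// SessionEvent is a hypothetical, trimmed-down view of the payload posted to
// /api/sessions/{ctx-id}/events; the real schema may differ.
type SessionEvent struct {
	Role     string // "user", "agent", ...
	ToolName string // set when the event records a tool call
}

// extractorEvery matches the documented cadence: the auto-extractor fires on
// every 5th user message.
const extractorEvery = 5

// memorySpanPlanner decides which memory.* child spans a session event should
// produce. Counting user messages per context ID is an assumption.
type memorySpanPlanner struct {
	mu           sync.Mutex
	userMessages map[string]int // keyed by session context ID
}

func newMemorySpanPlanner() *memorySpanPlanner {
	return &memorySpanPlanner{userMessages: make(map[string]int)}
}

// plan returns the span names to start under the existing HTTP server span.
// It is safe for concurrent use by HTTP handlers.
func (p *memorySpanPlanner) plan(ctxID string, ev SessionEvent) []string {
	var spans []string
	if ev.ToolName == "save_memory" { // hypothetical memory-bearing marker
		spans = append(spans, "memory.write")
	}
	if ev.Role == "user" {
		p.mu.Lock()
		p.userMessages[ctxID]++
		n := p.userMessages[ctxID]
		p.mu.Unlock()
		if n%extractorEvery == 0 {
			spans = append(spans, "memory.consolidate")
		}
	}
	return spans
}
```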

This proposal targets the declarative runtime. The BYO ADK runtime documents the Memory API as unsupported, so it is out of scope.


### Questions

1. **Naming sanity check.** Are `memory.read/write/embed/consolidate/evict` reasonable next to the existing `gen_ai.*` envelope on `invoke_agent`?
2. **PR shape.** Single PR with all six files, or phased (handler-only first, then extractor + tools)?
3. **Runtime targeting.** Confirm declarative runtime is the right target (BYO ADK out of scope per docs).
4. **Helm doc placement.** OK to extend the existing `otel.tracing` block in `values.yaml`, or do you want a new `otel.memory` sub-block?

I'll wait for a signal here before creating any PR.

### 🔄 Alternatives Considered

1. **Wait for upstream OTel GenAI semconv to land memory-* natively.** Viable but slow — the GenAI WG cadence has been ~quarterly. We'd rather ship a reference implementation kagent operators can use today and migrate when upstream solidifies. We're tracking `memory-semconv v0.1.0` as the interim contract.
2. **HTTP-attribute overload on `POST /api/sessions/{ctx-id}/events`.** Add `memory.operation` / `memory.store.kind` as attributes on the existing HTTP span instead of creating new spans. Rejected: doesn't model `prefetch_memory` (no HTTP boundary) or the auto-extractor (background tick); also conflates HTTP timing with retrieval timing in dashboards.
3. **Metrics-first (counters + histograms) instead of spans.** Useful but insufficient — metrics can't show parent→child causality (which `invoke_agent` triggered which `memory.read` with which `memory.query.k`). Spans first; metrics derivable from span attributes later via OTel Collector connectors.
4. **Custom kagent-specific attribute namespace (`kagent.memory.*`).** Rejected: locks operators into kagent-specific dashboard queries instead of OTel-portable ones. Following the `gen_ai.*` precedent kagent already adopts.


### 🎯 Affected Service(s)

Controller Service

### 📚 Additional Context

- **Why now / context.** These conventions are being drafted as `memory-semconv v0.1.0` across six OSS agent-memory projects (MemPalace, kagent, sympozium, Graphiti, Mem0, Letta) so a benchmark harness can compare retrieval latency, write amplification, and bi-temporal invalidation churn apples-to-apples. The plan is to ship the convention in **two** implementations first, then propose it upstream to the **OpenTelemetry Semantic Conventions GenAI WG** — same pattern `gen_ai.*` followed.
- **Why kagent is the strongest CNCF-context candidate to land it first:**
  1. The OTel scaffolding is already empirically verified in production.
  2. The `gen_ai.*` envelope already in place proves the maintainers accept OTel semconv as the canonical naming source.
  3. The contribution is purely additive — no behavior change, no new runtime dependencies, OTel Go SDK 1.43.0 already in `go.mod`.
- **Talk credibility note (informational).** This contribution is referenced in an upcoming KubeCon + OSS Summit talk benchmarking OSS agent-memory solutions on Kubernetes (Cognee / MemOS / Honcho / kagent / sympozium / MemPalace). Whatever maintainers decide here is what gets cited — acceptance is not a precondition for the talk.
- **Suggested labels:** `area/observability`, `kind/proposal`. (Add `good first issue` if maintainers feel that fits — totally optional.)


### 🙋 Are you willing to contribute?

- [x] I am willing to submit a PR for this feature
