[DOC] Add documentation for agent / gen-ai trace enrichment in existing `otel_traces` processor

## Description

Add documentation for the GenAI agent trace enrichment behavior added to the `otel_traces` processor, as proposed in RFC [#6542](https://github.com/opensearch-project/data-prepper/issues/6542) and implemented in PR [#6548](https://github.com/opensearch-project/data-prepper/pull/6548).

The `otel_traces` processor now automatically normalizes vendor-specific GenAI span attributes to OTel semantic conventions, propagates key `gen_ai.*` attributes from child spans to root spans, aggregates token usage counts, and strips flattened sub-keys that conflict with parent string values (e.g. `llm.input_messages.0.message.content` when `llm.input_messages` is already a string). This enrichment enables the agent trace visualizations in OpenSearch Dashboards ([OpenSearch-Dashboards#11387](https://github.com/opensearch-project/OpenSearch-Dashboards/pull/11387)).

### What to document

**Always-on enrichment (requires `output_format: otel`)**
* GenAI enrichment runs automatically on every batch. It is a no-op for traces without `gen_ai.*` attributes, so there is no impact on existing non-GenAI pipelines.
* The source must output spans with original OTel attribute key names (e.g. `gen_ai.operation.name`, not `span.attributes.gen_ai@operation@name`). This requires:
  * `otel_trace_source` with `output_format: otel` (the default is `opensearch`, which transforms keys and breaks enrichment), OR
  * `otlp` source, which defaults to `output_format: otel` and works out of the box.
* The OpenSearch sink must use `index_type: trace-analytics-plain-raw` to match the otel output format. The default `trace-analytics-raw` expects the `opensearch` key format.
* Example pipeline configuration using `otlp` source (recommended):
  ```yaml
  otlp-pipeline:
    source:
      otlp:
        ssl: false
    route:
      - traces: 'getEventType() == "TRACE"'
    sink:
      - pipeline:
          name: "traces-raw-pipeline"
          routes:
            - "traces"

  traces-raw-pipeline:
    source:
      pipeline:
        name: "otlp-pipeline"
    processor:
      - otel_traces:
          # Time in seconds to buffer child spans waiting for root span (default: 180)
          trace_flush_interval: 180
    sink:
      - opensearch:
          index_type: trace-analytics-plain-raw
          # ...
  ```
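To illustrate why the `opensearch` output format breaks enrichment, here is a minimal Python sketch of the key transformation. It is inferred only from the single example above (`gen_ai.operation.name` becoming `span.attributes.gen_ai@operation@name`). The function name `to_opensearch_key` is hypothetical and this is not the Data Prepper implementation.

```python
# Hypothetical helper: mimics the key rewrite implied by the example above.
# Assumption: keys get a "span.attributes." prefix and dots become "@".
def to_opensearch_key(otel_key):
    return "span.attributes." + otel_key.replace(".", "@")


otel_span = {"gen_ai.operation.name": "chat"}
opensearch_span = {to_opensearch_key(k): v for k, v in otel_span.items()}

# Enrichment looks up the original OTel key names, so the lookup
# succeeds only on the otel-formatted span.
found_in_otel = "gen_ai.operation.name" in otel_span
found_in_opensearch = "gen_ai.operation.name" in opensearch_span
```

With the transformed keys, a lookup for `gen_ai.operation.name` finds nothing, which is why the source must emit the `otel` format.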

**Vendor attribute normalization**

The processor automatically maps vendor-specific attributes to the [OTel GenAI Semantic Conventions (v1.39.0)](https://opentelemetry.io/docs/specs/semconv/gen-ai/). Original attributes are preserved; normalized attributes are added alongside them. Normalization is skipped if the target attribute already exists on the span.

For the authoritative, up-to-date mappings, see [`genai-attribute-mappings.yaml`](https://github.com/opensearch-project/data-prepper/blob/main/data-prepper-plugins/otel-trace-raw-processor/src/main/resources/genai-attribute-mappings.yaml).
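The normalization rules described above (originals preserved, normalized attribute added alongside, skipped when the target already exists) can be sketched in Python. This is an illustrative sketch, not the processor's Java code. `ATTRIBUTE_MAPPINGS` holds only a small subset of the mappings in this issue, `normalize_attributes` is a hypothetical name, and representing "wrapped as JSON array" as a JSON-encoded string is an assumption.

```python
import json

# Illustrative subset only; the linked YAML file is the authoritative list.
ATTRIBUTE_MAPPINGS = {
    "llm.token_count.prompt": "gen_ai.usage.input_tokens",
    "llm.model_name": "gen_ai.request.model",
    "llm.response.finish_reason": "gen_ai.response.finish_reasons",
}

# Source attributes whose value is wrapped as a JSON array.
# Assumption: the wrapped value is stored as a JSON-encoded string.
WRAP_AS_JSON_ARRAY = {"llm.response.finish_reason"}


def normalize_attributes(attributes):
    """Return a copy of the span attributes with normalized keys added."""
    result = dict(attributes)  # original attributes are preserved
    for source_key, target_key in ATTRIBUTE_MAPPINGS.items():
        if source_key not in attributes:
            continue
        if target_key in result:
            continue  # skip if the target attribute already exists
        value = attributes[source_key]
        if source_key in WRAP_AS_JSON_ARRAY:
            value = json.dumps([value])
        result[target_key] = value
    return result


normalized = normalize_attributes(
    {"llm.model_name": "model-a", "llm.response.finish_reason": "stop"}
)
untouched = normalize_attributes(
    {"llm.model_name": "model-a", "gen_ai.request.model": "model-b"}
)
```

In the second call the span already carries `gen_ai.request.model`, so the existing value is left as is.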

**OpenInference mappings**

| Source attribute | Target attribute | Notes |
|---|---|---|
| `llm.token_count.prompt` | `gen_ai.usage.input_tokens` | |
| `llm.token_count.completion` | `gen_ai.usage.output_tokens` | |
| `llm.model_name` | `gen_ai.request.model` | |
| `llm.provider` | `gen_ai.provider.name` | |
| `llm.input_messages` | `gen_ai.input.messages` | |
| `llm.output_messages` | `gen_ai.output.messages` | |
| `embedding.model_name` | `gen_ai.request.model` | |
| `tool.name` | `gen_ai.tool.name` | |
| `tool.description` | `gen_ai.tool.description` | |
| `tool_call.function.arguments` | `gen_ai.tool.call.arguments` | |
| `tool_call.id` | `gen_ai.tool.call.id` | |
| `reranker.model_name` | `gen_ai.request.model` | |
| `agent.name` | `gen_ai.agent.name` | |
| `session.id` | `gen_ai.conversation.id` | |
| `openinference.span.kind` | `gen_ai.operation.name` | Value mapped (see below) |

**OpenLLMetry mappings**

| Source attribute | Target attribute | Notes |
|---|---|---|
| `llm.usage.prompt_tokens` | `gen_ai.usage.input_tokens` | |
| `llm.usage.completion_tokens` | `gen_ai.usage.output_tokens` | |
| `llm.request.model` | `gen_ai.request.model` | |
| `llm.response.model` | `gen_ai.response.model` | |
| `llm.request.max_tokens` | `gen_ai.request.max_tokens` | |
| `llm.request.temperature` | `gen_ai.request.temperature` | |
| `llm.request.top_p` | `gen_ai.request.top_p` | |
| `llm.top_k` | `gen_ai.request.top_k` | |
| `llm.frequency_penalty` | `gen_ai.request.frequency_penalty` | |
| `llm.presence_penalty` | `gen_ai.request.presence_penalty` | |
| `llm.chat.stop_sequences` | `gen_ai.request.stop_sequences` | |
| `llm.request.functions` | `gen_ai.tool.definitions` | |
| `llm.response.finish_reason` | `gen_ai.response.finish_reasons` | Wrapped as JSON array |
| `llm.response.stop_reason` | `gen_ai.response.finish_reasons` | Wrapped as JSON array |
| `llm.request.type` | `gen_ai.operation.name` | Value mapped (see below) |
| `traceloop.span.kind` | `gen_ai.operation.name` | Value mapped (see below) |
| `traceloop.entity.name` | `gen_ai.agent.name` | |
| `traceloop.entity.input` | `gen_ai.input.messages` | |
| `traceloop.entity.output` | `gen_ai.output.messages` | |

**`gen_ai.operation.name` value mappings** (case-insensitive)

| Source value | Mapped value |
|---|---|
| `LLM` | `chat` |
| `EMBEDDING` / `embedding` | `embeddings` |
| `CHAIN` / `workflow` / `task` / `AGENT` / `agent` | `invoke_agent` |
| `RETRIEVER` / `RERANKER` / `rerank` | `retrieval` |
| `TOOL` / `tool` | `execute_tool` |
| `PROMPT` / `completion` | `text_completion` |
| `chat` | `chat` |
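The case-insensitive value mapping in the table above can be sketched as a lookup on the lowercased source value. This is an illustrative Python sketch with hypothetical names. Returning `None` for a value that is not in the table is an assumption, since this issue does not say how unmapped values are handled.

```python
# Keys are lowercased source values from the table above.
OPERATION_NAME_VALUES = {
    "llm": "chat",
    "embedding": "embeddings",
    "chain": "invoke_agent",
    "workflow": "invoke_agent",
    "task": "invoke_agent",
    "agent": "invoke_agent",
    "retriever": "retrieval",
    "reranker": "retrieval",
    "rerank": "retrieval",
    "tool": "execute_tool",
    "prompt": "text_completion",
    "completion": "text_completion",
    "chat": "chat",
}


def map_operation_name(source_value):
    """Map a vendor span-kind value to a gen_ai.operation.name value."""
    # Assumption: unmapped values yield None; the real behavior is not stated here.
    return OPERATION_NAME_VALUES.get(source_value.lower())


examples = [map_operation_name(v) for v in ("LLM", "Agent", "RERANKER", "tool")]
```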

**Root span enrichment**
* Propagates `gen_ai.system`, `gen_ai.provider.name`, `gen_ai.agent.name`, `gen_ai.request.model`, and `gen_ai.operation.name` from child spans to the root span (first-child-wins).
* Aggregates `gen_ai.usage.input_tokens` and `gen_ai.usage.output_tokens` across all child spans and writes the sums to the root span. Aggregation is skipped if the root already has token counts.
* Skip-if-present semantics apply to all attributes: existing root values are never overwritten.
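The root span rules above can be sketched in Python as follows. This is a minimal illustration with hypothetical names (`enrich_root`, spans modeled as flat attribute dictionaries), not the processor's implementation. It assumes children are visited in list order and that the token-count skip is applied per attribute.

```python
PROPAGATED_KEYS = (
    "gen_ai.system",
    "gen_ai.provider.name",
    "gen_ai.agent.name",
    "gen_ai.request.model",
    "gen_ai.operation.name",
)
TOKEN_KEYS = ("gen_ai.usage.input_tokens", "gen_ai.usage.output_tokens")


def enrich_root(root, children):
    """Return a copy of the root attributes enriched from the child spans."""
    enriched = dict(root)
    for key in PROPAGATED_KEYS:
        if key in enriched:
            continue  # skip-if-present: never overwrite a root value
        for child in children:
            if key in child:
                enriched[key] = child[key]  # first-child-wins
                break
    for key in TOKEN_KEYS:
        if key in enriched:
            continue  # assumption: the skip is checked per token attribute
        values = [child[key] for child in children if key in child]
        if values:
            enriched[key] = sum(values)
    return enriched


children = [
    {"gen_ai.request.model": "model-a", "gen_ai.usage.input_tokens": 10},
    {
        "gen_ai.request.model": "model-b",
        "gen_ai.usage.input_tokens": 5,
        "gen_ai.usage.output_tokens": 7,
    },
]
enriched_root = enrich_root({"gen_ai.agent.name": "planner"}, children)
```

Here the root keeps its own `gen_ai.agent.name`, takes `gen_ai.request.model` from the first child that has it, and receives the summed token counts.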

**Flattened sub-key stripping**
* Strips flattened sub-keys that conflict with parent string values, preventing OpenSearch mapping failures. This applies only to the following four parent keys:
  * `llm.input_messages`, `llm.output_messages`, `gen_ai.prompt`, `gen_ai.completion`
* For example, `llm.input_messages.0.message.content` is stripped when `llm.input_messages` exists as a string value. If only the sub-keys exist (no parent string), they are preserved.
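The stripping rule above can be sketched in Python. This is an illustration with a hypothetical function name, operating on a flat attribute dictionary. It assumes a sub-key is any key that starts with the parent key followed by a dot.

```python
PARENT_KEYS = (
    "llm.input_messages",
    "llm.output_messages",
    "gen_ai.prompt",
    "gen_ai.completion",
)


def strip_conflicting_sub_keys(attributes):
    """Drop flattened sub-keys whose parent key holds a string value."""
    # Only parents that are present AND hold a string trigger stripping.
    prefixes = tuple(
        key + "." for key in PARENT_KEYS if isinstance(attributes.get(key), str)
    )
    return {k: v for k, v in attributes.items() if not k.startswith(prefixes)}


with_parent = strip_conflicting_sub_keys(
    {
        "llm.input_messages": "[{\"role\": \"user\"}]",
        "llm.input_messages.0.message.content": "hello",
        "llm.model_name": "model-a",
    }
)
without_parent = strip_conflicting_sub_keys(
    {"llm.input_messages.0.message.content": "hello"}
)
```

When only the sub-keys exist, as in the second call, nothing is removed.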

**Known limitations**
* In-memory batch only: the processor buffers child spans until the root span arrives (up to `trace_flush_interval`, default 180 seconds), then flushes them together, and enrichment runs on this combined batch. However, if the root span arrives *before* its children, it is flushed immediately without enrichment. Children arriving later in separate batches cannot retroactively enrich an already-flushed root.
* First-child-wins for string attributes: in traces that involve multiple LLM providers, the root span gets only the first child's value.

### Related
* RFC: [#6542](https://github.com/opensearch-project/data-prepper/issues/6542)
* Implementation: [#6548](https://github.com/opensearch-project/data-prepper/pull/6548) (commit `6eb06fe60`)
* README updates: [#6570](https://github.com/opensearch-project/data-prepper/pull/6570)
* Dashboards visualization: [OpenSearch-Dashboards#11387](https://github.com/opensearch-project/OpenSearch-Dashboards/pull/11387)


