Feature: support cached input tokens in LLM span cost tracking

**Is your feature request related to a problem? Please describe.**

`LlmSpan` (and `update_llm_span` / `@observe(type="llm")`) supports only four cost-related
fields: `input_token_count`, `output_token_count`, `cost_per_input_token`,
`cost_per_output_token`. There is no way to report **cached input tokens**, even though
the major providers now bill them at a discounted rate:

- OpenAI / Azure OpenAI: `prompt_tokens_details.cached_tokens` (a subset of, i.e. included
  in, `prompt_tokens`), billed at ~50% of the input price (up to 100% off on provisioned
  deployments).
- Anthropic: `cache_read_input_tokens` (exclusive of `input_tokens`), billed at ~10% of
  input price.
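The two counting conventions differ, so a consumer has to normalize them before applying a single cost formula. Here is a minimal sketch that turns both providers' usage payloads into an inclusive `(input_tokens, cached_input_tokens)` pair. It assumes the usage objects have already been converted to plain dicts with the field names listed above; the helper names are hypothetical and not part of deepeval.

```python
from typing import Any, Mapping, Tuple


def normalize_openai_usage(usage: Mapping[str, Any]) -> Tuple[int, int]:
    """OpenAI / Azure OpenAI: cached_tokens is already included in prompt_tokens."""
    prompt_tokens = usage.get("prompt_tokens", 0) or 0
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0) or 0
    return prompt_tokens, cached


def normalize_anthropic_usage(usage: Mapping[str, Any]) -> Tuple[int, int]:
    """Anthropic: cache_read_input_tokens is reported separately from input_tokens,
    so add it back to get an inclusive total.

    Cache-write tokens (cache_creation_input_tokens) are also reported separately
    and billed at a premium; this sketch leaves them out, matching the scope of
    this request (cache reads only).
    """
    uncached = usage.get("input_tokens", 0) or 0
    cached = usage.get("cache_read_input_tokens", 0) or 0
    return uncached + cached, cached


openai_usage = {"prompt_tokens": 2000, "prompt_tokens_details": {"cached_tokens": 1536}}
anthropic_usage = {"input_tokens": 464, "cache_read_input_tokens": 1536}
print(normalize_openai_usage(openai_usage))        # (2000, 1536)
print(normalize_anthropic_usage(anthropic_usage))  # (2000, 1536)
```

Both calls describe the same request (2,000 prompt tokens, 1,536 of them cached), even though the raw `input_tokens` values differ by provider.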

Because input cost is computed as `input_token_count × cost_per_input_token`, any application
with a decent cache hit rate sees its cost **systematically overestimated** on Confident
AI. There is no workaround that keeps both token counts and cost exact: a weighted
effective `cost_per_input_token` keeps cost exact but shows a misleading per-token rate.
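A quick worked example shows the size of the gap. The prices and token counts below are made-up, illustrative numbers (not any provider's actual list price), with cached tokens counted inclusively and billed at 50% of the input price:

```python
# Illustrative numbers only: hypothetical per-token prices, not real list prices.
input_tokens = 10_000      # total prompt tokens, cached tokens included
cached_tokens = 8_000      # subset of input_tokens served from cache
cost_per_input = 2.50e-6   # USD per uncached input token (hypothetical)
cost_per_cached = 1.25e-6  # USD per cached input token (50% of the input price)

# What the SDK can express today: every input token at the full rate.
reported = input_tokens * cost_per_input

# What the provider actually bills for the input side.
actual = (input_tokens - cached_tokens) * cost_per_input + cached_tokens * cost_per_cached

# The only workaround: a weighted "effective" rate that makes the total exact
# but no longer matches any real per-token price.
effective_rate = actual / input_tokens

print(f"reported ${reported:.4f}, actual ${actual:.4f}, overestimate {reported / actual - 1:.0%}")
print(f"effective rate {effective_rate:.3e} USD/token vs list rate {cost_per_input:.3e}")
```

With these numbers the reported input cost is $0.025 against an actual $0.015 (about a 67% overestimate), and the effective rate that would fix the total is $1.5e-6 per token, a price that appears on no rate card. At Anthropic's ~10% cache-read price the gap is larger still.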

Your own codebase has already hit this limitation: the OpenAI Agents integration
(`deepeval/openai_agents/extractors.py`) extracts `usage.input_tokens_details.cached_tokens`
from the SDK response, but has to stash it in the generic `span.metadata` dict because
`LlmSpan` has no first-class field for it, so the value is invisible to cost/token analytics.

The OpenTelemetry exporter has the same gap: the `confident.llm.*` namespace exposes the
same four fields, and GenAI semconv cache attributes (e.g.
`gen_ai.usage.cache_read.input_tokens` as emitted by OpenLLMetry) are not mapped.
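A minimal sketch of what the exporter-side mapping could look like, assuming span attributes arrive as a plain dict. The attribute key is the one quoted above; `cached_input_token_count` is the field proposed in the solution section, and both functions are hypothetical (this is not the current `deepeval/tracing/otel/exporter.py` code).

```python
from typing import Any, Dict, Mapping, Optional

# Hypothetical list of OTel attribute keys that carry a cached-input token count.
# Only the key quoted in this issue is listed; spellings used by other
# instrumentations could be appended here.
CACHED_INPUT_ATTRIBUTE_KEYS = (
    "gen_ai.usage.cache_read.input_tokens",
)


def extract_cached_input_tokens(attributes: Mapping[str, Any]) -> Optional[int]:
    """Return the first cached-input token count found in the span attributes, if any."""
    for key in CACHED_INPUT_ATTRIBUTE_KEYS:
        value = attributes.get(key)
        if value is None:
            continue
        try:
            return int(value)
        except (TypeError, ValueError):
            # Skip malformed values rather than failing the whole export.
            continue
    return None


def llm_span_cache_fields(attributes: Mapping[str, Any]) -> Dict[str, int]:
    """Build the extra keyword arguments the exporter would set on the LLM span."""
    cached = extract_cached_input_tokens(attributes)
    return {} if cached is None else {"cached_input_token_count": cached}


print(llm_span_cache_fields({"gen_ai.usage.cache_read.input_tokens": 1536}))
```

Returning an empty dict when the attribute is absent keeps spans from non-caching providers unchanged.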

Notably, Confident AI's own [AI agent observability playbook](https://www.confident-ai.com/knowledge-base/playbook/ai-agent-observability)
recommends capturing "Tokens by category (prompt, completion, **cached**, reasoning),
estimated dollar cost". The SDK currently can't express that schema.

**Describe the solution you'd like**

1. New optional fields on `LlmSpan` (+ `update_llm_span` params):
   `cached_input_token_count` and `cost_per_cached_input_token`.
2. Server-side cost formula when the new fields are present:
   `(input − cached) × cost_per_input + cached × cost_per_cached + output × cost_per_output`
   (assuming inclusive counting; Anthropic's exclusive counting would need either
   normalization at ingestion or a documented convention).
3. A cached-input price column in Settings → Model Costs, so automatic cost inference
   benefits too.
4. OTel exporter: map the GenAI semconv cached-token attributes to the new field.
5. The OpenAI Agents extractor could then promote `cached_input_tokens` from `metadata`
   to the first-class field.
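The cost rule in item 2 could look roughly like this, assuming inclusive counting (Anthropic usage normalized at ingestion) and falling back to today's formula when either new field is missing. The field names mirror the proposal; the dataclass and function are hypothetical stand-ins, not deepeval or Confident AI code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmSpanCost:
    """Hypothetical stand-in for the cost-related fields on LlmSpan."""
    input_token_count: int
    output_token_count: int
    cost_per_input_token: float
    cost_per_output_token: float
    # Proposed optional fields; None means "not reported".
    cached_input_token_count: Optional[int] = None
    cost_per_cached_input_token: Optional[float] = None


def compute_cost(span: LlmSpanCost) -> float:
    output_cost = span.output_token_count * span.cost_per_output_token
    cached = span.cached_input_token_count
    cached_price = span.cost_per_cached_input_token
    if cached is None or cached_price is None:
        # No cached count or no cached price known: keep today's behaviour,
        # billing every input token at the full input rate.
        return span.input_token_count * span.cost_per_input_token + output_cost
    # Inclusive counting: cached tokens must be a subset of the input tokens.
    if not 0 <= cached <= span.input_token_count:
        raise ValueError("cached_input_token_count must be within [0, input_token_count]")
    uncached = span.input_token_count - cached
    return uncached * span.cost_per_input_token + cached * cached_price + output_cost


span = LlmSpanCost(10_000, 500, 2.5e-6, 1e-5, cached_input_token_count=8_000,
                   cost_per_cached_input_token=1.25e-6)
print(round(compute_cost(span), 6))  # 0.02
```

Keeping the fallback path means existing callers that never set the new fields see no change in reported cost.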

I'm happy to contribute the SDK side of this (types, `update_llm_span`, the
`_convert_span_to_api_span` mapping, OTel exporter, extractors) as a PR if the
`/v1/traces` schema and cost computation can be extended to accept it.

**Describe alternatives you've considered**

- Passing a weighted effective `cost_per_input_token` per call (exact cost, misleading
  unit rate, requires maintaining a provider price table client-side).
- Recording cached tokens as span `metadata` (what the OpenAI Agents integration does
  today): visible on the trace, but excluded from cost/token analytics.

**Additional context**

Verified against deepeval 4.0.4 and current `main` (`deepeval/tracing/types.py`,
`deepeval/tracing/api.py`, `deepeval/tracing/otel/exporter.py`). Use case: a WhatsApp
customer-support agent on Azure OpenAI where system prompt + tool schemas dominate input
and cache hit rates are high, making the dashboard cost consistently inflated.


Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feature: support cached input tokens in LLM span cost tracking #2741

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Feature: support cached input tokens in LLM span cost tracking #2741

Description

Metadata

Metadata

Assignees

Labels

Type

Fields

Projects

Milestone

Relationships

Development

Issue actions