Is your feature request related to a problem? Please describe.
LlmSpan (and update_llm_span / @observe(type="llm")) only supports four cost-related
fields: input_token_count, output_token_count, cost_per_input_token,
cost_per_output_token. There is no way to report cached input tokens, even though
every major provider now bills them at a discounted rate:
- OpenAI / Azure OpenAI: prompt_tokens_details.cached_tokens (inclusive, i.e. a
  subset of prompt_tokens), billed at ~50% of input price (up to 100% off on
  provisioned deployments).
- Anthropic: cache_read_input_tokens (exclusive of input_tokens), billed at ~10% of
  input price.
Because cost is computed as input_token_count × cost_per_input_token, any application
with a decent cache hit rate sees its cost systematically overestimated on Confident
AI. There is no workaround that keeps both token counts and cost exact: a weighted
effective cost_per_input_token keeps cost exact but shows a misleading per-token rate.
Your own codebase has already hit this limitation: the OpenAI Agents integration
(deepeval/openai_agents/extractors.py) extracts usage.input_tokens_details.cached_tokens
from the SDK response, but has to stash it in the generic span.metadata dict because
LlmSpan has no first-class field for it, so the value is invisible to cost/token
analytics.
The OpenTelemetry exporter has the same gap: the confident.llm.* namespace exposes the
same four fields, and GenAI semconv cache attributes (e.g.
gen_ai.usage.cache_read.input_tokens as emitted by OpenLLMetry) are not mapped.
Notably, Confident AI's own AI agent observability playbook
recommends capturing "Tokens by category (prompt, completion, cached, reasoning),
estimated dollar cost". The SDK currently can't express that schema.
Describe the solution you'd like
- New optional fields on LlmSpan (+ update_llm_span params):
  cached_input_token_count and cost_per_cached_input_token.
- Server-side cost formula when present:
(input − cached) × cost_per_input + cached × cost_per_cached + output × cost_per_output
(assuming inclusive counting; Anthropic's exclusive counting would need either
normalization at ingestion or a documented convention).
- A cached-input price column in Settings → Model Costs, so automatic cost inference
benefits too.
- OTel exporter: map the GenAI semconv cached-token attributes to the new field.
- The OpenAI Agents extractor could then promote cached_input_tokens from metadata
  to the first-class field.
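To make the proposed formula concrete, here is a minimal sketch of how the cost could be computed under the inclusive convention, with a helper that normalizes Anthropic-style exclusive counts first. This is an illustration only: LlmUsage, normalize_exclusive, and span_cost are hypothetical names, not existing deepeval or Confident AI APIs, and the fallback to cost_per_input_token when no cached price is supplied is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LlmUsage:
    """Hypothetical container; field names mirror the proposed LlmSpan fields."""

    input_token_count: int  # inclusive convention: cached tokens are counted here too
    output_token_count: int
    cost_per_input_token: float
    cost_per_output_token: float
    cached_input_token_count: Optional[int] = None
    cost_per_cached_input_token: Optional[float] = None


def normalize_exclusive(input_tokens: int, cache_read_tokens: int) -> int:
    """Turn an exclusive input count (Anthropic-style) into an inclusive one."""
    return input_tokens + cache_read_tokens


def span_cost(usage: LlmUsage) -> float:
    """(input - cached) * cost_per_input + cached * cost_per_cached + output * cost_per_output."""
    cached = usage.cached_input_token_count or 0
    if cached > usage.input_token_count:
        raise ValueError("cached tokens cannot exceed inclusive input tokens")
    # Assumption: with no cached price given, cached tokens are billed at the full input rate.
    cached_rate = (
        usage.cost_per_cached_input_token
        if usage.cost_per_cached_input_token is not None
        else usage.cost_per_input_token
    )
    return (
        (usage.input_token_count - cached) * usage.cost_per_input_token
        + cached * cached_rate
        + usage.output_token_count * usage.cost_per_output_token
    )


# Illustrative numbers only: 1000 input tokens, 800 of them cached, 100 output tokens.
with_cache = LlmUsage(1000, 100, 2e-6, 8e-6, cached_input_token_count=800,
                      cost_per_cached_input_token=1e-6)
without_cache = LlmUsage(1000, 100, 2e-6, 8e-6)
print(span_cost(with_cache), span_cost(without_cache))
```

With these made-up prices the cache-aware cost is lower than what today's four-field formula would report, while input_token_count still holds the real token total.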
I'm happy to contribute the SDK side of this (types, update_llm_span, the
_convert_span_to_api_span mapping, OTel exporter, extractors) as a PR if the
/v1/traces schema and cost computation can be extended to accept it.
Describe alternatives you've considered
- Passing a weighted effective cost_per_input_token per call (exact cost, misleading
  unit rate, requires maintaining a provider price table client-side).
- Recording cached tokens as span metadata (what the OpenAI Agents integration does
  today; visible on the trace but excluded from cost/token analytics).
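For reference, the first alternative can be sketched as follows. The function effective_input_rate is a hypothetical client-side helper, not part of deepeval, and the prices are illustrative rather than real provider rates.

```python
def effective_input_rate(
    input_tokens: int,
    cached_tokens: int,
    input_rate: float,
    cached_rate: float,
) -> float:
    """Blend the cached and uncached prices into a single per-token input rate.

    Assumes inclusive counting (cached_tokens is a subset of input_tokens).
    """
    if input_tokens == 0:
        return input_rate
    uncached = input_tokens - cached_tokens
    return (uncached * input_rate + cached_tokens * cached_rate) / input_tokens


# Illustrative numbers only: 800 of 1000 input tokens were served from cache.
rate = effective_input_rate(1000, 800, input_rate=2e-6, cached_rate=1e-6)
# input_token_count * rate reproduces the true input cost, but the rate itself
# matches neither the regular nor the cached price and changes on every call.
print(rate, 1000 * rate)
```

The blended rate has to be recomputed per call from a client-side price table, which is the maintenance burden noted above.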
Additional context
Verified against deepeval 4.0.4 and current main (deepeval/tracing/types.py,
deepeval/tracing/api.py, deepeval/tracing/otel/exporter.py). Use case: a WhatsApp
customer-support agent on Azure OpenAI where system prompt + tool schemas dominate input
and cache hit rates are high, making the dashboard cost consistently inflated.