
🚀 Feature: Add prompt caching for Bedrock Converse #3337

@MikeQDev

Description


Which component is this feature for?

Bedrock Instrumentation

🔖 Feature description

Support tracking of prompt caching in Bedrock Converse.

Prompt caching telemetry for Bedrock Converse will be:

  • Added to spans as attributes
  • Emitted to the `prompt_caching` counter (or should this be a histogram? A minimal metric sketch follows this list.)
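
For context, the counter would presumably be created along these lines; the instrument name, unit, and description here are guesses mirroring this issue's wording, not the actual instrumentation code:

```python
# Sketch only: the instrument name, unit, and description are assumptions.
from opentelemetry import metrics

meter = metrics.get_meter(__name__)
prompt_caching_counter = meter.create_counter(
    name="prompt_caching",
    unit="token",
    description="Count of prompt tokens written to or read from the cache",
)
```

A counter suits a monotonically growing total of cached tokens; a histogram would instead capture the per-request distribution of cached-token counts (useful for percentiles, at higher storage cost).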

🎤 Why is this feature needed?

Better cost and latency tracking.

Feature parity with #2788, which instruments `_handle_call` and `_handle_call_stream` but not the Converse functions `_handle_converse` and `_handle_converse_stream` within `__init__.py`.

✌️ How do you aim to achieve this?

Update `prompt_caching.py`

(It currently has `prompt_caching_handling` but no `prompt_caching_converse`.)

Create a `prompt_caching_converse` function that takes: `response`, `vendor`, `model`, `metric_params`.

Taking in `response` instead of `headers`, since none of the headers referenced in the existing instrumentation (`x-amzn-bedrock-cache-{read,write}-input-token-count`) appear in the responses from Bedrock Converse. What I see in Bedrock Converse responses instead lives in the response body, in a `usage_metadata` field with a value like `{'input_tokens': 3, 'output_tokens': 492, 'total_tokens': 6042, 'input_token_details': {'cache_creation': 5547, 'cache_read': 0}}`.
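
A minimal sketch of what `prompt_caching_converse` could look like under that assumption; the field names follow the `usage_metadata` example below, while the span attribute keys, the `cache.state` dimension, and the `metric_params.prompt_caching` interface are placeholders to be aligned with the existing `prompt_caching_handling`:

```python
# Hypothetical sketch: field names follow the usage_metadata example in this
# issue; the attribute keys and metric_params interface are assumptions.
from opentelemetry import trace


def prompt_caching_converse(response, vendor, model, metric_params):
    details = response.get("usage_metadata", {}).get("input_token_details", {})
    written = details.get("cache_creation", 0)  # tokens written to the cache
    read = details.get("cache_read", 0)         # tokens served from the cache
    if not (written or read):
        return

    span = trace.get_current_span()
    for cache_state, tokens in (("write", written), ("read", read)):
        if tokens:
            # Placeholder attribute key; mirror the header-based handler's naming.
            span.set_attribute(f"gen_ai.usage.cache_{cache_state}_input_tokens", tokens)
            metric_params.prompt_caching.add(
                tokens,
                attributes={
                    "gen_ai.system": vendor,
                    "gen_ai.request.model": model,
                    "cache.state": cache_state,  # placeholder dimension name
                },
            )
```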

Example response from `llm.invoke(input=messages)`:
content="# OpenTelemetry Logging - Summary\n\n## Introduction and Philosophy\n- OpenTelemetry's approach to logs differs from its approach to metrics and traces\n- Instead of creating entirely new logging systems, OpenTelemetry embraces existing logging solutions\n- Focus is on integrating logs with other observability signals (traces and metrics)\n\n## Key Problems Solved\n- Traditional logging solutions lack standardized integration with traces and metrics\n- No standardized way to include origin/source information in logs\n- Logs often lack context propagation in distributed systems\n- Different collection agents and protocols create fragmented observability data\n\n## OpenTelemetry's Solution\n- Defines a standard log data model for consistent representation\n- Enables correlation between logs, traces, and metrics\n- Supports existing log formats through mapping to the OpenTelemetry model\n- Provides a Logs API for emitting LogRecords\n- Offers SDK implementation for processing and exporting logs\n\n## Log Correlation Dimensions\n- **Time of execution**: Basic correlation by timestamp\n- **Execution context**: Including trace and span IDs in logs\n- **Origin of telemetry**: Including resource context in logs\n\n## Log Sources and Collection Approaches\n- **System Logs**: OS-generated logs that can be enriched with resource context\n- **Infrastructure Logs**: From components like Kubernetes, can be enriched with resource context\n- **Third-party Application Logs**: Various formats that can be parsed and enriched\n- **Legacy First-Party Applications**:\n  - Via File/Stdout: Collected using file log receivers or agents\n  - Direct to Collector: Modified to output via network protocols like OTLP\n- **New First-Party Applications**: Can fully implement OpenTelemetry's logging approach\n\n## OpenTelemetry Collector Features\n- Support for log data types and pipelines\n- Ability to read and tail log files\n- Log parsing capabilities for common formats\n- Network protocol support for receiving and sending logs\n- Enrichment processors for adding context\n\n## Auto-Instrumentation Capabilities\n- Can configure popular logging libraries to include trace context\n- Reads incoming trace context\n- Includes trace and span IDs in logged statements\n- Can optionally send logs directly via OTLP\n\n## Key Components\n- Log Data Model: Common understanding of what a LogRecord is\n- Logs API: For emitting LogRecords\n- SDK: Implementation enabling configuration of processing and exporting\n- Collector: For collecting, processing, and exporting logs" additional_kwargs={} response_metadata={'ResponseMetadata': {'RequestId': '98284356-db8f-446e-b078-81eec8a99fc5', 'HTTPStatusCode': 200, 'HTTPHeaders': {'date': 'Fri, 22 Aug 2025 14:05:38 GMT', 'content-type': 'application/json', 'content-length': '2871', 'x-amzn-requestid': '98284356-db8f-446e-b078-81eec8a99fc5', 'cache-control': 'proxy-revalidate', 'connection': 'keep-alive'}, 'RetryAttempts': 0}, 'stopReason': 'end_turn', 'metrics': {'latencyMs': [15525]}, 'model_name': 'us.anthropic.claude-3-7-sonnet-20250219-v1:0'} id='run--62f6d50c-4d6a-4ad5-a8f0-bff5e8155cba-0' usage_metadata={'input_tokens': 3, 'output_tokens': 553, 'total_tokens': 6103, 'input_token_details': {'cache_creation': 0, 'cache_read': 5547}}

Update `__init__.py`

Import the new `prompt_caching_converse` function.

Call `prompt_caching_converse` within the following functions (a wiring sketch follows the list):

  • _handle_converse
  • _handle_converse_stream
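
For illustration, the wiring could look roughly like this; the function bodies, parameter lists, and the `_get_vendor_and_model` helper are invented, and only the function names come from this issue:

```python
# Hypothetical wiring in __init__.py; signatures and bodies are assumptions.
from .prompt_caching import prompt_caching_converse


def _handle_converse(span, kwargs, response, metric_params):
    # ... existing Converse handling (usage tokens, response attributes) ...
    vendor, model = _get_vendor_and_model(kwargs)  # assumed helper
    prompt_caching_converse(response, vendor, model, metric_params)


def _handle_converse_stream(span, kwargs, response, metric_params):
    # For streams, usage (and cache) counts arrive in the trailing metadata
    # event, so the call would happen after that event is consumed.
    ...
```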

🔄️ Additional Information

I'm not a Python developer, but I'm willing to attempt this feature.

Related:

👀 Have you spent some time to check if this feature request has been raised before?

  • I checked and didn't find a similar issue

Are you willing to submit a PR?

Yes, I am willing to submit a PR!
