feat: Add configurable prompt caching TTL (5m / 1h) via environment variable #224

@lobando

Summary

Amazon Bedrock added support for a 1-hour TTL option for prompt caching on January 26, 2026 (AWS announcement). The gateway currently always uses the default 5-minute TTL and offers no way to change it. This feature request proposes adding a PROMPT_CACHE_TTL environment variable so that users can opt into the extended 1-hour cache.

Motivation

For long-running agentic sessions or applications with large, stable system prompts, the 5-minute TTL often expires during idle gaps. Each expiry forces the next request to write the same prompt to the cache again, which incurs an avoidable cacheWrite charge. A 1-hour TTL should reduce costs for these workloads by keeping the cached prompt alive across those gaps.

Proposed Change

  • Add PROMPT_CACHE_TTL to src/api/setting.py with default "5m" and valid values "5m" or "1h"
  • Pass the TTL field through to the cache_control block in Bedrock requests
  • Restrict 1-hour TTL to supported models only (Claude Haiku 4.5, Sonnet 4.5, Opus 4.5)
  • Document the new variable in README alongside ENABLE_PROMPT_CACHING
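
As a rough illustration of the first two bullets, here is a minimal sketch of how the setting could be parsed and turned into a cache block. Everything except the PROMPT_CACHE_TTL variable name, its default, and its two valid values is an assumption: the helper names `parse_prompt_cache_ttl` and `build_cache_control` are hypothetical, the `{"type": "ephemeral", "ttl": ...}` shape is a guess at what the gateway sends and should be checked against the Bedrock docs, and silently falling back to "5m" on an invalid value is one possible policy, not a decided one.

```python
import os

VALID_TTLS = ("5m", "1h")
DEFAULT_TTL = "5m"


def parse_prompt_cache_ttl(raw):
    """Return a validated TTL string (hypothetical helper).

    Unset or unrecognized values fall back to the default, so a typo in
    the environment never changes caching behavior unexpectedly.
    """
    if raw is None:
        return DEFAULT_TTL
    value = raw.strip().lower()
    if value not in VALID_TTLS:
        return DEFAULT_TTL
    return value


# Would live in src/api/setting.py next to the other environment settings.
PROMPT_CACHE_TTL = parse_prompt_cache_ttl(os.environ.get("PROMPT_CACHE_TTL"))


def build_cache_control(ttl):
    """Build the cache block for a request (hypothetical helper).

    Assumed shape: the real field names must match whatever the gateway
    already sends to Bedrock. The ttl key is omitted for the default so
    that requests stay identical to today's when the variable is unset.
    """
    block = {"type": "ephemeral"}
    if ttl != DEFAULT_TTL:
        block["ttl"] = ttl
    return block
```

Omitting the field for "5m" is a deliberate choice in this sketch: it keeps the default request payload unchanged, which matches the no-breaking-changes goal.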

Supported Models (1h TTL)

Per AWS docs:

  • claude-haiku-4-5
  • claude-sonnet-4-5
  • claude-opus-4-5
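
The model restriction could be sketched as a small gate that downgrades an unsupported request to the default TTL. The function name `effective_ttl` is hypothetical, and matching by substring is an assumption about how Bedrock model IDs and inference-profile IDs embed the model names listed above; the example IDs in the comments are illustrative only.

```python
# Model name fragments taken from the list above.
SUPPORTED_1H_MODELS = (
    "claude-haiku-4-5",
    "claude-sonnet-4-5",
    "claude-opus-4-5",
)


def effective_ttl(model_id, requested_ttl):
    """Return the TTL to actually send for this model (hypothetical helper).

    A 1h request is honored only when the model ID contains one of the
    supported name fragments; otherwise it falls back to the 5m default.
    """
    if requested_ttl == "1h" and not any(
        name in model_id for name in SUPPORTED_1H_MODELS
    ):
        return "5m"
    return requested_ttl


# Illustrative IDs (assumed format):
# effective_ttl("us.anthropic.claude-sonnet-4-5-20250929-v1:0", "1h") keeps "1h"
# effective_ttl("anthropic.claude-3-5-sonnet-20240620-v1:0", "1h") falls back to "5m"
```

Falling back instead of raising keeps a single global setting usable when the gateway serves a mix of supported and unsupported models.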

No Breaking Changes

Default remains "5m" — existing deployments are unaffected.

Labels: enhancement (New feature or request)
