feat: Add configurable prompt caching TTL (5m / 1h) via environment variable #224

@lobando

Summary

Amazon Bedrock added support for a 1-hour TTL option for prompt caching on January 26, 2026 (AWS announcement). Currently, the gateway uses the default 5-minute TTL with no way to configure it. This feature request proposes adding a PROMPT_CACHE_TTL environment variable to allow users to opt into the extended 1-hour cache.

Motivation

For long-running agentic sessions or applications with large, stable system prompts, the 5-minute TTL often expires during idle gaps. The next request then has to write the same prefix to the cache again, which incurs another cacheWrite charge. A 1-hour TTL would avoid most of these repeated writes and significantly reduce costs for these workloads.

Proposed Change

  • Add PROMPT_CACHE_TTL to src/api/setting.py with default "5m" and valid values "5m" or "1h"
  • Pass the TTL field through to the cache_control block in Bedrock requests
  • Restrict 1-hour TTL to supported models only (Claude Haiku 4.5, Sonnet 4.5, Opus 4.5)
  • Document the new variable in README alongside ENABLE_PROMPT_CACHING
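
A minimal sketch of the first two bullets, assuming the setting is read straight from the environment and that the request uses an Anthropic-style cache_control block with "type": "ephemeral" and an optional "ttl" field. The helper names read_prompt_cache_ttl and build_cache_control are hypothetical and do not exist in the gateway today; the real wiring in src/api/setting.py may look different.

```python
import os

# Values proposed in this issue; anything else is rejected at startup.
VALID_PROMPT_CACHE_TTLS = ("5m", "1h")
DEFAULT_PROMPT_CACHE_TTL = "5m"


def read_prompt_cache_ttl(env=None):
    """Return the configured TTL, or the default when the variable is unset or empty.

    `env` is any mapping; it defaults to os.environ and exists mainly
    so the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    raw = (env.get("PROMPT_CACHE_TTL") or DEFAULT_PROMPT_CACHE_TTL).strip().lower()
    if raw not in VALID_PROMPT_CACHE_TTLS:
        raise ValueError(
            f"PROMPT_CACHE_TTL must be one of {VALID_PROMPT_CACHE_TTLS}, got {raw!r}"
        )
    return raw


def build_cache_control(ttl):
    """Build the cache_control block (hypothetical helper).

    The "ttl" key is only added for non-default values, so requests sent
    with the default setting are byte-for-byte what they were before.
    """
    block = {"type": "ephemeral"}
    if ttl != DEFAULT_PROMPT_CACHE_TTL:
        block["ttl"] = ttl
    return block
```

Omitting "ttl" for the default keeps existing request payloads unchanged, which fits the no-breaking-changes goal. Failing fast on an invalid value is one possible design choice; silently falling back to "5m" with a warning would be another.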

Supported Models (1h TTL)

Per AWS docs:

  • claude-haiku-4-5
  • claude-sonnet-4-5
  • claude-opus-4-5
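
The model restriction could be sketched as below. This assumes the three names above appear as substrings of the Bedrock model ID or inference profile ID, and that an unsupported model should quietly fall back to the 5-minute TTL instead of failing the request. Both are assumptions; supports_one_hour_ttl and effective_ttl are hypothetical names, and the model IDs in the usage lines are illustrative.

```python
# Substrings taken from the supported-model list in this issue.
ONE_HOUR_TTL_MODEL_MARKERS = (
    "claude-haiku-4-5",
    "claude-sonnet-4-5",
    "claude-opus-4-5",
)


def supports_one_hour_ttl(model_id):
    """True when the model ID contains one of the supported model names."""
    lowered = model_id.lower()
    return any(marker in lowered for marker in ONE_HOUR_TTL_MODEL_MARKERS)


def effective_ttl(requested_ttl, model_id):
    """Downgrade "1h" to "5m" for models outside the supported list."""
    if requested_ttl == "1h" and not supports_one_hour_ttl(model_id):
        return "5m"
    return requested_ttl


# Usage with illustrative model IDs:
print(effective_ttl("1h", "us.anthropic.claude-sonnet-4-5-20250929-v1:0"))
print(effective_ttl("1h", "anthropic.claude-3-5-sonnet-20240620-v1:0"))
```

Substring matching keeps the check working for region-prefixed inference profile IDs, at the cost of needing an update whenever AWS extends the supported list.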

No Breaking Changes

The default remains "5m", so existing deployments are unaffected.

Labels: enhancement
