
[Feature Request] Support prompt caching #3586

@Zephyroam

Description



Motivation

One of the easiest and most effective ways to improve the efficiency of LLM API calls is prompt caching.

Popular LLM providers, such as OpenAI, Anthropic, and Google, offer prompt caching in their APIs. With it, the token cost and latency of repeated prompt prefixes are significantly reduced.

Solution

Add support for the prompt caching features these providers offer, so that the static portion of repeated prompts is cached and reused across API calls instead of being reprocessed every time.
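For providers with explicit cache controls, this could look roughly like the sketch below, which marks a long static system prompt as cacheable via Anthropic's `cache_control` content-block field (the model name and prompt text are placeholders; OpenAI caches shared prefixes automatically, so no flag is needed there):

```python
# Sketch: explicit prompt caching with the Anthropic Python SDK (assumes a recent
# anthropic package with prompt caching support). The long, static system prompt
# is marked with cache_control so later calls can reuse the cached prefix.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

STATIC_SYSTEM_PROMPT = "...long, unchanging instructions and tool definitions..."

response = client.messages.create(
    model="claude-3-5-sonnet-latest",  # placeholder model name
    max_tokens=1024,
    system=[
        {
            "type": "text",
            "text": STATIC_SYSTEM_PROMPT,
            # Cache everything up to and including this block.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    messages=[{"role": "user", "content": "dynamic user query goes here"}],
)

# The usage fields report how many tokens were written to / read from the cache.
print(response.usage.cache_creation_input_tokens,
      response.usage.cache_read_input_tokens)
```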

To maximize the benefit of prompt caching, we can also optimize the prompt templates: static content should be placed at the beginning of the prompt and dynamic content at the end, so the shared prefix stays identical across calls and can be cached.
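As a rough illustration of that ordering (the template strings and helper below are hypothetical, not an existing API), the static instructions and few-shot examples form a fixed prefix and only per-request content is appended at the end, so prefix-based caches such as OpenAI's automatic prompt caching can hit on every call:

```python
# Sketch of a cache-friendly prompt layout: everything that never changes goes
# first, per-request content goes last.
STATIC_INSTRUCTIONS = "You are a helpful assistant. Follow these rules: ..."
FEW_SHOT_EXAMPLES = "Example 1: ...\nExample 2: ..."

def build_messages(user_query: str, retrieved_context: str) -> list[dict]:
    return [
        # Static prefix: identical across requests, so it can be cached.
        {"role": "system", "content": STATIC_INSTRUCTIONS + "\n\n" + FEW_SHOT_EXAMPLES},
        # Dynamic suffix: changes per request, placed after the cacheable prefix.
        {"role": "user", "content": f"Context:\n{retrieved_context}\n\nQuestion: {user_query}"},
    ]
```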

Alternatives

No response

Additional context

No response
