Required prerequisites
- I have searched the Issue Tracker and Discussions to confirm this hasn't already been reported. (+1 or comment there if it has.)
- Consider asking first in a Discussion.
Motivation
One of the most effective and easiest ways to improve LLM API call efficiency is prompt caching.
Popular LLM APIs, such as OpenAI, Anthropic, and Google, provide prompt caching features. With this feature, token usage and latency can be significantly reduced.
Solution
To maximize the benefit of prompt caching, we can optimize the prompt templates: static content (system instructions, few-shot examples) should be placed at the beginning of the prompt, and dynamic content (user queries, retrieved context) should be placed at the end, as sketched below.
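For illustration, here is a minimal sketch of how a prompt could be arranged to take advantage of caching, using the Anthropic Python SDK as an example. The model name and prompt contents are placeholders; the `cache_control` block is Anthropic-specific, while OpenAI caches identical static prefixes automatically.

```python
import anthropic

client = anthropic.Anthropic()  # assumes ANTHROPIC_API_KEY is set in the environment

# Static content (long, unchanging instructions and examples) goes first,
# so the provider can reuse the cached prefix across requests.
STATIC_SYSTEM_PROMPT = "You are a helpful assistant. <long, unchanging instructions here>"

def ask(question: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # placeholder model
        max_tokens=1024,
        system=[
            {
                "type": "text",
                "text": STATIC_SYSTEM_PROMPT,
                # Marks the static prefix as cacheable (Anthropic-specific).
                "cache_control": {"type": "ephemeral"},
            }
        ],
        # Dynamic, per-request content comes last so it does not
        # invalidate the cached prefix.
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```

The same ordering principle applies to OpenAI: requests sharing an identical static prefix (above a minimum length) are cached automatically, so only the template layout needs to change there.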
Alternatives
No response
Additional context
No response