Description
Today, the flow of a request to an OpenAI service relies on simple JSON serialization of a model to encode the message as BinaryData, which is then sent through the pipeline.
This does not maximize Prompt Caching capabilities: for caching to work, the completion request should contain the tools first, then the conversation history, then the new content, in that order. Additionally, the tools and history must be serialized in the same order on every request (alphabetical order by tool name is suggested).
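As a rough illustration only, a cache-friendly serializer might emit the payload in that fixed order. The Tool and Message records and the Serialize helper below are hypothetical stand-ins, not the actual SDK models:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.Json;

// Hypothetical stand-ins for the SDK's tool and message models.
public record Tool(string Name, string Description);
public record Message(string Role, string Content);

public static class CacheFriendlySerializer
{
    public static BinaryData Serialize(
        IEnumerable<Tool> tools,
        IEnumerable<Message> history,
        Message newContent)
    {
        using var stream = new System.IO.MemoryStream();
        using (var writer = new Utf8JsonWriter(stream))
        {
            writer.WriteStartObject();

            // 1. Tools first, in a stable alphabetical order by name,
            //    so the cached prefix is identical across requests.
            writer.WritePropertyName("tools");
            JsonSerializer.Serialize(
                writer,
                tools.OrderBy(t => t.Name, StringComparer.Ordinal).ToList());

            // 2. History next, preserving conversation order;
            // 3. the new content last, so only the suffix changes per request.
            writer.WritePropertyName("messages");
            JsonSerializer.Serialize(writer, history.Append(newContent).ToList());

            writer.WriteEndObject();
        }
        return new BinaryData(stream.ToArray());
    }
}
```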
Sources:
- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
- https://openai.com/index/api-prompt-caching/
- https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached
The pipeline asks for BinaryData from the options, which simply uses a default serialization implementation to turn the CompletionChatOptions into BinaryData.
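For contrast, here is a minimal sketch of that default path, assuming a simplified CompletionChatOptions shape (the type name comes from the SDK, but the properties and the ToBinaryData helper are assumptions):

```csharp
using System;

// Hypothetical, simplified shape; the real SDK type has many more members.
public class CompletionChatOptions
{
    public string? Model { get; set; }
    public object? Messages { get; set; }
    public object? Tools { get; set; }
}

public static class DefaultSerializer
{
    public static BinaryData ToBinaryData(CompletionChatOptions options)
    {
        // Default JSON serialization: property order follows the type's
        // declaration, with no guarantee that tools come first or that
        // they are emitted in a stable order between requests.
        return BinaryData.FromObjectAsJson(options);
    }
}
```

Because nothing in the default path sorts the tools or pins the property layout, two otherwise-identical requests can produce different byte prefixes, defeating the cache.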