
Structure completion request to maximize Prompt Caching #42805


Description

@brandonh-msft

Today, the flow of a request to an OpenAI service relies on simple JSON serialization of the options model to encode the message as BinaryData and send it through the pipeline.

This does not maximize Prompt Caching capabilities: for the cache to hit, the completion request should place tools first, then history, then new content, in that order.
Additionally, the tools and history must appear in the same order on every request (suggest alphabetical order by tool name).
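As a hedged sketch of that ordering (the type names here are hypothetical placeholders, not the SDK's types): sort the tool definitions by name and emit tools before messages, with the newest content last, so repeated requests share a byte-stable prefix:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal shapes, for illustration only; the SDK's
// ChatCompletionsOptions and tool types would be the real carriers.
record ToolDefinition(String name, Map<String, Object> parameters) { }
record ChatMessage(String role, String content) { }

final class CacheFriendlyPayload {
    /**
     * Builds a request body with a byte-stable prefix: "tools" first
     * (sorted alphabetically by name), then "messages" with prior history
     * before the new content. LinkedHashMap preserves insertion order,
     * so identical prefixes serialize identically across requests.
     */
    static Map<String, Object> build(List<ToolDefinition> tools,
                                     List<ChatMessage> history,
                                     ChatMessage latest) {
        List<ToolDefinition> sortedTools = new ArrayList<>(tools);
        sortedTools.sort(Comparator.comparing(ToolDefinition::name));

        List<ChatMessage> messages = new ArrayList<>(history);
        messages.add(latest); // new content always goes last

        Map<String, Object> body = new LinkedHashMap<>();
        body.put("tools", sortedTools);  // 1. tools, alpha order by name
        body.put("messages", messages);  // 2. history, then new content
        return body;
    }
}
```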

Sources:
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching
https://openai.com/index/api-prompt-caching/
https://learn.microsoft.com/en-us/azure/ai-services/openai/how-to/prompt-caching#what-is-cached

The client asks for BinaryData from the options:

```java
return getChatCompletionsWithResponse(deploymentOrModelName, BinaryData.fromObject(chatCompletionsOptions),
```

which simply uses the default serialization implementation to turn the ChatCompletionsOptions into BinaryData:

```java
public static BinaryData fromObject(Object data) {
    return fromObject(data, SERIALIZER);
}

static final JsonSerializer SERIALIZER = JsonSerializerProviders.createInstance(true);
```
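One possible call-site workaround, sketched under assumptions (Jackson on the classpath, the hypothetical types from the earlier sketch, and the protocol overload quoted above being reachable on OpenAIClient), is to serialize an explicitly ordered body yourself instead of relying on the default serializer's field order:

```java
import com.azure.ai.openai.OpenAIClient;
import com.azure.core.http.rest.RequestOptions;
import com.azure.core.http.rest.Response;
import com.azure.core.util.BinaryData;
import com.fasterxml.jackson.core.JsonProcessingException;
import com.fasterxml.jackson.databind.ObjectMapper;

import java.util.List;
import java.util.Map;

final class CacheFriendlyCall {
    private static final ObjectMapper MAPPER = new ObjectMapper();

    /**
     * Serializes an explicitly ordered body and hands the pre-serialized
     * bytes to the protocol overload, bypassing the default serializer
     * (and therefore its field ordering).
     */
    static Response<BinaryData> send(OpenAIClient client, String deploymentOrModelName,
                                     List<ToolDefinition> tools, List<ChatMessage> history,
                                     ChatMessage latest) throws JsonProcessingException {
        Map<String, Object> body = CacheFriendlyPayload.build(tools, history, latest);
        // Jackson serializes a LinkedHashMap in insertion order, so the
        // "tools" field lands before "messages" in the JSON bytes.
        BinaryData request = BinaryData.fromString(MAPPER.writeValueAsString(body));
        return client.getChatCompletionsWithResponse(deploymentOrModelName, request, new RequestOptions());
    }
}
```

This only stabilizes the field order for one caller, of course; the ask in this issue is for the SDK's serialization path to produce that ordering for everyone.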

Additional context

microsoft/semantic-kernel#9444
openai/openai-dotnet#281
