Feature: Prompt cache #310

Open

Description

@jolonf

Currently MLXLMCommon has some basic support for a cache, however it isn't persisted across calls to generate().

Although it appears there could be a way to pass a KVCache to generate(), the cache would ultimately have to cross a Sendable boundary if the app is to manage it. That isn't possible, since MLXArray is not Sendable, and it also isn't desirable or necessary.

A prompt cache could instead be managed by the ModelContainer actor and stored in its context as ModelContext.promptCache. Note that the prompt cache itself is an array of KVCache (one per layer). In mlx_lm, the PromptCache object also stores the token ids of the cached prompt and the model key, which is used to check whether the model has changed.

We could implement a similar struct:

public struct PromptCache {
    /// Per-layer KV caches for the cached prompt.
    public let cache: [KVCache]
    /// Identifies the model the cache was built with, so a stale cache can be detected.
    public let modelKey: String
    /// Token ids of the cached prompt.
    public let tokens: MLXArray
}

The PromptCache struct could also provide functions for trimming the cache when a new prompt shares only a prefix with the cached one.
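For example, a trim helper might compute the length of the common token prefix between the cached prompt and a new prompt, so the caller knows how many cached positions can be reused and how many must be discarded. A minimal sketch, assuming MLXArray exposes an asArray(_:) conversion as in mlx-swift (the commonPrefixLength name itself is illustrative, not an existing API):

extension PromptCache {
    /// Number of leading tokens shared by the cached prompt and `newTokens`.
    /// Cached KV entries up to this index can be kept; the rest should be trimmed.
    public func commonPrefixLength(with newTokens: [Int]) -> Int {
        // Assumption: cached token ids can be materialized as a Swift array.
        let cached = tokens.asArray(Int.self)
        var i = 0
        while i < cached.count, i < newTokens.count, cached[i] == newTokens[i] {
            i += 1
        }
        return i
    }
}

This mirrors what mlx_lm does before calling trim_prompt_cache: reuse the shared prefix and recompute only the suffix of the new prompt.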

Functions analogous to mlx_lm's get_prompt_cache could live in the ModelContainer actor, which would serialize access to the cache.
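Since the actor serializes access, a lookup function could check the stored cache against the current model and rebuild it on a mismatch. A hedged sketch, assuming a mutable context.promptCache property, a modelKey for the loaded model, and a makePromptCache() factory (all three are hypothetical names, not existing MLXLMCommon API):

extension ModelContainer {
    /// Returns the prompt cache if it was built for the current model,
    /// otherwise replaces it with a fresh one.
    public func getPromptCache() -> PromptCache {
        if let existing = context.promptCache, existing.modelKey == modelKey {
            return existing
        }
        // Model changed (or no cache yet): start over.
        let fresh = makePromptCache()
        context.promptCache = fresh
        return fresh
    }
}

Keeping this on the actor means callers never touch MLXArray across a Sendable boundary; they only ever ask the container to generate with its own cache.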

I'm currently having a go at implementing this. Interested in any suggestions on the best approach.
