Dynamic maybeQuantizeKVCache causes cache COW in TokenIterator. #341

@mzbac

Description

While working on the prompt cache with the quantized KV cache, I noticed that maybeQuantizeKVCache converts the SimpleKVCache into a quantized KV cache. However, the cache reference passed to TokenIterator still points to the original SimpleKVCache. This seems to cause a copy-on-write (COW) situation where TokenIterator starts maintaining its own cache instead of using the passed-in one. As a result, external code can't access the updated KV cache from TokenIterator.
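
A minimal sketch of the suspected mechanism, written without MLX so it runs on its own. All types and names here (`Cache`, `SimpleCache`, `QuantizedCache`, `maybeQuantize`, `SketchIterator`, the `startAt` threshold) are hypothetical stand-ins, not the real library API. It assumes the iterator stores the caches in a Swift array (a value type) and that the quantize helper replaces array elements through `inout`:

```swift
// Hypothetical stand-ins for the real cache types; not the MLX API.
protocol Cache: AnyObject {
    var offset: Int { get }
    func update(tokens: Int)
}

final class SimpleCache: Cache {
    private(set) var offset = 0
    func update(tokens: Int) { offset += tokens }
}

final class QuantizedCache: Cache {
    private(set) var offset: Int
    init(from other: Cache) { offset = other.offset }
    func update(tokens: Int) { offset += tokens }
}

// Assumed shape of the helper: swaps elements of the array it is given.
func maybeQuantize(_ cache: inout [Cache], startAt: Int) {
    for i in cache.indices
    where !(cache[i] is QuantizedCache) && cache[i].offset >= startAt {
        cache[i] = QuantizedCache(from: cache[i])
    }
}

struct SketchIterator {
    // Arrays are value types: this is logically a copy of the caller's array.
    var cache: [Cache]

    mutating func step() {
        for c in cache { c.update(tokens: 1) }
        // Replacing an element mutates only the iterator's own array.
        maybeQuantize(&cache, startAt: 2)
    }
}

let external: [Cache] = [SimpleCache()]
var iterator = SketchIterator(cache: external)
for _ in 0..<4 { iterator.step() }

// The caller still holds the stale simple cache, frozen at the swap point.
print(external[0] is SimpleCache, external[0].offset)           // true 2
// The iterator holds the replacement, which kept advancing.
print(iterator.cache[0] is QuantizedCache, iterator.cache[0].offset)  // true 4
```

If this is what happens in the real code, the element objects are shared until the swap, after which the caller's array and the iterator's array diverge. Possible ways around it in this sketch would be to wrap the array in a reference type shared by both sides, or to have the iterator expose its current cache back to the caller.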
