Description
`mlx_lm` has a `cache_prompt` and `load_prompt` feature that makes it easier to work with long prompts. When LM Studio injects an entire document into context, pre-processing that document can take a long time, and that work is lost whenever the cache is invalidated. If users had the option to save and load the cache, the pre-processing time could be skipped when the same document is used again.
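To make the request concrete, here is a minimal sketch of the save/load flow in plain Python. It does not call `mlx_lm` or LM Studio: `expensive_preprocess` is a hypothetical stand-in for prompt pre-processing, and `cache_path_for` and `load_or_build_cache` are illustrative names, not existing APIs. The idea is only to key a saved cache on the document's content so that an unchanged document skips pre-processing.

```python
import hashlib
import json
from pathlib import Path


def cache_path_for(document: str, cache_dir: Path) -> Path:
    """Derive a cache file name from the document's content hash."""
    digest = hashlib.sha256(document.encode("utf-8")).hexdigest()
    return cache_dir / f"{digest}.json"


def expensive_preprocess(document: str) -> list[int]:
    """Hypothetical stand-in for prompt pre-processing.

    A real implementation would produce the model's prompt cache;
    here we just return the UTF-8 byte values of the document.
    """
    return list(document.encode("utf-8"))


def load_or_build_cache(
    document: str,
    cache_dir: Path,
    preprocess=expensive_preprocess,
) -> tuple[list[int], bool]:
    """Return (state, loaded_from_disk).

    If a cache file for this exact document exists, load it and skip
    pre-processing. Otherwise run pre-processing and save the result.
    """
    path = cache_path_for(document, cache_dir)
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8")), True

    state = preprocess(document)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(state), encoding="utf-8")
    return state, False
```

Hashing the content means an edited document gets a new cache file rather than reusing a stale one; a real integration would likely also need to key on the model and any settings that affect the cache.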