refactor: update cache insertion method to use cache_type parameter#274

Merged
cubist38 merged 3 commits into main from refactor/align-prompt-cache-with-upstream
Apr 4, 2026
Conversation

cubist38 (Owner) commented Apr 4, 2026

refactor: align LRUPromptCache with upstream mlx-lm implementation

Summary

  • Extract PromptTrie and PromptTrieResult as standalone classes, matching the upstream mlx_lm.models.cache structure
  • Replace checkpoint: bool with cache_type: str ("assistant", "user", "system") for priority-based eviction ordering
  • Use pop_prefixes for cleaner prefix removal when inserting trimmable caches
  • Track per-type byte usage (_n_bytes_by_type) and add stats_by_type() method
  • Switch from BFS to DFS with pruning for finding longer trie matches
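The cache_type-based eviction and per-type accounting described above can be sketched roughly as follows. Only the names `LRUPromptCache`, `cache_type` (and its three values), `_n_bytes_by_type`, and `stats_by_type` come from this PR; all internals here are illustrative assumptions, not the actual implementation:

```python
from collections import OrderedDict

# Lower priority is evicted first: assistant turns go before user turns,
# and system prompts are kept the longest. (Assumed ordering.)
_PRIORITY = {"assistant": 0, "user": 1, "system": 2}


class LRUPromptCache:
    def __init__(self, max_bytes: int):
        self.max_bytes = max_bytes
        self._entries = OrderedDict()  # key -> (cache_type, n_bytes), oldest first
        self._n_bytes = 0
        self._n_bytes_by_type = {t: 0 for t in _PRIORITY}

    def insert(self, key, cache_type: str, n_bytes: int):
        if key in self._entries:
            self._remove(key)
        self._entries[key] = (cache_type, n_bytes)
        self._n_bytes += n_bytes
        self._n_bytes_by_type[cache_type] += n_bytes
        # Evict lowest-priority entries first; within one priority class,
        # min() is stable over insertion order, so the oldest entry goes.
        while self._n_bytes > self.max_bytes:
            victim = min(self._entries, key=lambda k: _PRIORITY[self._entries[k][0]])
            self._remove(victim)

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)  # refresh LRU position on a hit
        return self._entries[key]

    def stats_by_type(self) -> dict:
        return dict(self._n_bytes_by_type)

    def _remove(self, key):
        cache_type, n_bytes = self._entries.pop(key)
        self._n_bytes -= n_bytes
        self._n_bytes_by_type[cache_type] -= n_bytes
```

For example, with a 100-byte budget, inserting a 40-byte system entry, a 30-byte assistant entry, a 30-byte user entry, and then a 40-byte user entry evicts the assistant entry first and the older user entry second, leaving the system entry untouched.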

cubist38 and others added 3 commits April 4, 2026 14:01
- Changed the parameter name from `checkpoint` to `cache_type` in MLXLMHandler and related tests for clarity.
- Introduced a new PromptTrie class for efficient token sequence management in prompt caching.
- Enhanced CacheEntry to include cache_type for better cache management.
- Simplified the cache insertion method in PromptTrie by using list comprehension for better readability.
- Added `ordering` property and `count_by_type` method to `LRUPromptCache` for improved cache type management.
- Updated references to `_ordering` to use the new property for consistency and clarity.
- Modified model path in the main execution block for updated model usage.
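The PromptTrie introduced in these commits can be sketched minimally as below. Only the name `PromptTrie` comes from the PR; the node layout and `longest_match` API are hypothetical. The lookup shown is the degenerate depth-first case: each query token selects at most one child, so subtrees off the query path are pruned without being visited:

```python
class _TrieNode:
    __slots__ = ("children", "terminal")

    def __init__(self):
        self.children = {}     # token -> _TrieNode
        self.terminal = False  # a cached prompt ends at this node


class PromptTrie:
    def __init__(self):
        self._root = _TrieNode()

    def insert(self, tokens):
        node = self._root
        for tok in tokens:
            node = node.children.setdefault(tok, _TrieNode())
        node.terminal = True

    def longest_match(self, tokens) -> int:
        """Length of the longest cached prompt that is a prefix of `tokens`.

        A depth-first descent along the query: each token selects at most
        one child, so no backtracking is ever needed.
        """
        best = 0
        node = self._root
        for depth, tok in enumerate(tokens, start=1):
            node = node.children.get(tok)
            if node is None:
                break
            if node.terminal:
                best = depth
        return best
```

With `[1, 2, 3]` and `[1, 2, 3, 4, 5]` stored, a query of `[1, 2, 3, 4, 9]` matches the shorter cached prompt (length 3) rather than failing at the mismatched fifth token.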
@cubist38 cubist38 merged commit 0b4c7b3 into main Apr 4, 2026
2 checks passed