
fix: use OrderedDict for proper LRU cache eviction in fp8_cast_bf16.py#1128

Open
modimihir07 wants to merge 2 commits into deepseek-ai:main from modimihir07:fix/lru-cache-eviction-fp8-cast

modimihir07 commented Mar 1, 2026

Problem

The loaded_files cache in fp8_cast_bf16.py is a plain Python dict that evicts
entries in insertion order via next(iter(loaded_files)). The comment on line 90
says "keep only the 2 most recently used files", but this eviction is FIFO
(first-in-first-out), not LRU (least-recently-used).

When get_tensor() accesses a cached file (e.g., to fetch a scale_inv tensor
that lives in a different shard than its weight), that access does NOT update
the file's position in the eviction order. This means:

  1. File A is loaded (contains a weight)
  2. File B is loaded (contains the scale_inv for a weight in File A)
  3. File C is loaded → File A is evicted because it was inserted first,
     even if it was read more recently than File B
  4. Another weight needs its scale_inv from File A → File A is reloaded, evicting File B
  5. In the worst case, this cascades into O(n²) redundant file reloads
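
The difference between the two policies can be illustrated with a small simulation. This is only a sketch: count_loads and the access pattern are hypothetical and not taken from fp8_cast_bf16.py, and the "files" are placeholder objects rather than real shards. It counts how often a file has to be loaded when one shard keeps being revisited while new shards arrive.

```python
from collections import OrderedDict


def count_loads(accesses, capacity=2, lru=False):
    """Count how many (re)loads an access sequence causes.

    Hypothetical helper for illustration; not part of fp8_cast_bf16.py.
    """
    cache = OrderedDict()
    loads = 0
    for name in accesses:
        if name in cache:
            if lru:
                # LRU only: a hit moves the entry to the most-recent end.
                cache.move_to_end(name)
        else:
            cache[name] = object()  # stand-in for a loaded shard
            loads += 1
        # Evict from the front until the cache fits the capacity again.
        while len(cache) > capacity:
            cache.popitem(last=False)
    return loads


# Assumed access pattern: shard "A" is revisited between new shards.
accesses = ["A", "B", "A", "C", "A", "D", "A", "E", "A"]

fifo_loads = count_loads(accesses, lru=False)
lru_loads = count_loads(accesses, lru=True)
print(f"FIFO loads: {fifo_loads}, LRU loads: {lru_loads}")
```

With this pattern the FIFO variant reloads "A" twice, while the LRU variant loads each shard exactly once.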

Fix

  • Replace dict with collections.OrderedDict
  • Call move_to_end() on cache hits in get_tensor() to mark as recently used
  • Call move_to_end() when loading current file in main loop
  • Use popitem(last=False) for proper LRU eviction
  • Change if len > 2 to while len > 2 so the cache shrinks back to 2 files
    even if more than one file was added since the last eviction
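
The changes listed above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual diff: load_shard, get_cached_file, evict_old_files, MAX_CACHED_FILES and load_log are hypothetical names, and the loader returns a placeholder dict instead of reading a real shard from disk.

```python
from collections import OrderedDict

MAX_CACHED_FILES = 2  # hypothetical constant; the PR describes a limit of 2

loaded_files = OrderedDict()
load_log = []  # records each simulated load, for illustration only


def load_shard(file_name):
    """Hypothetical stand-in for the real shard loader."""
    load_log.append(file_name)
    return {"tensor_in_" + file_name: file_name}


def get_cached_file(file_name):
    """Return the cached shard, loading it on a miss."""
    if file_name in loaded_files:
        # Cache hit: mark the file as most recently used.
        loaded_files.move_to_end(file_name)
    else:
        loaded_files[file_name] = load_shard(file_name)
    return loaded_files[file_name]


def evict_old_files():
    """Drop least recently used files until the cache fits the limit."""
    while len(loaded_files) > MAX_CACHED_FILES:
        loaded_files.popitem(last=False)


for name in ["A", "B", "A", "C"]:
    get_cached_file(name)
    evict_old_files()

remaining = list(loaded_files)
print(remaining, load_log)
```

Because the second access to "A" refreshes its position, loading "C" evicts "B" instead, and "A" stays cached.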

Testing

  • python -m py_compile inference/fp8_cast_bf16.py — passes
  • Change is backward compatible (no API changes)
  • OrderedDict is from Python stdlib, no new dependencies

modimihir07 changed the title from "fix: use OrderedDict for proper LRU cache eviction in fp8_cast_bf16.py & fix: add repetition penalty to mitigate multi-turn repetition (fixes #1125)" to "fix: use OrderedDict for proper LRU cache eviction in fp8_cast_bf16.py" on Mar 1, 2026