
Conversation

@adi776borate

What does this PR do?

Fixes #2178

The root cause of the bug:

In litgpt/generate/sequentially.py (line 65), rope_cache_length is computed as model.cos.size(-1). For models that set rope_local_base_freq, the RoPE cache has shape (seq_len, n_elem, 2) instead of (seq_len, n_elem), so .size(-1) returns 2 rather than the intended n_elem (e.g., 128), and the KV cache is initialized with the wrong head dimension.
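To see the mix-up concretely, here is a minimal, dependency-free sketch that uses plain shape tuples in place of the real PyTorch tensors (the sizes are hypothetical):

```python
# Hypothetical RoPE cache shapes; the real caches are torch tensors
standard_cache_shape = (4096, 128)       # (seq_len, n_elem)
local_freq_cache_shape = (4096, 128, 2)  # (seq_len, n_elem, 2) with rope_local_base_freq

# Reading the last dimension only works for the 2-D layout
print(standard_cache_shape[-1])    # 128 -> correct n_elem
print(local_freq_cache_shape[-1])  # 2   -> wrong: the trailing pair dimension
```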

Solution:

In litgpt/generate/sequentially.py:

# Before (buggy): assumes model.cos is always 2-D
submodule.attn.kv_cache = submodule.attn.build_kv_cache(
    1, max_seq_length, model.cos.size(-1), target_device
)

# After (fixed):
if model.cos.dim() == 3:
    # (seq_len, n_elem, 2) cache from rope_local_base_freq: n_elem is dim 1
    rope_cache_length = model.cos.size(1)
else:
    # (seq_len, n_elem) cache: n_elem is the last dimension
    rope_cache_length = model.cos.size(-1)

submodule.attn.kv_cache = submodule.attn.build_kv_cache(
    1, max_seq_length, rope_cache_length, target_device
)

Testing:

from pathlib import Path
from litgpt.api import LLM

# Path to downloaded checkpoint
checkpoint_dir = Path("checkpoints/google/gemma-3-1b-it")

llm = LLM.load(str(checkpoint_dir), distribute=None)

llm.distribute(
    devices=1,
    accelerator="cuda",
    generate_strategy="sequential",
    fixed_kv_cache_size=2048,
)

output = llm.generate("What do llamas eat?", max_new_tokens=2000)
print(f"\nOutput: {output}")

The script above produces the expected output once the fix is applied.

Who can review?

Anyone in the community is free to review the PR once the tests have passed.

@bhimrazy (Collaborator) left a comment:


Thanks @adi776borate, nice catch.

I also refactored the logic a bit into a rope_cache_length method and added a test.
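Such a method might look roughly like the following (a sketch, not the merged code; the function below operates on a shape tuple as a stand-in for the model's RoPE cache tensor):

```python
def rope_cache_length(cos_shape):
    """Return n_elem from a RoPE cache shape tuple.

    Illustrative stand-in only: the actual LitGPT method works on the
    tensor itself, not on a shape tuple.
    """
    if len(cos_shape) == 3:
        # (seq_len, n_elem, 2): the trailing dim presumably holds the
        # local/global frequency pair, so n_elem sits at index 1
        return cos_shape[1]
    # (seq_len, n_elem): n_elem is the last dimension
    return cos_shape[-1]

print(rope_cache_length((4096, 128)))     # 128
print(rope_cache_length((4096, 128, 2)))  # 128
```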



Successfully merging this pull request may close these issues.

KV cache dimension error in sequential generation for models with rope_local_base_freq
