Fix RoPE positions to reset at document boundaries when using doc_lens #591
base: main
Conversation
This script demonstrates that when using doc_lens for intra-document masking, RoPE positions are NOT reset per document. When packing two identical sequences [seq | seq] with doc_lens=[10, 10]:
- Doc1 gets RoPE positions [0, 1, ..., 9] (correct)
- Doc2 gets RoPE positions [10, 11, ..., 19] (incorrect, should be [0, 1, ..., 9])

This causes the second document's logits to differ from what they would be if processed separately, which affects use cases like DPO training where chosen and rejected sequences are packed together.

To run: uv run python src/scripts/doc_lens_rope_issue.py
Requires: CUDA and flash attention
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
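A hedged sketch of the repro's core check (the actual script is src/scripts/doc_lens_rope_issue.py; `model` and `vocab_size` here are hypothetical stand-ins for whatever the script builds):

```python
import torch

# Hypothetical stand-ins: `model` is any transformer whose forward accepts
# `doc_lens` for intra-document masking; `vocab_size` matches its embedding.
seq = torch.randint(0, vocab_size, (1, 10))
packed = torch.cat([seq, seq], dim=1)  # [seq | seq], total length 20
logits = model(packed, doc_lens=torch.tensor([[10, 10]]))
# With intra-document masking the two halves should be independent, so
# their logits should match -- before this fix they do not, because the
# second copy gets RoPE positions 10..19 instead of 0..9.
torch.testing.assert_close(logits[:, :10], logits[:, 10:])
```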
The TransformerBlockConfig was migrated from block.attention to block.sequence_mixer.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
When packing multiple documents with cu_doc_lens, RoPE was using global positions instead of per-document positions. This caused the second document to get positions [20, 21, ...] instead of [0, 1, ...], breaking the expected behavior where identical documents should produce identical outputs.

The fix adds a cu_doc_lens parameter to RotaryEmbedding.forward() which computes per-document positions using searchsorted to find document boundaries.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
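A minimal sketch of the position computation this commit describes (illustrative, not the exact code in the PR):

```python
import torch

# Two packed documents of length 20 each; cu_doc_lens holds cumulative
# document boundaries starting at 0, flash-attn cu_seqlens style.
cu_doc_lens = torch.tensor([0, 20, 40])
pos = torch.arange(40)                                   # global positions 0..39
# searchsorted maps each global position to the document containing it;
# subtracting the document's start resets positions at every boundary.
doc_idx = torch.searchsorted(cu_doc_lens, pos, right=True) - 1
local_pos = pos - cu_doc_lens[doc_idx]                   # 0..19, then 0..19 again
assert local_pos.tolist() == list(range(20)) + list(range(20))
```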
…cript
- Add cu_doc_lens parameter to ComplexRotaryEmbedding.forward() for per-document position computation
- Remove src/scripts/doc_lens_rope_issue.py which had lint/type errors

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: df72f4eeb7
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Nice! Related: #503
- Add test_rope_cu_doc_lens_resets_positions: verifies positions reset at document boundaries for B=1
- Add test_rope_cu_doc_lens_batch_gt_1: verifies B>1 with the same doc structure per batch element
- Add test_rope_cu_doc_lens_uneven_docs: verifies B>1 with different doc structures per batch element (e.g., batch0 has docs of len 3+5, batch1 has docs of len 2+6)
- Tests cover both RotaryEmbedding and ComplexRotaryEmbedding classes
- Tests cover both head_first=True and head_first=False modes
- Fix B>1 support in RotaryEmbedding and ComplexRotaryEmbedding by computing global positions across the flattened [B*T] space before searchsorted (see the sketch below)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
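For instance, the uneven-docs case can be checked directly with searchsorted over the flattened space (a sketch, with document lengths matching the commit message):

```python
import torch

# batch0 packs docs of length 3+5, batch1 packs docs of length 2+6 (T=8 each).
B, T = 2, 8
cu_doc_lens = torch.tensor([0, 3, 8, 10, 16])  # boundaries in flattened [B*T] space
global_pos = torch.arange(B * T)
doc_starts = cu_doc_lens[torch.searchsorted(cu_doc_lens, global_pos, right=True) - 1]
local_pos = (global_pos - doc_starts).view(B, T)
assert local_pos.tolist() == [[0, 1, 2, 0, 1, 2, 3, 4],
                              [0, 1, 0, 1, 2, 3, 4, 5]]
```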
Force-pushed from 6aae7d5 to 3cb1939
- Extract compute_local_positions() helper function to compute per-document positions from cu_doc_lens, used by both RotaryEmbedding and ComplexRotaryEmbedding (a possible shape for the helper is sketched below)
- Keep original bracket notation for tensor unsqueezing (e.g., [:, None, :, :]) for clarity

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
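A possible shape for that helper, under the assumption that cu_doc_lens indexes the flattened [B*T] token stream (the PR's actual implementation may differ):

```python
import torch

def compute_local_positions(cu_doc_lens: torch.Tensor, batch_size: int, k_len: int) -> torch.Tensor:
    """Map each token to its position within its own document.

    Assumes ``cu_doc_lens`` holds cumulative document lengths over the
    flattened (batch_size * k_len) token stream, starting at 0.
    """
    device = cu_doc_lens.device
    # Global position of every token in the flattened [B*T] space.
    batch_offsets = torch.arange(batch_size, device=device) * k_len
    positions = torch.arange(k_len, device=device)
    global_positions = (batch_offsets[:, None] + positions[None, :]).flatten()
    # For each token, find the start offset of the document containing it.
    doc_starts = cu_doc_lens[torch.searchsorted(cu_doc_lens, global_positions, right=True) - 1]
    return (global_positions - doc_starts).view(batch_size, k_len)
```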
Force-pushed from 3cb1939 to 8650c5a
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 50ab95eca1
```python
batch_offsets = torch.arange(batch_size, device=device) * k_len
positions = torch.arange(k_len, device=device)
global_positions = batch_offsets[:, None] + positions[None, :]
```
Account for start_pos when building local positions
When cu_doc_lens is provided, local positions are derived from positions = arange(k_len) and never offset by start_pos. In KV-cache decoding (start_pos > 0), this makes the newest token always appear at local position 0 (or near the doc start) instead of its true absolute position within the document, so RoPE repeats earlier positions and yields incorrect attention for continued generation. This only affects runs that combine cu_doc_lens with decoding/start_pos (e.g., cached autoregressive generation) and does not occur for full-sequence training where start_pos is None.
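A hedged sketch of that adjustment, mirroring the quoted lines above (assuming start_pos is the absolute index of the first new token in the decoding stream):

```python
import torch

batch_size, k_len = 1, 4   # e.g. decoding 4 new tokens
start_pos = 16             # 16 tokens already in the KV cache (assumed meaning)
device = torch.device("cpu")

batch_offsets = torch.arange(batch_size, device=device) * k_len
# Offset by start_pos so the new tokens keep their true absolute positions
# (16..19) instead of restarting at 0..3 before document boundaries are found.
positions = torch.arange(k_len, device=device) + start_pos
global_positions = batch_offsets[:, None] + positions[None, :]
```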
When packing multiple documents with cu_doc_lens, RoPE was using global positions instead of per-document positions. This caused the second document to get positions [20, 21, ...] instead of [0, 1, ...], breaking the expected behavior where identical documents should produce identical outputs.

To fix, we add a cu_doc_lens parameter to RotaryEmbedding.forward() and ComplexRotaryEmbedding.forward() which computes per-document positions using searchsorted to find document boundaries. We have a minimal repro which we ran on Beaker.

Before this fix, identical documents packed with doc_lens produced different outputs because RoPE applied global positions (see minimal_doc_lens_repro.py). After our fix, identical documents packed with doc_lens produce identical outputs.
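For reference, a hedged sketch of how the new parameter is used — tensor shapes and the RotaryEmbedding construction here are illustrative placeholders; only the cu_doc_lens keyword comes from this PR:

```python
import torch

B, T, n_heads, head_dim = 1, 20, 8, 64
q = torch.randn(B, T, n_heads, head_dim)
k = torch.randn(B, T, n_heads, head_dim)
# Two packed documents of length 10 each.
cu_doc_lens = torch.tensor([0, 10, 20], dtype=torch.int32)
rope = RotaryEmbedding(...)  # configured however the model configures it
q_rot, k_rot = rope(q, k, cu_doc_lens=cu_doc_lens)
# Both documents now get positions 0..9, so identical packed inputs yield
# identical rotated queries and keys.
```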