Olmoearth dynamic token skipping #511

Draft
Hgherzog wants to merge 2 commits into main from cursor/olmoearth-dynamic-token-skipping-0ec9

Conversation

@Hgherzog
Collaborator

Implement NOBLE (Nonlinear Low-Rank Branches) to accelerate transformer pretraining, based on arXiv:2603.06492.


cursoragent and others added 2 commits March 13, 2026 23:00
Implements NOBLE from arXiv:2603.06492, which augments transformer linear layers with nonlinear low-rank branches that can accelerate training by up to 1.47x.

Key changes:
- Add noble.py with CosNet activation, NobleBranch, NobleLinear, and NobleConfig
- Update attention.py to support optional NOBLE branches on Q/K/V/proj and MLP layers
- Update flexi_vit.py to pass noble_config through FlexiVitBase, Encoder, Predictor
- Add EncoderConfig.noble_config and PredictorConfig.noble_config
- Add scripts/official/noble.py experiment script for NOBLE training
- Add comprehensive unit tests in tests/unit/nn/test_noble.py

NOBLE computes: output = xW + σ(xW_down)W_up
where σ is CosNet (two-layer cosine nonlinearity with learnable frequency/phase).
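The forward pass above can be sketched in NumPy. Note this is a minimal illustration, not the code in noble.py: the exact CosNet parameterization (how frequency/phase enter the two cosine layers) and the branch initialization are assumptions, not taken from the paper or this PR.

```python
import numpy as np

rng = np.random.default_rng(0)

def cosnet(z, w1=1.0, p1=0.0, w2=1.0, p2=0.0):
    # Hypothetical CosNet: two stacked cosines with learnable
    # frequency (w*) and phase (p*); the paper's exact form may differ.
    return np.cos(w2 * np.cos(w1 * z + p1) + p2)

class NobleLinear:
    """output = x W + sigma(x W_down) W_up, with rank r << min(d_in, d_out)."""
    def __init__(self, d_in, d_out, rank):
        self.W = rng.standard_normal((d_in, d_out)) / np.sqrt(d_in)
        self.W_down = rng.standard_normal((d_in, rank)) / np.sqrt(d_in)
        # Assumption: W_up starts at zero so the branch is a no-op at
        # init and the layer exactly matches the base linear layer.
        self.W_up = np.zeros((rank, d_out))

    def __call__(self, x):
        return x @ self.W + cosnet(x @ self.W_down) @ self.W_up

layer = NobleLinear(16, 32, rank=4)
x = rng.standard_normal((8, 16))
y = layer(x)
print(y.shape)  # (8, 32)
```

With a small rank, the branch adds roughly `rank * (d_in + d_out)` parameters per layer, consistent with the ~4% parameter overhead reported below.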

The paper reports:
- Up to 1.47x step speedup to reach baseline eval loss
- ~4% additional parameters (at scale) with 7% step time overhead
- Up to 1.22x net wallclock speedup

Note: the paper found that Mixup/CutMix interferes with NOBLE's benefits.

Co-authored-by: henryh <henryh@allenai.org>
