Note
This repository accompanies the preprint Learning to Skip the Middle Layers of Transformers (https://arxiv.org/abs/2506.21103). For pre-trained models, see HuggingFace.
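To fetch a released checkpoint programmatically, something like the following should work. This is a minimal sketch using huggingface_hub; the repo id shown is a placeholder, so substitute the actual model id listed on the HuggingFace page.
from huggingface_hub import snapshot_download

# Placeholder repo id; replace with the actual model id from the HuggingFace page.
local_dir = snapshot_download(repo_id="<org>/<skip-middle-model>")
print(f"Checkpoint files downloaded to {local_dir}")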
We based the underlying Transformer models on the reference implementation of Llama 3 (https://github.com/meta-llama/llama-models/). The key difference is that we used the Sandwich-LN scheme (a.k.a. Peri-LN) in place of Pre-LN. The training codebase is based on the 'nanoGPT speedrun' repository (https://github.com/KellerJordan/modded-nanogpt).
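As a rough illustration of that change (a minimal sketch, not the repository's actual code; the class and argument names are made up for the example), a Pre-LN block normalizes only the sublayer input, while a Sandwich-LN / Peri-LN block also normalizes the sublayer output before adding it back to the residual stream:
import torch.nn as nn

class PreLNBlock(nn.Module):
    """Pre-LN: normalize the sublayer input only."""
    def __init__(self, dim, sublayer):
        super().__init__()
        # nn.RMSNorm requires PyTorch >= 2.4; nn.LayerNorm also works for this sketch.
        self.norm = nn.RMSNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        return x + self.sublayer(self.norm(x))

class SandwichLNBlock(nn.Module):
    """Sandwich-LN / Peri-LN: normalize both the sublayer input and its output."""
    def __init__(self, dim, sublayer):
        super().__init__()
        self.norm_in = nn.RMSNorm(dim)
        self.norm_out = nn.RMSNorm(dim)
        self.sublayer = sublayer

    def forward(self, x):
        # The extra output norm keeps the scale of what enters the residual stream controlled.
        return x + self.norm_out(self.sublayer(self.norm_in(x)))
In a real layer, sublayer would be the attention or MLP module, and the wrapper is applied once per sublayer.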
Download the dataset:
uv run data/download_fineweb_10B_gpt2.py
Train a model:
python -m projects.skip_middle.train_fineweb ...
python -m torch.distributed.run --standalone --nproc_per_node 4 projects/skip_middle/train_fineweb.py ...
See help.txt for the command-line arguments and config.py for the configuration classes.
Install uv:
curl -LsSf https://astral.sh/uv/install.sh | sh
Create a virtual environment:
uv venv
source .venv/bin/activate
Install packages:
uv pip install -e .
Install PyTorch:
UV_TORCH_BACKEND=auto uv pip install torch
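To sanity-check the environment afterwards (a quick smoke test, not one of the repository's scripts):
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"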