⚠️ BertBlocks is currently in alpha! APIs and features may change. We appreciate feedback and contributions as we work towards a stable release.
BertBlocks provides building blocks for exploring transformer encoders. It aims to be a unified, clean, well-documented, and comprehensive collection of components for BERT-like models. It is highly configurable and allows for easy experimentation with various architectural components including:
- Normalization: Pre/post normalization, RMS Norm, Layer Norm, Group Norm, DeepNorm, DynamicTanhNorm, ...
- Attention Mechanisms: Multi-head attention with configurable heads and dropout
- Positional Encodings: ALiBi, Sinusoidal, RoPE, Relative, Learned, ...
- Feed-Forward Networks: Standard MLP, Gated Linear Units (GLU), ...
- Activation Functions: SiLU, GELU, ReLU, ...
- Optimization: Pre-configured training setup with PyTorch Lightning, a variety of optimizers, training objectives, ...
- Attention Backends: Flash, SDPA, and eager attention implementations for maximum flexibility, for both padded and unpadded sequences
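As a quick illustration of this configurability, the sketch below builds two variants that differ only in their feed-forward, activation, and normalization choices and compares their parameter counts. It is illustrative only: the field names follow the configuration example further down this page, the option strings "mlp", "gelu", and "layer" are assumptions rather than documented values, and unspecified fields are assumed to have sensible defaults.

```python
import bertblocks as bb

# Shared settings, taken from the configuration example further below.
base = dict(
    vocab_size=30522,
    hidden_size=768,
    num_blocks=12,
    num_attention_heads=12,
    block_pos_enc_kind="alibi",
)

# "glu"/"silu"/"rms" appear in the configuration example below; "mlp",
# "gelu", and "layer" are assumed option strings for the alternatives.
variants = {
    "glu-silu-rmsnorm": dict(mlp_type="glu", actv_fn="silu", norm_fn="rms"),
    "mlp-gelu-layernorm": dict(mlp_type="mlp", actv_fn="gelu", norm_fn="layer"),
}

for name, overrides in variants.items():
    config = bb.BertBlocksConfig(**base, **overrides)
    model = bb.BertBlocksForMaskedLM(config)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```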
Train a model with the default configuration:
```bash
uv run -m bertblocks fit --config configs/pretraining.yaml
```

The architecture is configurable through the BertBlocksConfig class. Key parameters include:
```python
import bertblocks as bb
config = bb.BertBlocksConfig(
vocab_size=30522, # Vocabulary size
hidden_size=768, # Model dimension
num_blocks=12, # Number of transformer layers
num_attention_heads=12, # Number of attention heads
norm_fn="rms", # Normalization type
block_pos_enc_kind="alibi", # Positional encoding
mlp_type="glu", # Feed-forward architecture
actv_fn="silu" # Activation function
)
model = bb.BertBlocksForMaskedLM(config)
```

Alternatively, selected Hugging Face encoder architectures can be reproduced, optionally also loading their weights:
```python
import bertblocks as bb

# Returns an equivalent BertBlocks model
model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
```

We are actively working on adding more verified model loaders.
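To sanity-check a converted model, you can run a single masked-token prediction. The snippet below is a rough sketch rather than the documented BertBlocks API: it assumes the returned model is a PyTorch module whose forward call accepts input_ids and attention_mask and returns masked-LM logits directly; adapt it to the actual call signature and output type.

```python
import torch
from transformers import AutoTokenizer

import bertblocks as bb

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad():
    # Assumed forward signature and return value; the real model may instead
    # return an output object with a .logits attribute.
    logits = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Position of the [MASK] token and the highest-scoring replacement for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))
```

If the weight loading is faithful, the top prediction should match what the original Hugging Face checkpoint produces for the same input.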
If you want to contribute a model loader, or make general improvements to BertBlocks, have a look at our contribution guide.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research, please cite:
```bibtex
@software{bertblocks,
  title = {BertBlocks - Building Blocks for Exploring Transformer Encoders},
  author = {CORAL Project Contributors},
  year = {2025},
  url = {https://github.com/coral-nlp/bertblocks}
}
```