⚠️ BertBlocks is currently in alpha! APIs and features may change. We appreciate feedback and contributions as we work towards a stable release.
BertBlocks provides building blocks for exploring transformer encoders. It aims to be a unified, clean, well-documented, and comprehensive collection of components for BERT-like models. It is highly configurable and allows for easy experimentation with various architectural components including:
- Normalization: Pre/post normalization, RMS Norm, Layer Norm, Group Norm, DeepNorm, DynamicTanhNorm, ...
- Attention Mechanisms: Multi-head attention with configurable heads and dropout
- Positional Encodings: ALiBi, Sinusoidal, RoPE, Relative, Learned, ...
- Feed-Forward Networks: Standard MLP, Gated Linear Units (GLU), ...
- Activation Functions: SiLU, GELU, ReLU, ...
- Optimization: Pre-configured training setup with PyTorch Lightning, a variety of optimizers, training objectives, ...
- Attention Backends: Flash, SDPA, and eager attention implementations for maximum flexibility, for both padded and unpadded sequences
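As a quick illustration of this configurability, the sketch below builds two variants that differ only in their feed-forward, activation, and normalization choices and compares their parameter counts. It is illustrative only: the field names follow the configuration example further down this page, the option strings "mlp", "gelu", and "layer" are assumptions rather than documented values, and unspecified fields are assumed to have sensible defaults.

```python
import bertblocks as bb

# Shared settings, taken from the configuration example further below.
base = dict(
    vocab_size=30522,
    hidden_size=768,
    num_blocks=12,
    num_attention_heads=12,
    block_pos_enc_kind="alibi",
)

# "glu"/"silu"/"rms" appear in the configuration example below; "mlp",
# "gelu", and "layer" are assumed option strings for the alternatives.
variants = {
    "glu-silu-rmsnorm": dict(mlp_type="glu", actv_fn="silu", norm_fn="rms"),
    "mlp-gelu-layernorm": dict(mlp_type="mlp", actv_fn="gelu", norm_fn="layer"),
}

for name, overrides in variants.items():
    config = bb.BertBlocksConfig(**base, **overrides)
    model = bb.BertBlocksForMaskedLM(config)
    n_params = sum(p.numel() for p in model.parameters())
    print(f"{name}: {n_params / 1e6:.1f}M parameters")
```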
Train a model with the default configuration:
```bash
uv run -m bertblocks fit --config configs/pretraining.yaml
```

The architecture is configurable through the BertBlocksConfig class. Key parameters include:
```python
import bertblocks as bb
config = bb.BertBlocksConfig(
vocab_size=30522, # Vocabulary size
hidden_size=768, # Model dimension
num_blocks=12, # Number of transformer layers
num_attention_heads=12, # Number of attention heads
norm_fn="rms", # Normalization type
block_pos_enc_kind="alibi", # Positional encoding
mlp_type="glu", # Feed-forward architecture
actv_fn="silu" # Activation function
)
model = bb.BertBlocksForMaskedLM(config)
```

Alternatively, selected Hugging Face encoder architectures can be reproduced, optionally also loading their weights:
```python
import bertblocks as bb

# Returns an equivalent BertBlocks model
model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
```

We are actively working on adding more verified model loaders.
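To sanity-check a converted model, you can run a single masked-token prediction. The snippet below is a rough sketch rather than the documented BertBlocks API: it assumes the returned model is a PyTorch module whose forward call accepts input_ids and attention_mask and returns masked-LM logits directly; adapt it to the actual call signature and output type.

```python
import torch
from transformers import AutoTokenizer

import bertblocks as bb

tokenizer = AutoTokenizer.from_pretrained("answerdotai/ModernBERT-base")
model = bb.from_huggingface("answerdotai/ModernBERT-base", load_weights=True)
model.eval()

inputs = tokenizer("The capital of France is [MASK].", return_tensors="pt")

with torch.no_grad():
    # Assumed forward signature and return value; the real model may instead
    # return an output object with a .logits attribute.
    logits = model(inputs["input_ids"], attention_mask=inputs["attention_mask"])

# Position of the [MASK] token and the highest-scoring replacement for it.
mask_pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero().item()
predicted_id = logits[0, mask_pos].argmax(-1).item()
print(tokenizer.decode([predicted_id]))
```

If the weight loading is faithful, the top prediction should match what the original Hugging Face checkpoint produces for the same input.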
If you want to contribute a model loader, or make general improvements to BertBlocks, have a look at our contribution guide.
This project is licensed under the MIT License - see the LICENSE file for details.
If you use this code in your research, please cite:
```bibtex
@software{bertblocks,
  title = {BertBlocks - Building Blocks for Exploring Transformer Encoders},
  author = {CORAL Project Contributors},
  year = {2025},
  url = {https://github.com/coral-nlp/bertblocks}
}
```