A unified PyTorch library providing easy access to state-of-the-art Linear RNN architectures for sequence modeling. The technical report describing this system was accepted to the EACL 2026 Student Research Workshop. We recommend reading the report before using or contributing to the library.
```bash
# standard installation
pip install lrnnx
# with optional causal-conv1d
pip install "lrnnx[causal-conv1d]"
# for development
pip install "lrnnx[dev]"
```

We recommend installing PyTorch first, matching your specific CUDA version, and then installing our library with `--no-build-isolation`:
```bash
pip install lrnnx --no-build-isolation
```

To install from source, we recommend uv, though standard pip is also supported.
With uv:

```bash
git clone https://github.com/SforAiDl/lrnnx.git
cd lrnnx
# standard installation
uv sync
# with optional causal-conv1d
uv sync --extra causal-conv1d
# for development
uv sync --extra dev
```

With pip:

```bash
git clone https://github.com/SforAiDl/lrnnx.git
cd lrnnx
# standard installation
pip install -e . --no-build-isolation
# with optional causal-conv1d
pip install -e ".[causal-conv1d]" --no-build-isolation
# for development
pip install -e ".[dev]" --no-build-isolation
```

Because our library builds several custom CUDA kernels, installation can take a while. With causal-conv1d, the full installation can take about 30 minutes, depending on the number of available CPU cores.
Our library provides implementations of the following Linear RNN architectures:
- S4
- S4D
- S5
- Event-SSM (inside `S5`; enable it by passing `integration_timesteps`)
- LRU
- S6 (including alternative discretizations we implemented)
- STREAM (inside `S6`; enable it by passing `integration_timesteps`)
- RG-LRU
- S7
- aTENNuate
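Two entries in this list hinge on how a continuous-time system is discretized: S6 offers alternative discretizations, and Event-SSM and STREAM pass per-step `integration_timesteps` to S5 and S6. The sketch below is plain Python for a single scalar mode, not lrnnx code. The helper names, the restriction to zero-order hold (ZOH) and bilinear discretization, and the reading of `integration_timesteps` as the time gap between consecutive inputs are all assumptions made for illustration.

```python
import math


def discretize(lam, b, dt, method="zoh"):
    """Map a continuous scalar mode h' = lam*h + b*x to (a_bar, b_bar) for a step dt.

    lam must be nonzero; lam < 0 gives a stable, decaying mode.
    """
    if method == "zoh":  # zero-order hold: input held constant over dt
        a_bar = math.exp(lam * dt)
        b_bar = (a_bar - 1.0) / lam * b
    elif method == "bilinear":  # Tustin / trapezoidal rule
        denom = 1.0 - lam * dt / 2.0
        a_bar = (1.0 + lam * dt / 2.0) / denom
        b_bar = dt / denom * b
    else:
        raise ValueError(f"unknown method: {method}")
    return a_bar, b_bar


def run_irregular(lam, b, c, xs, dts, method="zoh"):
    # Hypothetical helper: each input x_t arrives after its own gap dt_t,
    # so (a_bar, b_bar) is recomputed at every step instead of once.
    h, ys = 0.0, []
    for x, dt in zip(xs, dts):
        a_bar, b_bar = discretize(lam, b, dt, method)
        h = a_bar * h + b_bar * x
        ys.append(c * h)
    return ys
```

A longer gap before an input means the state decays more in between, which is how irregularly timed events can be handled without resampling them onto a fixed grid. For very small `dt`, the ZOH and bilinear rules give nearly identical coefficients.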
We expose several levels of API for each model: a scan, a recurrent step, and a full layer matching the original paper. For S5, we implement both a convolution-based approach and a parallel-scan approach. The parallel scan is more stable and faster in most use cases, but the convolution-based approach can be faster for very long sequences.
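To make the difference between these computation paths concrete, here is a plain-Python sketch (not the lrnnx API; all function names are hypothetical) for one scalar, time-invariant mode h_t = a·h_{t-1} + b·x_t with output y_t = c·h_t. The same outputs come from a step-by-step recurrence, an associative scan over affine maps, and a causal convolution with kernel K_k = c·a^k·b. This equivalence is what allows an LTI model such as S5 to offer both a scan path and a convolution path.

```python
def recurrent(a, b, c, xs):
    # One recurrent step per input: h_t = a*h_{t-1} + b*x_t
    h, ys = 0.0, []
    for x in xs:
        h = a * h + b * x
        ys.append(c * h)
    return ys


def combine(left, right):
    # Associative operator on affine maps h -> A*h + B:
    # applying `left` then `right` gives h -> (A_r*A_l)*h + (A_r*B_l + B_r)
    a_l, b_l = left
    a_r, b_r = right
    return (a_r * a_l, a_r * b_l + b_r)


def scan(a, b, c, xs):
    # Hillis-Steele inclusive scan: O(log T) rounds, each fully parallelizable
    elems = [(a, b * x) for x in xs]
    step = 1
    while step < len(elems):
        elems = [
            combine(elems[i - step], elems[i]) if i >= step else elems[i]
            for i in range(len(elems))
        ]
        step *= 2
    # With h_{-1} = 0, the state after step t is the accumulated offset B
    return [c * bx for _, bx in elems]


def convolution(a, b, c, xs):
    # LTI kernel K_k = c * a**k * b, then causal convolution y_t = sum_k K_k * x_{t-k}
    kernel = [c * a**k * b for k in range(len(xs))]
    return [sum(kernel[k] * xs[t - k] for k in range(t + 1)) for t in range(len(xs))]
```

The naive convolution here costs O(T²); practical implementations typically compute it with FFTs, while the scan's log-depth structure is what makes it attractive on parallel hardware.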
Instantiating a model from our library takes only a few lines:

```python
import torch

from lrnnx.models.lti import LRU
from lrnnx.models.ltv import Mamba

# Example sizes; pick values to suit your task
batch_size, seq_len, d_model, d_state = 4, 1024, 64, 16

x = torch.randn(batch_size, seq_len, d_model, dtype=torch.float32, device="cuda")

# Linear time-invariant (LTI) model
model_lti = LRU(d_model, d_state).cuda()
output = model_lti(x)

# Linear time-varying (LTV) model
model_ltv = Mamba(d_model, d_state).cuda()
output = model_ltv(x)
```

Linear RNNs in PyTorch need special handling at inference time. Following Mamba, we implement inference based on CUDA graphs, which reduces CPU overhead and gives a more than 10x speedup over a simple for loop over the sequence length. The main entry point is `generation.py`, which provides a simple API for autoregressive generation with any of the models in our library; our benchmarking script shows a simple way to use it.
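CUDA graphs require PyTorch and a GPU, so they are not reproduced here. The plain-Python sketch below, using a toy scalar model and hypothetical names (`step`, `generate`, `next_input`), shows the per-token loop that such inference speeds up. Because a linear RNN carries a fixed-size state, each new token costs one recurrent step no matter how long the prefix is. This is not the interface of `generation.py`.

```python
def step(state, x, a=0.5, b=1.0, c=1.0):
    # One recurrent step: update the fixed-size state and emit an output
    state = a * state + b * x
    return state, c * state


def generate(prompt, num_new_tokens, next_input):
    """Autoregressive decoding with a recurrent step.

    prompt: inputs consumed first (the "prefill")
    next_input: hypothetical function mapping the last output to the next input
    """
    state, y = 0.0, 0.0
    for x in prompt:  # prefill: one step per prompt element
        state, y = step(state, x)
    outputs = []
    for _ in range(num_new_tokens):  # decode: one constant-cost step per new token
        x = next_input(y)
        state, y = step(state, x)
        outputs.append(y)
    return outputs


print(generate([1.0], 3, lambda y: y))  # [1.5, 2.25, 3.375]
```

In a real language model, `next_input` would sample a token from the logits. With CUDA graphs, the GPU work inside one decode step is captured once and then replayed, which removes most of the per-step launch overhead that a Python loop incurs.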
The benchmarking script runs both training and inference benchmarks:

```bash
python -m benchmarks.run_all
```

We also implement some common architectures built on the models in our library, such as a U-Net (inspired by aTENNuate) and a hierarchical classifier (inspired by Event-SSM). Additionally, there is a language model architecture inspired by Mamba and RG-LRU, with interchangeable LRNN and attention layers, which can be used for language modeling tasks. It can be used as follows:
```python
import torch
from lrnnx.models.language_model import LRNNLMHeadModel

# Example sizes; pick values to suit your task
batch_size, seq_len = 2, 512
d_model, d_state, num_layers, vocab_size = 256, 16, 3, 32000

model = LRNNLMHeadModel(
    d_model, d_state, num_layers, vocab_size, mixer_types=["s5", "s6", "attn"]
).cuda()
input_ids = torch.randint(0, vocab_size, (batch_size, seq_len), device="cuda")
logits = model(input_ids)
```

Building on these architectures, we provide tutorials for two popular use cases:
Please check out our Contributing Guide for details on how to contribute to this project.
If you use lrnnx in your research, please cite:
```bibtex
@inproceedings{bania-etal-2026-lrnnx,
    title = "lrnnx: A library for Linear {RNN}s",
    author = "Bania, Karan and
      Kalburgi, Soham and
      Tanwar, Manit and
      Dhruthi and
      Nagarsekar, Aditya and
      Mestha, Harshvardhan and
      Chibber, Naman and
      Deshmukh, Raj and
      Sathyanarayanan, Anish and
      Rathore, Aarush and
      Chheda, Pratham",
    editor = "Baez Santamaria, Selene and
      Somayajula, Sai Ashish and
      Yamaguchi, Atsuki",
    booktitle = "Proceedings of the 19th Conference of the {E}uropean Chapter of the {A}ssociation for {C}omputational {L}inguistics (Volume 4: Student Research Workshop)",
    month = mar,
    year = "2026",
    address = "Rabat, Morocco",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2026.eacl-srw.60/",
    doi = "10.18653/v1/2026.eacl-srw.60",
    pages = "811--817",
    ISBN = "979-8-89176-383-8",
    abstract = "Linear recurrent neural networks (LRNNs) provide a structured approach to sequence modeling that bridges classical linear dynamical systems and modern deep learning, offering both expressive power and theoretical guarantees on stability and trainability. In recent years, multiple LRNN-based architectures have been proposed, each introducing distinct parameterizations, discretization schemes, and implementation constraints. However, existing implementations are fragmented across different software frameworks, often rely on framework-specific optimizations, and in some cases require custom CUDA kernels or lack publicly available code altogether. As a result, using, comparing, or extending LRNNs requires substantial implementation effort. To address this, we introduce $\texttt{lrnnx}$, a unified software library that implements several modern LRNN architectures under a common interface. The library exposes multiple levels of control, allowing users to work directly with core components or higher-level model abstractions. $\texttt{lrnnx}$ aims to improve accessibility, reproducibility, and extensibility of LRNN research and applications. We make our code available under a permissive MIT license."
}
```

This project is released under the MIT License.
This library builds upon the excellent work of researchers who developed the individual LRNN models. Please see individual model documentation for proper citations of the original papers.