This repository contains the official implementation for the paper "Parameter-Efficient Transformer Embedding" by Henry Ndubuaku and Mouad Talhi.
Traditional embedding layers in Transformer models often constitute the largest portion of parameters, scaling with vocabulary size without a proportional increase in performance. This project introduces PETE, a novel approach where token embeddings are generated deterministically using polynomial basis functions (Fourier, Chebyshev, Legendre, Laguerre, Hermite) applied to normalized token IDs, followed by a lightweight MLP.
This method significantly reduces the parameter count compared to standard learned embeddings, leading to faster training times and competitive performance, especially on sentence similarity tasks. The core polynomial expansions are implemented using efficient custom C++/CUDA kernels.
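For intuition, the snippet below is a minimal pure-PyTorch sketch of this idea: token IDs are normalized, expanded into deterministic Fourier features, and passed through a small MLP. It is illustrative only; the repository computes the basis with its custom C++/CUDA kernels, and the exact normalization, frequencies, and MLP shape here are assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn


class FourierTokenEmbedding(nn.Module):
    """Illustrative sketch of a PETE-style embedding (not the repo's kernel code)."""

    def __init__(self, vocab_size: int, d_model: int, hidden: int = 128):
        super().__init__()
        assert d_model % 2 == 0, "d_model must be even for sin/cos pairs"
        self.vocab_size = vocab_size
        # Deterministic frequencies; sin/cos pairs give d_model features in total.
        self.register_buffer("freqs", torch.arange(1, d_model // 2 + 1).float())
        self.mlp = nn.Sequential(
            nn.Linear(d_model, hidden), nn.GELU(), nn.Linear(hidden, d_model)
        )

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        x = input_ids.float() / self.vocab_size              # normalize IDs to [0, 1)
        angles = x.unsqueeze(-1) * self.freqs * torch.pi     # (batch, seq, d_model/2)
        basis = torch.cat([torch.sin(angles), torch.cos(angles)], dim=-1)
        return self.mlp(basis)                               # (batch, seq, d_model)
```

Only the MLP carries learnable parameters, which is where the savings over a vocabulary-sized lookup table come from.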
- Parameter Efficiency: Replaces large learned embedding tables with deterministic polynomial expansions and a small MLP, drastically reducing parameters (see the parameter-count sketch after this list).
- Multiple Polynomial Bases: Supports Fourier (default), Chebyshev, Legendre, Laguerre, and Hermite expansions. (Note: the code is currently hardcoded to Fourier in `src/pete.py`, but kernels for all bases exist.)
- Custom Kernels: High-performance C++/CUDA kernels for polynomial basis calculations.
- Competitive Performance: Achieves strong results on benchmarks like STS-B, outperforming comparable small models.
- Faster Training: Reduced parameter count and efficient kernels lead to quicker training cycles.
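For a rough sense of the savings, the arithmetic below compares a standard learned embedding table against a small two-layer MLP of the kind PETE uses. The vocabulary size and layer widths are illustrative placeholders, not the paper's actual configuration.

```python
# Back-of-the-envelope parameter comparison (illustrative sizes, not the paper's):
vocab_size, d_model, hidden = 30_522, 256, 128

learned_table = vocab_size * d_model  # standard nn.Embedding weight matrix
pete_mlp = (d_model * hidden + hidden) + (hidden * d_model + d_model)  # 2-layer MLP

print(f"Learned embedding table: {learned_table:,} parameters")  # 7,813,632
print(f"PETE basis + MLP:        {pete_mlp:,} parameters")       # 65,920
```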
```
.
├── polynomial_embeddings/   # C++/CUDA kernels for polynomial expansions
│   ├── *.cpp
│   ├── *.cu
│   └── *.h
├── src/                     # Python source code for model, training, data handling
│   ├── pete.py              # Main PETE model definition
│   ├── trainer.py           # Training and evaluation loops
│   ├── embedder.py          # Embedding wrapper and utility functions
│   ├── benchmark.py         # Evaluation functions
│   ├── data.py              # Data loading and processing
│   └── ...
├── paper/                   # LaTeX source for the paper
│   └── main.tex
├── environment.yml          # Conda environment specification
├── setup.py                 # Setup for polynomial_embeddings package
└── README.md                # This file
```
- A Linux environment (tested on Ubuntu).
- NVIDIA GPU with CUDA support.
- `nvcc` (NVIDIA CUDA Compiler): verify with `nvcc --version`. Install via your package manager (e.g., `sudo apt install nvidia-cuda-toolkit`) or Conda.
- `g++`: verify with `which g++`. Install with `sudo apt install build-essential`.
- Conda or Miniconda.
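A quick way to confirm the toolchain is visible before building (this assumes PyTorch is already installed, e.g. via `environment.yml`):

```python
# Sanity-check the build prerequisites from Python.
import shutil

import torch

print("nvcc found:", shutil.which("nvcc") is not None)
print("g++ found: ", shutil.which("g++") is not None)
print("CUDA visible to PyTorch:", torch.cuda.is_available())
```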
1. Clone the repository:

   ```bash
   git clone https://github.com/HMUNACHI/pete.git
   cd pete
   ```

2. Create and activate the Conda environment:

   ```bash
   conda env create -f environment.yml
   conda activate pete_env
   ```

3. Install the CUDA toolkit within the environment (adjust the version if needed):

   ```bash
   conda install -c nvidia/label/cuda-11.7.0 cuda-toolkit=11.7 cuda-nvcc=11.7
   # Or, for newer CUDA versions:
   # conda install cuda -c nvidia
   ```

4. Compile and install the custom polynomial embedding kernels:

   ```bash
   cd polynomial_embeddings
   pip install .
   cd ..
   ```
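To confirm the build succeeded, a quick import check along these lines can be run from the repository root (importing `torch` first is usually required for PyTorch C++ extensions; the kernel names are the ones listed in the features above):

```python
# Verify that the compiled extension loads and exposes the expected kernels.
import torch  # noqa: F401  (load torch's shared libraries before the extension)

import polynomial_embeddings

bases = ("fourier", "chebyshev", "legendre", "laguerre", "hermite")
print([name for name in bases if hasattr(polynomial_embeddings, name)])
```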
To train a model using the default configuration (Fourier embeddings):
```bash
python src/trainer.py --experiment_name=pete_fourier_default
```
Check `src/trainer.py` (or a separate script, if arguments are added later) for command-line arguments to customize:

- Model dimension (`d_model`)
- Number of layers (`num_hidden_layers`)
- Number of attention heads (`num_attention_heads`)
- Epochs, learning rate, batch size, etc.
- Choice of polynomial embedding (currently requires modifying `PolynomialBlock` in `src/pete.py`; see below)
The training script automatically runs evaluations on validation sets (such as STS-B) during and after training. Results are logged to TensorBoard (`runs/`) and printed to the console. The best model weights are saved in the `weights/` directory.
Currently, the `PolynomialBlock` in `src/pete.py` is hardcoded to use `polynomial_embeddings.fourier`. To use other bases (Chebyshev, Legendre, Laguerre, Hermite), modify this line:
```python
# In src/pete.py -> PolynomialBlock.forward
# Change this line to use a different kernel:
embeddings = polynomial_embeddings.fourier(  # Change 'fourier' to 'chebyshev', 'legendre', etc.
    input_ids, self.max_seq_len, self.d_model
)
```
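After switching the basis, a small smoke test along these lines can confirm the kernel runs. It assumes the alternative kernels share the call signature shown above and that a CUDA device is available; the tensor sizes are arbitrary.

```python
# Hypothetical smoke test for an alternative basis (assumes the same
# (input_ids, max_seq_len, d_model) signature as the fourier kernel).
import torch

import polynomial_embeddings

input_ids = torch.randint(0, 30_522, (2, 16), device="cuda")
out = polynomial_embeddings.chebyshev(input_ids, 16, 256)
print(out.shape)  # expected to match (batch, seq, d_model), i.e. (2, 16, 256)
```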
If you find this work useful in your research, please cite our paper:
```bibtex
@article{ndubuaku2025pete,
  title={Parameter-Efficient Transformer Embedding},
  author={Ndubuaku, Henry and Talhi, Mouad},
  journal={arXiv preprint arXiv:2505.02266},
  year={2025}
}
```
This project is intended to be released under the MIT License; see the LICENSE file for details (a LICENSE file should be added if one is not yet present).