This repository contains the code and notebooks used to run the experiments and produce the plots for the paper:
Kuratov, Y., Arkhipov, M., Bulatov, A., Burtsev, M., "Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity", ACL 2025.
This work was done in collaboration between AIRI, DeepPavlov.ai, and the London Institute for Mathematical Sciences.
Our experiments show that text sequences of over 1,500 tokens can be compressed into a single Llama-3.1-8B input vector and subsequently decoded from it. Moreover, this capacity increases nearly linearly when multiple vectors are used.
Left: Compressing text into a [mem] vector. The pre-trained LLM is frozen; we only finetune one or several [mem] vectors to decode the sequence of tokens. [mem] vectors are trained for each text separately.
Right: How many tokens fit into a single input vector? We estimate the maximum number of tokens that can be decoded from a single input vector across various language models.
- 5 Jun 2025: Released v2, the camera-ready version of the paper accepted to ACL 2025 (main track). Added results for Mamba (130m, 370m, 790m, 1.4b) models and a discussion of how our work relates to entropy coders.
- 15 May 2025: Our paper was accepted to ACL 2025 (main track)!
- 18 Feb 2025: Released the arXiv preprint v1
- `train.py` - implements the training loop for compressing text into a vector.
- `model.py` - implements a wrapper that adds trainable input vectors, referred to as `[mem]`, to any model from HF; it is based on the Recurrent Memory Transformer (RMT) implementation. A simplified sketch of this idea is shown after the list below.
- `scripts/run.*.sh` - bash scripts for different models; they run experiments on PG-19, fanfics, and random texts with one or multiple trainable input `[mem]` vectors.
- `notebooks/` - folder with notebooks used for visualizations and collecting results:
  - `notebooks/ablation_analyze_results.ipynb` - Table 1, Figure 3, Figure 6: analysis of compression, capacity in tokens, and capacity in terms of entropy.
  - `notebooks/plot_length_model_brief.ipynb` - Figure 1: text compression results on PG-19.
  - `notebooks/plot_length_vs_n_mem_tokens.ipynb` - Figure 4: scaling of compression with the number of trainable `[mem]` vectors.
  - `notebooks/plot_model_theor_capacity_vs_actual.ipynb` - Figure 5: theoretical capacity vs. empirical.
  - Notebooks with the `add_mamba` suffix add results for Mamba (130m, 370m, 790m, 1.4b) models.
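The core mechanism is simple: trainable `[mem]` embeddings are prepended to the frozen model's input, and only these embeddings are optimized so that the model reconstructs a given text. The sketch below illustrates this idea; the model name, learning rate, and step count are assumptions for illustration, and the snippet is a simplified sketch rather than the code in `model.py`/`train.py`.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Simplified sketch: learn trainable [mem] vector(s) so that a frozen causal LM
# can reconstruct one specific text. Model name and hyperparameters are assumptions.
model_name = "gpt2"  # placeholder; the paper uses models such as Llama-3.1-8B
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.requires_grad_(False)  # the pre-trained LLM stays frozen

text = "Text to be compressed into a single input vector."
input_ids = tokenizer(text, return_tensors="pt").input_ids   # (1, seq_len)
token_embeds = model.get_input_embeddings()(input_ids)       # frozen token embeddings

num_mem = 1  # number of trainable [mem] vectors
hidden_size = model.get_input_embeddings().weight.shape[1]
mem = torch.nn.Parameter(0.02 * torch.randn(1, num_mem, hidden_size))
optimizer = torch.optim.AdamW([mem], lr=1e-2)

for step in range(1000):
    # Prepend the trainable [mem] vectors to the frozen token embeddings.
    inputs_embeds = torch.cat([mem, token_embeds], dim=1)
    # Score only the text positions; [mem] positions are ignored (-100).
    labels = torch.cat(
        [torch.full((1, num_mem), -100, dtype=torch.long), input_ids], dim=1
    )
    loss = model(inputs_embeds=inputs_embeds, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because of the causal shift, the last `[mem]` position predicts the first text token, so the learned vectors alone drive reconstruction of the sequence.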
To quickly get started, you can download our preprocessed text chunks for PG-19 and fanfics with a single command:
```bash
cd ./data
./download_texts.sh
```

This script will fetch the required texts and place them in the `./data` folder.
If you would like to preprocess the text chunks yourself or modify the process:
- PG-19: The `preprocess_pg19.ipynb` notebook shows how we build text chunks from the original PG-19 corpus.
- Fanfics: The `preprocess_fanfics.ipynb` notebook shows how we cleaned and preprocessed HTML fanfic data. The list of fanfic URLs is in `fanfics_urls.txt`.
- Random Texts: We generate random texts from the GloVe vocabulary. The script `make_vocab.py` extracts the top 100k words from `glove.6B.50d.txt` (a small sketch of assembling random texts from this vocabulary follows the command below):
```bash
python make_vocab.py --glove_path ./glove.6B/glove.6B.50d.txt --vocab_size 100000 --output_path ./data/vocab_100k.txt
```
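A random text can then be assembled by sampling words from the extracted vocabulary. The snippet below is an illustrative sketch only; the output file name, seed, and text length are assumptions, not the repository's exact generation code.

```python
import random

# Illustrative sketch: build a "random" text by sampling words uniformly
# from the extracted GloVe vocabulary. Paths and lengths are assumptions.
with open("./data/vocab_100k.txt") as f:
    vocab = [line.strip() for line in f if line.strip()]

random.seed(0)
num_words = 2000  # enough to cover sequences of well over 1,500 tokens
random_text = " ".join(random.choices(vocab, k=num_words))

with open("./data/random_text_0.txt", "w") as f:
    f.write(random_text)
```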
If you find this work useful, please cite:

```bibtex
@misc{kuratov2025cramming,
title={Cramming 1568 Tokens into a Single Vector and Back Again: Exploring the Limits of Embedding Space Capacity},
author={Yuri Kuratov and Mikhail Arkhipov and Aydar Bulatov and Mikhail Burtsev},
year={2025},
eprint={2502.13063},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
```

