FlashSampling

FlashSampling: Fast and Memory-Efficient Exact Sampling

We present FlashSampling, an exact categorical sampling primitive that fuses sampling into the LM-head matmul and never materializes the logits tensor in HBM. The method is simple: compute logits tile-by-tile on chip, add Gumbel noise, keep only one maximizer per row and per vocabulary tile, and finish with a small reduction over tiles. Because the logits are never written to or read back from HBM, the fused operation removes that memory overhead and reduces decoding time by up to 19%.
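The tile-wise procedure can be illustrated with the Gumbel-max trick: taking the argmax of `logit + Gumbel(0, 1)` noise yields an exact sample from the softmax distribution, and an argmax can be computed per tile and then reduced across tiles. The following is a minimal pure-Python sketch of that idea, not the repository's fused GPU kernel. The function names (`gumbel`, `tiled_gumbel_max`, `reference_gumbel_max`) and the `tile_size` parameter are hypothetical, and the noise is precomputed here only so the tiled result can be compared against an untiled reference.

```python
import math
import random


def gumbel(rng):
    # Standard Gumbel(0, 1) noise via inverse transform: -log(-log(U)), U in (0, 1).
    u = rng.random()
    while u == 0.0:  # random() can return 0.0, which would break log()
        u = rng.random()
    return -math.log(-math.log(u))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def tiled_gumbel_max(hidden, weight, noise, tile_size):
    """Sample one token id per row, keeping only one maximizer per vocabulary tile.

    hidden: list of rows (hidden states), each of length d
    weight: list of V LM-head vectors, each of length d
    noise:  noise[r][v] is the Gumbel noise for row r and token v
            (precomputed only for comparison; an assumption of this sketch)
    """
    vocab = len(weight)
    samples = []
    for r, h in enumerate(hidden):
        tile_winners = []  # one (score, token_id) pair per vocabulary tile
        for start in range(0, vocab, tile_size):
            best_score, best_id = -math.inf, -1
            for v in range(start, min(start + tile_size, vocab)):
                score = dot(h, weight[v]) + noise[r][v]  # logit + Gumbel noise
                if score > best_score:
                    best_score, best_id = score, v
            tile_winners.append((best_score, best_id))
        # Small reduction over tiles: the global maximizer is the sample.
        samples.append(max(tile_winners, key=lambda t: t[0])[1])
    return samples


def reference_gumbel_max(hidden, weight, noise):
    """Untiled baseline that builds the full perturbed logits row first."""
    samples = []
    for r, h in enumerate(hidden):
        scores = [dot(h, w) + noise[r][v] for v, w in enumerate(weight)]
        samples.append(max(range(len(scores)), key=scores.__getitem__))
    return samples


rng = random.Random(0)
rows, d, vocab = 4, 8, 50
hidden = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(rows)]
weight = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(vocab)]
noise = [[gumbel(rng) for _ in range(vocab)] for _ in range(rows)]

tiled = tiled_gumbel_max(hidden, weight, noise, tile_size=16)
assert tiled == reference_gumbel_max(hidden, weight, noise)
```

Given the same noise, the tiled and untiled paths pick the same token for any tile size, which is why the per-tile reduction does not change the sampling distribution. Only the per-tile winners need to be stored, rather than a full row of logits.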

Authors: Tomas Ruiz*, Zhen Qin*, Yifan Zhang†, Xuyang Shen, Yiran Zhong, Mengdi Wang†

Date: February 28, 2026

[Project Page] [Webpage] [Huggingface]

Citation

@article{ruiz2026flashsampling,
  title={FlashSampling: Fast and Memory-Efficient Exact Sampling},
  author={Ruiz, Tomas and Qin, Zhen and Zhang, Yifan and Shen, Xuyang and Zhong, Yiran and Wang, Mengdi},
  journal={arXiv preprint arXiv:2603.15854},
  year={2026}
}

📜 License

This project is licensed under the Apache License 2.0.

