> [!NOTE]
> Our latest evaluation code is maintained in the `/dev` branch. Please check it out for the latest updates; this README is outdated and subject to change as our paper goes through the review process.
This is the official repository for the Top-nσ sampling algorithm. The repository aims to provide a working implementation of the algorithm and collect empirical data with help from the community. We encourage you to try it out and share your feedback!
**TLDR:**

```python
# Mask out logits more than n standard deviations below the maximum logit
M = logits.max(dim=-1, keepdim=True).values
sigma = logits.std(dim=-1, keepdim=True)
logits[logits < M - n * sigma] = float('-inf')
```
Top-nσ is a novel sampling method for language models that truncates the probability distribution based on standard deviations from the maximum logit value. It delivers better quality and diversity than existing sampling methods, particularly at high temperatures. In short, you don't need to worry about temperature when using top-nσ: temperature scaling divides the maximum logit and the logit standard deviation by the same factor, so the set of tokens that survives the M - nσ threshold never changes.
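To make this concrete, here is a minimal sketch of one decoding step (our own illustration, not code taken from this repository; the function name and defaults are ours): truncation is applied to the logits, so changing `temperature` reshapes the surviving probabilities but never changes which tokens survive.

```python
import torch

def top_nsigma_step(logits: torch.Tensor, n: float = 1.0,
                    temperature: float = 1.5) -> torch.Tensor:
    """Sample one token: top-nsigma truncation, then temperature sampling.

    `logits` has shape (batch, vocab_size); `n` and `temperature` are
    illustrative defaults, not recommended settings.
    """
    M = logits.max(dim=-1, keepdim=True).values
    sigma = logits.std(dim=-1, keepdim=True)
    # The mask compares logits against M - n*sigma; both sides rescale
    # identically under temperature, so the kept set is temperature-invariant.
    truncated = logits.masked_fill(logits < M - n * sigma, float("-inf"))
    probs = torch.softmax(truncated / temperature, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```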
We provide two versions of the implementation:

- **HuggingFace version**: see `src/hf/hf_nsigma.py`. You can use it out of the box with HuggingFace Transformers (a standalone sketch of the same idea appears after the note below).
- **vLLM version**: see `src/vllm/sampler.py`. You also need to apply the ugly hack in `src/vllm/hack.py`. Put simply, here is how you do it:
```python
import vllm
from hack import hack_vllm, recover_sampler
from sampler import FacadeSampler

model = vllm.LLM(model=path)
# Swap vLLM's built-in sampler for the top-nsigma FacadeSampler
hack_vllm(model, FacadeSampler(nsigma, device))
```
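Once the sampler is swapped in, generation goes through vLLM's normal API. A minimal usage sketch follows; the prompt and parameters are illustrative, and we assume `recover_sampler(model)` restores the default sampler (check `src/vllm/hack.py` for its actual signature):

```python
from vllm import SamplingParams

outputs = model.generate(["Once upon a time"],
                         SamplingParams(temperature=1.5, max_tokens=64))
print(outputs[0].outputs[0].text)

# Assumed semantics: undo the hack and restore vLLM's built-in sampler.
recover_sampler(model)
```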
> [!NOTE]
> Due to vLLM's current architecture, we have to use this temporary hack.
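For the HuggingFace version, `src/hf/hf_nsigma.py` is the reference implementation; its exact interface is not reproduced here. As a self-contained illustration of the same idea (not the repository's API), top-nσ also fits the standard `LogitsProcessor` hook:

```python
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class TopNSigmaProcessor(LogitsProcessor):
    """Illustrative top-nsigma processor; see src/hf/hf_nsigma.py for the official code."""

    def __init__(self, n: float = 1.0):
        self.n = n

    def __call__(self, input_ids: torch.LongTensor,
                 scores: torch.FloatTensor) -> torch.FloatTensor:
        M = scores.max(dim=-1, keepdim=True).values
        sigma = scores.std(dim=-1, keepdim=True)
        return scores.masked_fill(scores < M - self.n * sigma, float("-inf"))

# "gpt2" is only a placeholder model for this demo.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
inputs = tokenizer("Once upon a time", return_tensors="pt")
out = model.generate(**inputs, do_sample=True, temperature=1.5, max_new_tokens=32,
                     logits_processor=LogitsProcessorList([TopNSigmaProcessor(n=1.0)]))
print(tokenizer.decode(out[0], skip_special_tokens=True))
```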
We strongly welcome contributions from the community!
A key question is: what's the best value for `n`? See the related discussion in aphrodite-engine/aphrodite-engine#825.
If you find this work useful, please consider citing:
```bibtex
@misc{tang2024topnsigmalogitsneed,
      title={Top-$n\sigma$: Not All Logits Are You Need},
      author={Chenxia Tang and Jianchun Liu and Hongli Xu and Liusheng Huang},
      year={2024},
      eprint={2411.07641},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2411.07641},
}
```