Thank you for your very interesting work. I have a question about this sentence in your paper: "Considering the auto-regressive inference pipeline of LLMs, we store these prefix tokens in the KV cache to prevent generating new outlier tokens during inference." I don't understand why storing the outlier tokens as a prefix in the KV cache prevents new outlier tokens from being generated during inference. Could you please explain the mechanism in more detail?