Question about Preventing Outlier Tokens during Inference

Thank you for your very interesting work. I have a question about this statement in your paper: "Considering the auto-regressive inference pipeline of LLMs, we store these prefix tokens in the KV cache to prevent generating new outlier tokens during inference." I don't understand why storing the outlier tokens as prefix tokens in the KV cache prevents new outlier tokens from being generated during inference. Could you please explain this further?

Question about Preventing Outlier Tokens during Inference #21
