
Question about Preventing Outlier Tokens during Inference #21

Open
@fingerk28

Description


Thank you for your very interesting work. I would like to ask about something mentioned in your paper: "Considering the auto-regressive inference pipeline of LLMs, we store these prefix tokens in the KV cache to prevent generating new outlier tokens during inference." I don't understand why keeping these prefix tokens in the KV cache prevents new outlier tokens from being generated during inference. Could you please explain this further?
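
To make my question concrete, here is roughly how I understand "storing prefix tokens in the KV cache" during auto-regressive decoding. This is only a generic sketch in Hugging Face style; the model name and the prefix token are placeholders of mine, not taken from your paper or code:

```python
# Generic sketch (assumptions: a Hugging Face causal LM, "gpt2" and the EOS
# token are placeholders; this is not the paper's actual implementation).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

# Hypothetical prefix token(s) kept permanently at the front of the cache.
prefix_ids = tokenizer("<|endoftext|>", return_tensors="pt").input_ids

with torch.no_grad():
    # 1) Pre-fill the KV cache with the prefix tokens once.
    out = model(prefix_ids, use_cache=True)
    past_key_values = out.past_key_values

    # 2) Feed the user prompt, extending (but never evicting) the cached prefix.
    prompt_ids = tokenizer("Hello", return_tensors="pt").input_ids
    out = model(prompt_ids, past_key_values=past_key_values, use_cache=True)
    past_key_values = out.past_key_values
    next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
    generated = torch.cat([prompt_ids, next_id], dim=-1)

    # 3) Auto-regressive decoding: every new token can attend to the prefix
    #    entries, which remain in the cache for the whole generation.
    for _ in range(20):
        out = model(generated[:, -1:], past_key_values=past_key_values,
                    use_cache=True)
        past_key_values = out.past_key_values
        next_id = out.logits[:, -1, :].argmax(dim=-1, keepdim=True)
        generated = torch.cat([generated, next_id], dim=-1)

print(tokenizer.decode(generated[0]))
```

Is my understanding of the mechanism correct, and if so, why does always having these prefix entries available in the cache stop the model from producing new outlier tokens later in the sequence?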
