Have you tried Rope_base = 500k in YaRN? #64

@hannlp

The YaRN paper uses rope_base=10000 (static YaRN) and reports excellent extrapolation results. Could the authors clarify whether setting rope_base to 500000 together with YaRN would have a synergistic effect, i.e., produce results that surpass both YaRN (rope_base=10000) and NTK-aware scaling (rope_base=500000)? @jquesnelle
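
To make the question concrete, here is a minimal sketch of where rope_base enters the "NTK-by-parts" frequency interpolation described in the YaRN paper. It is not taken from this repository's code. The function names and the defaults for head_dim, scale and orig_ctx are assumptions chosen for illustration; alpha=1 and beta=32 are the values the paper suggests for LLaMA models.

```python
import math

def yarn_inv_freq(rope_base, head_dim=128, scale=4.0, orig_ctx=4096,
                  alpha=1.0, beta=32.0):
    """Per-dimension RoPE frequencies after NTK-by-parts interpolation.

    Illustrative sketch only: head_dim, scale and orig_ctx are assumed values.
    """
    freqs = []
    for i in range(0, head_dim, 2):
        theta = rope_base ** (-i / head_dim)            # plain RoPE frequency
        rotations = orig_ctx * theta / (2 * math.pi)    # full turns inside the original context
        # Ramp: 0 -> fully interpolated (theta / scale), 1 -> left unchanged
        gamma = min(1.0, max(0.0, (rotations - alpha) / (beta - alpha)))
        freqs.append((1 - gamma) * theta / scale + gamma * theta)
    return freqs

def yarn_attention_factor(scale):
    """Attention scaling sqrt(1/t) = 0.1 * ln(scale) + 1 from the YaRN paper."""
    return 0.1 * math.log(scale) + 1.0

def count_fully_interpolated(rope_base, head_dim=128, orig_ctx=4096, alpha=1.0):
    """Number of dimension pairs whose ramp value is 0 (hypothetical helper)."""
    return sum(
        1 for i in range(0, head_dim, 2)
        if orig_ctx * rope_base ** (-i / head_dim) / (2 * math.pi) <= alpha
    )

if __name__ == "__main__":
    for base in (10_000, 500_000):
        print(base, count_fully_interpolated(base), yarn_inv_freq(base)[-1])
```

A larger rope_base already lowers the frequencies, so under these assumed settings more dimension pairs fall into the fully interpolated region of the ramp. The sketch only shows how the two knobs interact in the frequency computation; it says nothing about whether combining them improves extrapolation quality, which is the open question here.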
