Inconsistent initialization of RoPE embedding across component builders  #2283

Open
@Ankur-singh

Description

Context

The Llama 3.1 self-attention builder takes RoPE embeddings as an argument, allowing us to build RoPE a single time across all layers. However, the corresponding components for Llama2 and Llama3 do not do this -- they instead construct RoPE for every single layer.

Should we use a single global RoPE or one RoPE per layer? Either way, we should standardize this across the builders.

Originally posted by @ebsmothers in #2282 (comment)
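For illustration, here is a minimal sketch of the two patterns. The names (build_rope, SelfAttention, build_layers_*) are hypothetical stand-ins, not torchtune's actual component builders: the per-layer pattern (as in the Llama2/Llama3 builders) constructs a new RoPE module inside the layer loop, while the shared pattern (as in the Llama 3.1 builder) constructs RoPE once and passes the same instance to every attention layer.

```python
# Sketch of the two initialization patterns discussed above.
# All names here are illustrative, not torchtune's real APIs.
from torch import nn


def build_rope(head_dim: int, max_seq_len: int) -> nn.Module:
    # Stand-in for a rotary positional embedding module.
    return nn.Identity()


class SelfAttention(nn.Module):
    def __init__(self, embed_dim: int, pos_embeddings: nn.Module):
        super().__init__()
        self.qkv = nn.Linear(embed_dim, 3 * embed_dim)
        # RoPE would be applied to q/k inside forward
        self.pos_embeddings = pos_embeddings


def build_layers_per_layer_rope(num_layers: int, embed_dim: int, head_dim: int, max_seq_len: int):
    # Llama2/Llama3-style: each layer gets its own RoPE instance.
    return nn.ModuleList(
        SelfAttention(embed_dim, pos_embeddings=build_rope(head_dim, max_seq_len))
        for _ in range(num_layers)
    )


def build_layers_shared_rope(num_layers: int, embed_dim: int, head_dim: int, max_seq_len: int):
    # Llama 3.1-style: build RoPE once and share it across all layers.
    rope = build_rope(head_dim, max_seq_len)
    return nn.ModuleList(
        SelfAttention(embed_dim, pos_embeddings=rope) for _ in range(num_layers)
    )
```

Since the RoPE buffers are identical for every layer, the shared pattern avoids constructing the same buffers repeatedly; the tradeoff is that per-layer construction keeps each layer fully self-contained. Whichever we pick, the builders should agree.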

Metadata

Labels

best practice: Things we should be doing but aren't
better engineering: Tasks which help improve eng productivity, e.g. building tools, cleaning up code, writing docs
