Context
The Llama 3.1 self-attention builder takes RoPE embeddings as an argument, allowing us to build RoPE a single time across all layers. However, the corresponding components for Llama2 and Llama3 do not do this -- they instead construct RoPE for every single layer.
Should we use a single global RoPE shared across layers, or build one RoPE per layer? Either way, we should standardize this across the model builders.
Originally posted by @ebsmothers in #2282 (comment)
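For illustration, here is a minimal sketch of the two construction patterns being compared. The `RoPE`, `SelfAttention`, and builder names below are hypothetical placeholders, not the actual torchtune modules; the point is only the difference between building RoPE once and passing it into every layer versus constructing a fresh RoPE inside each layer.

```python
import torch
from torch import nn


class RoPE(nn.Module):
    """Placeholder rotary embedding module (stands in for the real RoPE implementation)."""

    def __init__(self, head_dim: int, base: int = 10_000):
        super().__init__()
        theta = 1.0 / (base ** (torch.arange(0, head_dim, 2).float() / head_dim))
        self.register_buffer("theta", theta)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # A real implementation would rotate q/k here; this placeholder is a no-op.
        return x


class SelfAttention(nn.Module):
    """Toy self-attention block that receives an already-built RoPE module."""

    def __init__(self, embed_dim: int, num_heads: int, pos_embeddings: nn.Module):
        super().__init__()
        self.pos_embeddings = pos_embeddings
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.pos_embeddings(x)
        out, _ = self.attn(x, x, x)
        return out


def build_layers_shared_rope(num_layers: int, embed_dim: int, num_heads: int) -> nn.ModuleList:
    # Pattern used by the Llama 3.1 builder: build RoPE once, share it across all layers.
    rope = RoPE(head_dim=embed_dim // num_heads)
    return nn.ModuleList(
        [SelfAttention(embed_dim, num_heads, pos_embeddings=rope) for _ in range(num_layers)]
    )


def build_layers_per_layer_rope(num_layers: int, embed_dim: int, num_heads: int) -> nn.ModuleList:
    # Pattern currently used by the Llama2/Llama3 builders: a fresh RoPE per layer.
    return nn.ModuleList(
        [
            SelfAttention(embed_dim, num_heads, pos_embeddings=RoPE(head_dim=embed_dim // num_heads))
            for _ in range(num_layers)
        ]
    )
```

The shared variant avoids allocating duplicate RoPE buffers per layer; the per-layer variant keeps each layer self-contained but repeats the same construction work for every layer.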