Hi, thanks for your great work. According to Appendix F (Transformer Details) in the paper, the model backbone is a Transformer encoder. However, in the source code the backbone is the Hawk model (from the paper "Griffin: Mixing gated linear recurrences with local attention for efficient language models"), which partly consists of purely recurrent blocks. Is the reported performance of MOTOR derived from this source code (i.e., the Hawk model)?
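For context on why the distinction matters: unlike a Transformer encoder, a Hawk-style block mixes tokens with a gated linear recurrence rather than attention. The following is a minimal, illustrative sketch of such a recurrence (a simplified stand-in, not the exact RG-LRU from the Griffin paper or the repository's implementation; the function name and shapes are my own assumptions):

```python
import numpy as np

def gated_linear_recurrence(x, gate_logits):
    """Simplified sketch of a gated linear recurrence, the kind of
    purely recurrent block used in Hawk/Griffin-style models.
    Illustrative only; not the exact RG-LRU from the paper.

    x:           (seq_len, dim) input sequence
    gate_logits: (seq_len, dim) per-step, per-channel gate logits
    """
    a = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid gate in (0, 1)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        # Linear state update: no attention over past tokens,
        # only an elementwise decaying hidden state.
        h = a[t] * h + (1.0 - a[t]) * x[t]
        out[t] = h
    return out
```

Each output position depends on the past only through the decaying hidden state `h`, which is the key architectural difference from a Transformer encoder's attention.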