Hi, thanks for your great work. According to Appendix F (Transformer Details) in the paper, the model backbone is a Transformer encoder. However, in the source code the backbone is the Hawk model (from the paper "Griffin: Mixing gated linear recurrences with local attention for efficient language models"), which partly consists of purely recurrent blocks. Is the reported performance of MOTOR derived from this source code (i.e., the Hawk model)?
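For context on why the distinction matters: unlike a Transformer encoder, a Hawk-style block mixes tokens with a gated linear recurrence rather than attention. The following is a minimal, illustrative sketch of such a recurrence (a simplified stand-in, not the exact RG-LRU from the Griffin paper or the repository's implementation; the function name and shapes are my own assumptions):

```python
import numpy as np

def gated_linear_recurrence(x, gate_logits):
    """Simplified sketch of a gated linear recurrence, the kind of
    purely recurrent block used in Hawk/Griffin-style models.
    Illustrative only; not the exact RG-LRU from the paper.

    x:           (seq_len, dim) input sequence
    gate_logits: (seq_len, dim) per-step, per-channel gate logits
    """
    a = 1.0 / (1.0 + np.exp(-gate_logits))  # sigmoid gate in (0, 1)
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        # Linear state update: no attention over past tokens,
        # only an elementwise decaying hidden state.
        h = a[t] * h + (1.0 - a[t]) * x[t]
        out[t] = h
    return out
```

Each output position depends on the past only through the decaying hidden state `h`, which is the key architectural difference from a Transformer encoder's attention.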