Open
Thanks so much for the great and insightful work! I have a small question about the sink_size argument for the Llama-3-8B-Instruct-Gradient-1048k model.
- During training, it was set to "sink_size": 128 in this config: Llama-3-8B-Instruct-Gradient-1048k/lr=0.02-reg=0.05-ctx=1000_32000-multi_passkey10.
- During inference, it is set to sink_size=64, per line 145 in fe93c31:
  sink_size=64,
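To make the mismatch concrete, here is a minimal sketch of the comparison I have in mind. Only the two sink_size values come from the repository; the dictionaries `train_config` and `inference_kwargs` and the helper `find_mismatches` are hypothetical stand-ins of my own, not the project's actual structures or API.

```python
# Hypothetical stand-ins: only the two sink_size values are taken from the repo.
train_config = {"sink_size": 128}     # value in the training config named above
inference_kwargs = {"sink_size": 64}  # value passed at line 145 in fe93c31


def find_mismatches(train, infer):
    """Return {key: (train_value, infer_value)} for shared keys whose values differ."""
    shared_keys = train.keys() & infer.keys()
    return {k: (train[k], infer[k]) for k in shared_keys if train[k] != infer[k]}


mismatches = find_mismatches(train_config, inference_kwargs)
print(mismatches)  # {'sink_size': (128, 64)}
```

A check like this only flags that the two settings differ; it says nothing about whether the difference is intended.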
I was just wondering whether this inconsistency is intentional or a typo. Any comments or practical suggestions would be greatly appreciated. Thank you again for the well-documented and great code!