Inconsistent `sink_size` during training vs. inference

Thanks so much for the great and insightful work! I have a small question about the `sink_size` argument for the model `Llama-3-8B-Instruct-Gradient-1048k`.

- During training, it was set to `"sink_size": 128` in this config: [Llama-3-8B-Instruct-Gradient-1048k](https://github.com/mit-han-lab/duo-attention/tree/main/attn_patterns/Llama-3-8B-Instruct-Gradient-1048k)/[lr=0.02-reg=0.05-ctx=1000_32000-multi_passkey10](https://github.com/mit-han-lab/duo-attention/tree/main/attn_patterns/Llama-3-8B-Instruct-Gradient-1048k/lr%3D0.02-reg%3D0.05-ctx%3D1000_32000-multi_passkey10).

- During inference, it is set to `sink_size=64`, per https://github.com/mit-han-lab/duo-attention/blob/fe93c314ae87306ef6629dc16713250b4718ffe7/README.md?plain=1#L145

I was wondering whether this inconsistency is intentional or a typo. Any comments or practical suggestions would be greatly appreciated. Thank you again for the great, well-documented code!
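
For reference, here is a minimal sketch of the kind of consistency check I have in mind. It assumes the training config is JSON with a top-level `sink_size` key; the helper name `sink_size_mismatch` and the inline sample config are hypothetical and not part of the repo.

```python
import json


def sink_size_mismatch(train_config, inference_sink_size):
    """Return (trained, inference) if the two sink sizes differ, else None.

    `train_config` is a dict assumed to hold a top-level "sink_size" key.
    """
    trained = int(train_config["sink_size"])
    if trained != int(inference_sink_size):
        return (trained, int(inference_sink_size))
    return None


# Hypothetical sample mirroring the two values quoted in this issue.
sample_config = json.loads('{"sink_size": 128}')
mismatch = sink_size_mismatch(sample_config, inference_sink_size=64)
if mismatch is not None:
    print("sink_size differs: trained=%d, inference=%d" % mismatch)
```

This only flags the difference; whether a smaller sink size at inference is acceptable is exactly what I am asking about.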
