Hi!
I have a couple of questions about increasing the context length during training.
- Does the framework support increasing the context length during training when using rotary position embeddings?
- If it does, does it support position embedding interpolation, or only extrapolation? (See the sketch below for what I mean by interpolation.)
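
For context, by interpolation I mean compressing the new, longer position range into the range the model was originally trained on (by scaling positions down before computing the rotary angles), rather than extrapolating past it. A minimal PyTorch sketch of the distinction — the function name and signature are just for illustration, not from the framework:

```python
import torch

def rope_angles(seq_len: int, dim: int, base: float = 10000.0,
                scale: float = 1.0) -> torch.Tensor:
    """Rotary embedding angle table of shape (seq_len, dim // 2).

    scale = 1.0 extrapolates: positions run past the trained range.
    scale < 1.0 interpolates: positions are squeezed back into the
    trained range, e.g. scale = train_len / new_len.
    """
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float() * scale
    return torch.outer(positions, inv_freq)

# Extrapolation: 8192 positions with the original frequencies.
extrapolated = rope_angles(seq_len=8192, dim=128)

# Interpolation: squeeze 8192 positions into a 2048-position
# pretraining range (scale = 2048 / 8192 = 0.25).
interpolated = rope_angles(seq_len=8192, dim=128, scale=2048 / 8192)
```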
Thanks!
Ingus