Open
Description
Is Context Parallelism (CP) supported in Nanotron? Most frameworks now support Context Parallelism for long-context training, and it is becoming almost a de facto standard (the Llama paper, NVIDIA Megatron, etc.). I understand that the SmolLM team has been creating long-context training recipes with Nanotron; however, I could not find any observations on why the absence of CP is not a limiting factor in terms of throughput.