Context Parallelism

Is Context Parallelism (CP) supported in Nanotron? Most frameworks now support CP for long-context training, and it is becoming almost a de facto standard (the Llama paper, NVIDIA Megatron, etc.). I understand that the SmolLM team has been creating long-context training recipes with Nanotron; however, there seem to be no published observations on why the lack of CP is not a limiting factor for throughput.

Context Parallelism #383
