
[RFC]: Integrate USP (Ulysses + Ring Attention) for Context Parallelism in SpecForge #299

@uygnef

1. Motivation

Training on 16k-token sequences currently fails with out-of-memory (OOM) errors (see #112). Supporting sequences of 100k+ tokens requires efficient context parallelism (CP), which shards each sequence across multiple GPUs. According to https://arxiv.org/abs/2405.07719, USP (Ulysses + Ring Attention) outperforms either Ulysses or Ring Attention used alone, which makes it our preferred approach.

2. Proposal

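Integrate USP into SpecForge. USP is a hybrid of two sequence-parallel methods:
Ulysses: higher throughput, because it uses all-to-all communication to redistribute attention heads across ranks; its parallel degree, however, is limited by the number of attention heads.
Ring Attention: supports longer sequences, because each rank keeps only its own chunk of the sequence and passes key/value blocks to its neighbor in a ring, overlapping communication with computation.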
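To make the hybrid concrete, here is a minimal sketch of how a CP group of `ulysses_degree * ring_degree` ranks could be split into Ulysses groups and Ring groups. `plan_usp` and `USPLayout` are hypothetical names, not SpecForge or USP library APIs. The placement of Ulysses groups on contiguous ranks (for example, GPUs within one node, since all-to-all is bandwidth-heavy) is an assumption; the reference USP implementation may order ranks differently.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class USPLayout:
    """Hypothetical description of one USP context-parallel layout."""
    ulysses_groups: List[List[int]]  # ranks that exchange heads via all-to-all
    ring_groups: List[List[int]]     # ranks that pass KV blocks around a ring
    local_seq_len: int               # tokens each rank holds before the all-to-all
    heads_per_rank: int              # attention heads each rank computes after the all-to-all


def plan_usp(seq_len: int, num_heads: int, ulysses_degree: int, ring_degree: int) -> USPLayout:
    cp_size = ulysses_degree * ring_degree
    # Ulysses scatters attention heads across its group, so heads must divide evenly.
    if num_heads % ulysses_degree != 0:
        raise ValueError("num_heads must be divisible by ulysses_degree")
    # Every rank starts with an equal 1/cp_size slice of the sequence.
    if seq_len % cp_size != 0:
        raise ValueError("seq_len must be divisible by ulysses_degree * ring_degree")

    # Assumption: Ulysses groups are contiguous ranks; ring groups stride across them.
    ulysses_groups = [
        list(range(r * ulysses_degree, (r + 1) * ulysses_degree))
        for r in range(ring_degree)
    ]
    ring_groups = [list(range(u, cp_size, ulysses_degree)) for u in range(ulysses_degree)]

    # After the Ulysses all-to-all, each rank holds seq_len // ring_degree tokens
    # for num_heads // ulysses_degree heads; Ring Attention then runs over ring_groups.
    return USPLayout(
        ulysses_groups=ulysses_groups,
        ring_groups=ring_groups,
        local_seq_len=seq_len // cp_size,
        heads_per_rank=num_heads // ulysses_degree,
    )


layout = plan_usp(seq_len=131072, num_heads=32, ulysses_degree=4, ring_degree=2)
print(layout.ulysses_groups)  # [[0, 1, 2, 3], [4, 5, 6, 7]]
print(layout.ring_groups)     # [[0, 4], [1, 5], [2, 6], [3, 7]]
print(layout.local_seq_len, layout.heads_per_rank)  # 16384 8
```

With 8 ranks, a 128k-token sequence leaves each rank 16,384 tokens before the all-to-all. Raising `ring_degree` lengthens the supported sequence without hitting the head-count limit that caps `ulysses_degree`.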

3. Expected Benefits

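Enable training on 100k+ token sequences without OOM errors
Maintain computational efficiency by combining Ulysses throughput with Ring Attention scalability
Preserve model accuracy at scale, since both Ulysses and Ring Attention compute exact attention rather than an approximation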
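The accuracy point rests on Ring Attention being exact: partial softmax results from each key/value block can be merged with online-softmax rescaling, so the result matches full attention. The sketch below simulates this on a single process in pure Python for one query vector. It omits causal masking and load balancing, which a real CP implementation needs. `full_attention` and `ring_attention` are illustrative names, not SpecForge functions.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def full_attention(q: Vector, keys: List[Vector], values: List[Vector]) -> Vector:
    """Reference: softmax(q . k / sqrt(d))-weighted sum of values over the whole sequence."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)  # subtract the max for numerical stability
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    d_v = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) / total for j in range(d_v)]


def ring_attention(q: Vector, key_blocks: List[List[Vector]], value_blocks: List[List[Vector]]) -> Vector:
    """Process one KV block per step, as if each block arrived from the previous
    rank in the ring, and merge partial results with online-softmax rescaling."""
    scale = 1.0 / math.sqrt(len(q))
    d_v = len(value_blocks[0][0])
    running_max = -math.inf  # largest score seen so far
    running_sum = 0.0        # sum of exp(score - running_max) so far
    acc = [0.0] * d_v        # unnormalized weighted sum of values so far
    for keys, values in zip(key_blocks, value_blocks):
        scores = [dot(q, k) * scale for k in keys]
        new_max = max(running_max, max(scores))
        # Rescale previously accumulated terms to the new reference max.
        correction = math.exp(running_max - new_max)
        running_sum *= correction
        acc = [a * correction for a in acc]
        for s, v in zip(scores, values):
            w = math.exp(s - new_max)
            running_sum += w
            for j in range(d_v):
                acc[j] += w * v[j]
        running_max = new_max
    return [a / running_sum for a in acc]


# 8 tokens with head dim 3, split across a simulated ring of 4 ranks (2 tokens each).
keys = [[0.1 * i, -0.2 * i, 0.3] for i in range(8)]
values = [[float(i), float(i % 3)] for i in range(8)]
q = [0.5, 0.1, -0.4]
key_blocks = [keys[i:i + 2] for i in range(0, 8, 2)]
value_blocks = [values[i:i + 2] for i in range(0, 8, 2)]

ref = full_attention(q, keys, values)
out = ring_attention(q, key_blocks, value_blocks)
print(all(abs(a - b) < 1e-9 for a, b in zip(ref, out)))  # True
```

The only extra state each step carries is a running max, a running denominator, and an unnormalized accumulator. That is why the KV blocks can be streamed rank to rank without ever materializing the full attention matrix.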
