
[FSDP] Support Context parallelism for FSDP using ring-flash-attn#467

Merged
zhuzilin merged 18 commits into THUDM:main from PopSoda2002:feat/support_normal_cp
Nov 16, 2025
Conversation

PopSoda2002 (Collaborator) commented Oct 12, 2025

Attempts to solve #294 using ring-flash-attn with data packing.

How to use

  1. pip install ring-flash-attn

Detailed Development Tracking

Result

Compared with the main branch ([loss-curve images omitted]), the results almost match.

Script changes:

# Context Parallelism Arguments
# Context Parallelism enables training with longer sequences by splitting sequences across GPUs
# This uses ring-flash-attention library for efficient attention computation
CP_ARGS=(
   --enable-cp                        # Enable Context Parallelism
   --ring-flash-atten-type llama3     # Use llama3 ring attention implementation (recommended for varlen)
   --context-parallel-size 2
)
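Under llama3-style context parallelism, the packed sequence is cut into contiguous chunks, one per CP rank, and ring-flash-attn exchanges K/V between ranks so each chunk still attends over the full sequence. A minimal sketch of the per-rank slicing (`cp_local_chunk` is a hypothetical name, not this PR's actual API):

```python
def cp_local_chunk(input_ids, cp_rank: int, cp_size: int):
    """Return this CP rank's contiguous slice of a packed sequence.

    Hypothetical helper sketching llama3-style context parallelism:
    the packed sequence is split into cp_size equal contiguous chunks,
    one per rank. With cp_size == 1 the full sequence is returned,
    so the same code path also covers the non-CP case.
    """
    seq_len = len(input_ids)
    assert seq_len % cp_size == 0, "pad the packed sequence to a multiple of cp_size"
    chunk = seq_len // cp_size
    return input_ids[cp_rank * chunk : (cp_rank + 1) * chunk]
```

With `--context-parallel-size 2`, rank 0 would hold the first half of each packed batch and rank 1 the second half; attention across the cut is handled by the ring K/V exchange inside ring-flash-attn.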

Williamren97 force-pushed the feat/support_normal_cp branch 2 times, most recently from a8ad2ec to 4948643 (October 12, 2025)
PopSoda2002 force-pushed the feat/support_normal_cp branch from 88afbd4 to f32b57f (November 1, 2025)
PopSoda2002 closed this Nov 2, 2025
PopSoda2002 force-pushed the feat/support_normal_cp branch from 653578b to 6d01709 (November 2, 2025)
PopSoda2002 reopened this Nov 2, 2025
PopSoda2002 marked this pull request as ready for review November 2, 2025
PopSoda2002 force-pushed the feat/support_normal_cp branch from 35c2578 to 346abce (November 5, 2025)
zhaochenyang20 (Collaborator):

Really great progress, and I hope you learned a lot from this process. Shall we post a blog on the journey of CP in awesome-ml-sys? Also, your job opportunity shall always come first. Really glad to see your resolve and great improvement. Hope for the best.

zhaochenyang20 (Collaborator):

🐂🍺

PopSoda2002 force-pushed the feat/support_normal_cp branch from a964ed1 to a41f7c1 (November 15, 2025)
world_size = dist.get_world_size()
rank = dist.get_rank()

if self.args.enable_cp:
Contributor:

We can use self.args.context_parallel_size directly, and we don't need a separate mesh init for cp size > 1.

PopSoda2002 (Author):

Fixed, thanks.
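The review comment above amounts to deriving one (dp, cp) mesh shape for every configuration, since cp_size == 1 degenerates to a pure data-parallel mesh. A sketch under that assumption (`mesh_shape` is a hypothetical helper name):

```python
def mesh_shape(world_size: int, cp_size: int) -> tuple:
    """Derive (dp, cp) device-mesh dimensions from the CP size.

    Hypothetical helper: with cp_size == 1 this returns a pure
    data-parallel shape, so no separate init path is needed for
    the non-CP case.
    """
    assert world_size % cp_size == 0, "world size must be divisible by cp size"
    return (world_size // cp_size, cp_size)
```

The resulting shape can then be handed to torch.distributed.device_mesh.init_device_mesh with mesh_dim_names=("dp", "cp"), feeding self.args.context_parallel_size in as cp_size directly.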

)
logits = self.model(**model_args).logits.squeeze(0)
if self.args.enable_cp:
    log_probs_result, entropy_result = get_chunked_logp_and_entropy(
Contributor:

Please merge the with-CP and without-CP implementations into one.

PopSoda2002 (Author):

Fixed.

rank = dist.get_rank()

rollout_data = process_rollout_data(self.args, rollout_data_ref, rank, world_size)
dp_rank = self.dp_rank if self.args.enable_cp else rank
Contributor:

We can always use dp_rank.

PopSoda2002 (Author):

Fixed.

).logits.squeeze(0)

# Gather logits from all CP ranks if CP is enabled (with gradient support)
if self.args.enable_cp:
Contributor:

Similar to the comment above, please merge the two branches into one.

PopSoda2002 (Author):

Fixed.

if tokens[idx].item() == 0:
    pad_length += 1
else:
    break
Contributor:

I think we can recalculate the pad length directly instead of using a for loop.

PopSoda2002 (Author):

Fixed.
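One loop-free way to count the leading pads is an argmax over the non-pad mask. A sketch (`leading_pad_length` is a hypothetical name; pad id 0 follows the snippet above):

```python
import torch

def leading_pad_length(tokens: torch.Tensor, pad_id: int = 0) -> int:
    """Count leading pad tokens without a per-token Python loop.

    Hypothetical helper: argmax over the non-pad mask returns the index
    of the first non-pad token, which equals the number of leading pads.
    The all-pad case is handled separately, since argmax would return 0.
    """
    nonpad = (tokens != pad_id).int()
    if int(nonpad.sum()) == 0:  # every position is padding
        return tokens.numel()
    return int(nonpad.argmax())
```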

PopSoda2002 (Author):

New results also match between cp 1 and cp 2. [comparison image omitted]

zhuzilin (Contributor):

Thank you so much for this!

zhuzilin merged commit 6d3b33f into THUDM:main Nov 16, 2025
zhaochenyang20 (Collaborator):

牛逼! (Awesome!)

llltttwww pushed a commit to llltttwww/slime that referenced this pull request Nov 30, 2025
Yangruipis pushed a commit to rednote-ai/slime that referenced this pull request Feb 28, 2026


4 participants