Commit 5287063

support sharding stage3 for deepseekv3
1 parent 5122a75 commit 5287063

1 file changed

Lines changed: 2 additions & 0 deletions

paddlenlp/transformers/deepseek_v2/modeling.py

@@ -741,6 +741,8 @@ def forward(self, hidden_states):
             return scores, routing_map, l_aux, l_zloss
 
         capacity, combine_weights, dispatch_mask, exp_counts, l_aux, l_zloss = self.topkgating(scores)
+        dispatch_mask.stop_gradient = True
+        exp_counts.stop_gradient = True
         return capacity, combine_weights, dispatch_mask, exp_counts, l_aux, l_zloss
 
 