Store fused MoE wi weight as (G,K,2N) when fused_mlp=True
When fused_mlp is enabled, initialize self.wi as a single (G, K, 2N)
parameter instead of two separate (G, K, N) tensors wi_0 and wi_1. This
loads the expert weights from HBM once per forward pass; the concat in
sparse_matmul becomes a view of two adjacent slices that XLA elides.
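A minimal JAX sketch of the fused layout. The names wi, fused_mlp, and sparse_matmul come from the commit message; the dimensions, routing-free expert matmul, and gated activation are hypothetical stand-ins for the real module:

```python
import jax
import jax.numpy as jnp

G, K, N = 8, 512, 1024  # hypothetical: experts, hidden dim, MLP dim

# fused_mlp=True: one (G, K, 2N) parameter instead of two (G, K, N)
# tensors, so the expert weights are read from HBM once per forward.
wi = jax.random.normal(jax.random.PRNGKey(0), (G, K, 2 * N))

def expert_mlp(x, wi):
    # x: (G, T, K) tokens already routed to experts (routing omitted).
    h = jnp.einsum('gtk,gkn->gtn', x, wi)  # one matmul over fused weight
    # What used to be concat(x @ wi_0, x @ wi_1) is now two adjacent
    # slices of h; since the halves are contiguous, XLA elides the
    # split/concat entirely.
    h0, h1 = h[..., :N], h[..., N:]
    return jax.nn.silu(h0) * h1  # hypothetical gated activation

x = jax.random.normal(jax.random.PRNGKey(1), (G, 16, K))
out = expert_mlp(x, wi)  # (G, 16, N)
```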