[grug] Multi-host MoE training fails with ShardingTypeError on shared-expert residual add #4309

This job was skipped