Describe the feature
As the title says, I was very excited to train a big foundation MoE embedding model using the ms-swift framework, as it has both InfoNCE loss support and expert-parallel Megatron support. Only later did I find out that those two things are not compatible. Is there any chance you can add support for task_type = embeddings and InfoNCE loss for Megatron?