It is supported by Tinker (https://tinker-docs.thinkingmachines.ai/model-lineup) and is also the basis for many popular models: DeepSeek itself, of course, but also, for example, GLM 4.7 (https://github.com/huggingface/transformers/blob/7f20ad0073ae6d2f6a799c1404448d579496b6c4/src/transformers/models/glm4_moe/modular_glm4_moe.py#L280).