
Plans for FP8 tuning going forward? E.g. DeepSeek V3 #2216

Open
@RonanKMcGovern

Description

As foundation models move towards being trained in eight bits (FP8), is there a plan on the roadmap to begin supporting this type of approach?

Related to DeepSeek V3, are there plans to support mixture-of-experts architectures? I would fully understand if this is too far outside a coherent roadmap.
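For context on what FP8 support would involve, here is a minimal sketch (not tied to this project's API) of per-tensor scaling into PyTorch's native `float8_e4m3fn` dtype and back. This kind of scaled cast is the basic building block of FP8 training recipes; the function names and the 448 max value for E4M3 are illustrative assumptions, not anything from this repo.

```python
import torch

def quantize_fp8(x: torch.Tensor):
    # Hypothetical helper: scale so the largest magnitude maps near the
    # E4M3 max representable value (448), then cast to FP8.
    fp8_max = 448.0
    scale = fp8_max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    # Cast back to float32 and undo the scaling.
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)
w_fp8, s = quantize_fp8(w)
w_back = dequantize_fp8(w_fp8, s)
print((w - w_back).abs().max())  # rough per-tensor quantization error
```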
