
Plans for FP8 tuning going forward? E.g. DeepSeek V3 #2216

Open
@RonanKMcGovern

Description

As foundation models move towards being trained in eight bits (FP8), is there a plan on the roadmap to begin supporting this type of approach?

Related to DeepSeek V3, are there plans to support mixture-of-experts architectures? I would fully understand if this is too far outside a coherent roadmap.
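For context on what FP8 support would involve, here is a minimal sketch (not tied to this project's API) of per-tensor scaling into PyTorch's native `float8_e4m3fn` dtype and back. This kind of scaled cast is the basic building block of FP8 training recipes; the function names and the 448 max value for E4M3 are illustrative assumptions, not anything from this repo.

```python
import torch

def quantize_fp8(x: torch.Tensor):
    # Hypothetical helper: scale so the largest magnitude maps near the
    # E4M3 max representable value (448), then cast to FP8.
    fp8_max = 448.0
    scale = fp8_max / x.abs().max().clamp(min=1e-12)
    x_fp8 = (x * scale).to(torch.float8_e4m3fn)
    return x_fp8, scale

def dequantize_fp8(x_fp8: torch.Tensor, scale: torch.Tensor):
    # Cast back to float32 and undo the scaling.
    return x_fp8.to(torch.float32) / scale

w = torch.randn(4096, 4096)
w_fp8, s = quantize_fp8(w)
w_back = dequantize_fp8(w_fp8, s)
print((w - w_back).abs().max())  # rough per-tensor quantization error
```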
