[Roadmap] Primus-Turbo Roadmap H2 2025 #101

@xiaobochen-amd

This roadmap is the H2 2025 development plan of Primus-Turbo.

Note: The roadmap is flexible and will be updated over time based on project needs and community input.

Release Overview

Version | Framework | Status | Date
v0.1.0 | PyTorch + ROCm 6.4 | ✅ Released | 2025-09-11
v0.1.1 | PyTorch + ROCm 7.0 | ✅ Released | 2025-10-15
v0.2.0 | PyTorch + ROCm 7.x | 🚧 In Progress | 2025-11 (est.)
v0.3.0 | Planning | Planning | TBD
v0.4.0 | Planning | Planning | TBD

Detailed Plans

v0.1.0 (Released)

Focus

  • Build the foundational framework of Primus-Turbo.
  • Provide core operators.

Features

  • GEMM: Support FP16/BF16.
  • FlashAttention: Support FP16/BF16.
  • GroupedGEMM: Support FP16/BF16.

Framework

  • Provide PyTorch APIs.
  • Support ROCm 6.4.

v0.2.0 (In Progress)

Focus

  • Introduce FP8 foundational support.
  • Enable communication primitives with FP8, focusing on DeepEP.

Features

  • GEMM: Support FP8 (E4M3/E5M2).
    • Support Tensorwise.
    • Support Rowwise.
    • Support Blockwise.
    • Support MX.
  • FlashAttention: Support FP8 (E4M3/E5M2).
    • Support Blockwise.
  • GroupedGEMM: Support FP8 (E4M3/E5M2).
    • Support Tensorwise.
    • Support Rowwise.
    • Support Blockwise.
    • Support MX.
  • All2All: Support FP8.
    • Support Tensorwise.
  • DeepEP:
    • Intra-Node Normal Kernel.
    • Inter-Node Normal Kernel.
    • Support NICs.
      • ConnectX-7
      • Thor2
      • Pensando
    • Support internode_dispatch without GPU-CPU synchronization.
    • Support torch.compile.
  • TokenDispatcher:
    • Integrate Permute/Unpermute.
    • Support Sync-Free DeepEPTokenDispatcher.
    • Support MoE Fused Activations.
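
For readers new to the scaling granularities listed under GEMM and GroupedGEMM above, the following is a minimal sketch of how the Tensorwise, Rowwise, and Blockwise options differ in the number of scale factors they compute. It is not Primus-Turbo code: the function names are hypothetical, the "scale = format max / absolute max" convention is one common choice among several, and the constant assumes the OCP FP8 E4M3 format, whose largest finite value is 448 (other E4M3 variants use a different maximum). MX scaling is not covered here.

```python
# Illustrative only: hypothetical helpers, not part of Primus-Turbo.
# Assumes OCP FP8 E4M3, whose largest finite value is 448.
FP8_E4M3_MAX = 448.0


def amax(values):
    """Largest absolute value, floored so an all-zero input gives a finite scale."""
    return max(max((abs(v) for v in values), default=0.0), 1e-12)


def tensorwise_scale(matrix):
    """Tensorwise: a single scale shared by the whole matrix."""
    return FP8_E4M3_MAX / amax(v for row in matrix for v in row)


def rowwise_scales(matrix):
    """Rowwise: one scale per row."""
    return [FP8_E4M3_MAX / amax(row) for row in matrix]


def blockwise_scales(matrix, block_rows, block_cols):
    """Blockwise: one scale per tile; tiles on the edges may be smaller."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    scales = []
    for r0 in range(0, n_rows, block_rows):
        scale_row = []
        for c0 in range(0, n_cols, block_cols):
            tile = [
                matrix[r][c]
                for r in range(r0, min(r0 + block_rows, n_rows))
                for c in range(c0, min(c0 + block_cols, n_cols))
            ]
            scale_row.append(FP8_E4M3_MAX / amax(tile))
        scales.append(scale_row)
    return scales


example = [
    [1.0, -2.0, 0.5, 4.0],
    [0.25, 8.0, -16.0, 1.0],
]
per_tensor = tensorwise_scale(example)        # 1 scale
per_row = rowwise_scales(example)             # 2 scales, one per row
per_block = blockwise_scales(example, 1, 2)   # 2 x 2 grid of scales
```

Finer granularity costs more scale storage but keeps a single outlier (here the -16.0) from shrinking the usable range for the rest of the tensor.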

v0.3.0 (Planning)

Focus

  • Performance optimization for FP16/BF16 and FP8.

Features

...

v0.4.0 (Planning)

Focus

  • Explore ultra-low precision training and inference (FP4 / FP6).

Features

...
