This roadmap is the H2 2025 development plan of Primus-Turbo.
Note: The roadmap is flexible and will be updated over time based on project needs and community input.
Release Overview
| Version | Framework | Status | Date |
|---|---|---|---|
| v0.1.0 | PyTorch + ROCm 6.4 | ✅ Released | 2025-09-11 |
| v0.1.1 | PyTorch + ROCm 7.0 | ✅ Released | 2025-10-15 |
| v0.2.0 | PyTorch + ROCm 7.x | 🚧 In Progress | 2025-11 (est.) |
| v0.3.0 | TBD | 📝 Planning | TBD |
| v0.4.0 | TBD | 📝 Planning | TBD |
Detailed Plans
v0.1.0 (Released)
Focus
- Build the foundational framework of Primus-Turbo.
- Provide core operators.
Features
- GEMM: Support FP16/BF16.
- FlashAttention: Support FP16/BF16.
- GroupedGEMM: Support FP16/BF16 (see the reference sketch after this section).
Framework
- Provide PyTorch APIs.
- Support ROCm 6.4.
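For readers new to these operators, here is a minimal reference sketch of the math they implement, in plain PyTorch eager mode. Primus-Turbo ships fused ROCm kernels behind its own PyTorch APIs; those APIs are not reproduced here, and this sketch is purely illustrative.

```python
import torch

# GEMM in BF16: a single matrix multiply.
a = torch.randn(128, 256, dtype=torch.bfloat16)
b = torch.randn(256, 512, dtype=torch.bfloat16)
c = a @ b

# FlashAttention reference semantics: scaled dot-product attention
# over (batch, heads, seq_len, head_dim) tensors.
q = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
k = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
v = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
o = torch.nn.functional.scaled_dot_product_attention(q, k, v)

# GroupedGEMM reference semantics: independent GEMMs with per-group
# shapes, expressed here as a plain loop (the fused kernel batches them).
groups = [(torch.randn(32, 64, dtype=torch.bfloat16),
           torch.randn(64, 16, dtype=torch.bfloat16)) for _ in range(3)]
outs = [x @ y for x, y in groups]
```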
v0.2.0 (In Progress)
Focus
- Introduce FP8 foundational support.
- Enable communication primitives with FP8, focusing on DeepEP.
Features
- GEMM: Support FP8 (E4M3/E5M2); see the scaling sketch after this list.
  - Support Tensorwise.
  - Support Rowwise.
  - Support Blockwise.
  - Support MX (microscaling formats).
- FlashAttention: Support FP8 (E4M3/E5M2).
  - Support Blockwise.
- GroupedGEMM: Support FP8 (E4M3/E5M2).
  - Support Tensorwise.
  - Support Rowwise.
  - Support Blockwise.
  - Support MX (microscaling formats).
- All2All: Support FP8.
  - Support Tensorwise.
- DeepEP:
  - Intra-Node Normal Kernel.
  - Inter-Node Normal Kernel.
  - Supported NICs:
    - ConnectX-7
    - Thor2
    - Pensando
  - Support `internode_dispatch` without GPU-CPU sync.
  - Support `torch.compile`.
- TokenDispatcher (a permute/unpermute sketch follows this list):
  - Integrate Permute/Unpermute.
  - Support Sync-Free `DeepEPTokenDispatcher`.
  - Support MoE Fused Activations.
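For reference, a minimal sketch of the three non-MX scaling granularities in plain PyTorch. This is illustrative only: it assumes `torch.float8_e4m3fn` is available (PyTorch 2.1+), and the function names are hypothetical, not Primus-Turbo's API.

```python
import torch

# Hypothetical helpers, not Primus-Turbo's API.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_tensorwise(x):
    # One scale for the whole tensor.
    scale = x.abs().max().clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

def quantize_rowwise(x):
    # One scale per row; isolates per-row outliers.
    scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

def quantize_blockwise(x, block=128):
    # One scale per (block x block) tile; assumes shapes divide evenly.
    m, n = x.shape
    tiles = x.reshape(m // block, block, n // block, block)
    scale = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / FP8_MAX
    return (tiles / scale).reshape(m, n).to(torch.float8_e4m3fn), scale

q, s = quantize_tensorwise(torch.randn(256, 256))
x_hat = q.to(torch.float32) * s  # dequantize
```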
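And a minimal sketch of the permute/unpermute pattern a MoE token dispatcher performs, again in plain PyTorch. The names are illustrative; this is not DeepEP's or Primus-Turbo's actual API.

```python
import torch

def permute(tokens, expert_ids):
    # Sort tokens by assigned expert so each expert sees a contiguous
    # slice; keep the ordering for the inverse pass.
    order = torch.argsort(expert_ids, stable=True)
    return tokens[order], order

def unpermute(expert_out, order):
    # Scatter expert outputs back to the original token order.
    out = torch.empty_like(expert_out)
    out[order] = expert_out
    return out

tokens = torch.randn(8, 16)             # 8 tokens, hidden size 16
expert_ids = torch.randint(0, 4, (8,))  # top-1 routing to 4 experts
permuted, order = permute(tokens, expert_ids)
restored = unpermute(permuted, order)
assert torch.equal(restored, tokens)
```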
v0.3.0 (Planning)
Focus
- Performance optimization for FP16/BF16 and FP8.
Features
...
v0.4.0 (Planning)
Focus
- Explore ultra-low precision training and inference (FP4 / FP6).
Features
...