This roadmap is the H2 2025 development plan of Primus-Turbo.
Note: The roadmap is flexible and will be updated over time based on project needs and community input.
Release Overview
| Version | Framework | Status | Date |
|---|---|---|---|
| v0.1.0 | PyTorch + ROCm 6.4 | ✅ Released | 2025-09-11 |
| v0.1.1 | PyTorch + ROCm 7.0 | ✅ Released | 2025-10-15 |
| v0.2.0 | PyTorch + ROCm 7.x | 🚧 In Progress | 2025-11 (est.) |
| v0.3.0 | TBD | 📝 Planning | TBD |
| v0.4.0 | TBD | 📝 Planning | TBD |
Detailed Plans
v0.1.0 (Released)
Focus
- Build the foundational framework of Primus-Turbo.
- Provide core operators.
Features
- GEMM: Support FP16/BF16.
- FlashAttention: Support FP16/BF16.
- GroupedGEMM: Support FP16/BF16 (see the reference sketch after this section).
Framework
- Provide PyTorch APIs.
- Support ROCm 6.4.
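For readers new to these operators, here is a minimal reference sketch of the math they implement, in plain PyTorch eager mode. Primus-Turbo ships fused ROCm kernels behind its own PyTorch APIs; those APIs are not reproduced here, and this sketch is purely illustrative.

```python
import torch

# GEMM in BF16: a single matrix multiply.
a = torch.randn(128, 256, dtype=torch.bfloat16)
b = torch.randn(256, 512, dtype=torch.bfloat16)
c = a @ b

# FlashAttention reference semantics: scaled dot-product attention
# over (batch, heads, seq_len, head_dim) tensors.
q = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
k = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
v = torch.randn(2, 8, 128, 64, dtype=torch.bfloat16)
o = torch.nn.functional.scaled_dot_product_attention(q, k, v)

# GroupedGEMM reference semantics: independent GEMMs with per-group
# shapes, expressed here as a plain loop (the fused kernel batches them).
groups = [(torch.randn(32, 64, dtype=torch.bfloat16),
           torch.randn(64, 16, dtype=torch.bfloat16)) for _ in range(3)]
outs = [x @ y for x, y in groups]
```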
v0.2.0 (In Progress)
Focus
- Introduce FP8 foundational support.
- Enable communication primitives with FP8, focusing on DeepEP.
Features
- GEMM: Support FP8 (E4M3/E5M2); see the scaling sketch after this list.
  - Support Tensorwise.
  - Support Rowwise.
  - Support Blockwise.
  - Support MX (microscaling formats).
- FlashAttention: Support FP8 (E4M3/E5M2).
  - Support Blockwise.
- GroupedGEMM: Support FP8 (E4M3/E5M2).
  - Support Tensorwise.
  - Support Rowwise.
  - Support Blockwise.
  - Support MX (microscaling formats).
- All2All: Support FP8.
  - Support Tensorwise.
- DeepEP:
  - Intra-Node Normal Kernel.
  - Inter-Node Normal Kernel.
  - Supported NICs:
    - ConnectX-7
    - Thor2
    - Pensando
  - Support `internode_dispatch` without GPU-CPU sync.
  - Support `torch.compile`.
- TokenDispatcher (a permute/unpermute sketch follows this list):
  - Integrate Permute/Unpermute.
  - Support Sync-Free `DeepEPTokenDispatcher`.
  - Support MoE Fused Activations.
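For reference, a minimal sketch of the three non-MX scaling granularities in plain PyTorch. This is illustrative only: it assumes `torch.float8_e4m3fn` is available (PyTorch 2.1+), and the function names are hypothetical, not Primus-Turbo's API.

```python
import torch

# Hypothetical helpers, not Primus-Turbo's API.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_tensorwise(x):
    # One scale for the whole tensor.
    scale = x.abs().max().clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

def quantize_rowwise(x):
    # One scale per row; isolates per-row outliers.
    scale = x.abs().amax(dim=1, keepdim=True).clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

def quantize_blockwise(x, block=128):
    # One scale per (block x block) tile; assumes shapes divide evenly.
    m, n = x.shape
    tiles = x.reshape(m // block, block, n // block, block)
    scale = tiles.abs().amax(dim=(1, 3), keepdim=True).clamp(min=1e-12) / FP8_MAX
    return (tiles / scale).reshape(m, n).to(torch.float8_e4m3fn), scale

q, s = quantize_tensorwise(torch.randn(256, 256))
x_hat = q.to(torch.float32) * s  # dequantize
```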
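And a minimal sketch of the permute/unpermute pattern a MoE token dispatcher performs, again in plain PyTorch. The names are illustrative; this is not DeepEP's or Primus-Turbo's actual API.

```python
import torch

def permute(tokens, expert_ids):
    # Sort tokens by assigned expert so each expert sees a contiguous
    # slice; keep the ordering for the inverse pass.
    order = torch.argsort(expert_ids, stable=True)
    return tokens[order], order

def unpermute(expert_out, order):
    # Scatter expert outputs back to the original token order.
    out = torch.empty_like(expert_out)
    out[order] = expert_out
    return out

tokens = torch.randn(8, 16)             # 8 tokens, hidden size 16
expert_ids = torch.randint(0, 4, (8,))  # top-1 routing to 4 experts
permuted, order = permute(tokens, expert_ids)
restored = unpermute(permuted, order)
assert torch.equal(restored, tokens)
```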
v0.3.0 (Planning)
Focus
- Performance optimization for FP16/BF16 and FP8.
Features
...
v0.4.0 (Planning)
Focus
- Explore ultra-low precision training and inference (FP4 / FP6).
Features
...