Commit 66c8335

Add issue for Multi-Node NVLink
Signed-off-by: Andrey Velichkevich <andrey.velichkevich@gmail.com>
1 parent da0f3d5 commit 66c8335

1 file changed

Lines changed: 2 additions & 2 deletions

ROADMAP.md

@@ -5,14 +5,14 @@
 - Scheduling & Scalability
   - Workload-Aware Scheduling for TrainJobs: https://github.com/kubeflow/trainer/issues/3015
   - KAI Scheduler Integrations: https://github.com/kubeflow/trainer/issues/2628
-  - Enhanced Multi-Node NVLink Support
+  - Support Multi-Node NVLink (MNNVL) for TrainJob: https://github.com/kubeflow/trainer/issues/3264
   - First-Class Integration with [Kueue](https://kueue.sigs.k8s.io/docs/tasks/run/trainjobs/) for
     multi-cluster job dispatching, topology-aware scheduling, and other features.
   - Enhanced Scalability for Massively Distributed TrainJobs: https://github.com/kubeflow/trainer/issues/2318
 - MPI and HPC on Kubernetes
   - Flux Integration for MPI and HPC workloads: https://github.com/kubeflow/trainer/issues/2841
   - IntelMPI Support: https://github.com/kubeflow/trainer/issues/1807
-  - PMIx Investigation with Flux or Slurm: https://github.com/kubeflow/mpi-operator/issues/12
+  - PMIx Investigation with Flux or Slurm plugins
   - Enhance MPI Orchestration: https://github.com/kubeflow/trainer/issues/2751
 - Observability & Reliability
   - TrainJob Progress Tracking & Metrics Exposure: https://github.com/kubeflow/trainer/issues/2779
