@@ -1,6 +1,6 @@
-# MPI DDP SFT Benchmark — Qwen 2.5 1.5B
+# MPI DDP SFT Benchmark — Qwen 2.5-1.5B-Instruct
 
-Distributed Supervised Fine-Tuning benchmark using **PyTorch DDP** with **MPI** as the communications backend, submitted via Kubeflow Trainer v2 `TrainJob`.
+Distributed Supervised Fine-Tuning benchmark using **PyTorch DDP** with **MPI** as the communications backend, submitted via Kubeflow Trainer v2.
 
 ## What this benchmark does
 
@@ -9,10 +9,10 @@ Distributed Supervised Fine-Tuning benchmark using **PyTorch DDP** with **MPI**
 | Algorithm | SFT with PyTorch DistributedDataParallel (DDP) |
 | Model | Qwen/Qwen2.5-1.5B-Instruct (1.5B params, float32) |
 | Dataset | openai/gsm8k (~7.5 K grade-school math) |
-| Comm backend | **MPI** (`torch.distributed.init_process_group(backend="mpi")`) |
+| Communication backend | MPI |
 | Gradient sync | DDP automatic allreduce via MPI |
-| Runtime | `mpi-cuda-openmpi-benchmark` ClusterTrainingRuntime |
-| Image | `quay.io/ksuta/odh-mpi-cuda:0.0.14` |
+| Runtime | `openmpi-cuda-benchmark` |
+| Image | `quay.io/opendatahub/odh-training-cuda130-torch210-py312-openmpi41:odh-stable` |
 
 ### MPI communication patterns exercised
 
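Illustrative aside, not part of the diff: the table's MPI rows rely on the fact that every training process is launched by `mpirun`, and OpenMPI exports per-process rank information as environment variables such as `OMPI_COMM_WORLD_RANK` and `OMPI_COMM_WORLD_SIZE`. The `init_process_group(backend="mpi")` call mentioned in the old table row obtains the same values through MPI itself; the stdlib sketch below (helper name is ours) only shows what the launcher makes visible to each rank.

```python
import os

def openmpi_rank_and_world_size(env=None):
    """Read the rank variables OpenMPI's mpirun exports to each process.

    This is an illustration only: torch's MPI backend learns rank and
    world size via MPI calls, not from the environment.
    """
    env = os.environ if env is None else env
    rank = int(env.get("OMPI_COMM_WORLD_RANK", 0))
    world_size = int(env.get("OMPI_COMM_WORLD_SIZE", 1))
    return rank, world_size
```

Run outside mpirun, the defaults degrade to a single-process view (rank 0 of 1).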
@@ -32,9 +32,9 @@ PyTorch DDP groups all gradients into a single flat buffer for the first training
 
 | File | Description |
 |------|-------------|
-| `train_sft_ddp.py` | Training script — SFT with DDP + MPI gradient allreduce |
-| `trainjob.yaml` | Kubeflow Trainer v2 TrainJob manifest |
-| `mpi-runtime.yaml` | ClusterTrainingRuntime for MPI + CUDA (OpenMPI) |
+| `train_sft_ddp.py` | PyTorch training script performing Supervised Fine-Tuning with DDP and MPI-based gradient synchronization |
+| `trainjob.yaml` | Kubeflow Trainer v2 `TrainJob` manifest defining the distributed training workload and parameters |
+| `mpi-runtime.yaml` | `ClusterTrainingRuntime` resource providing the OpenMPI + CUDA execution environment |
 
 ## Quick Start
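Illustrative aside, not part of the diff: the hunk context above refers to DDP flattening all gradients into a single buffer before the allreduce. That bucketed allreduce can be mimicked with plain Python lists — flatten each rank's per-parameter gradients into one buffer, sum element-wise across ranks, average, and unflatten. The function below is a conceptual sketch, not the benchmark's code.

```python
def flat_allreduce_mean(per_rank_grads):
    """Simulate DDP's flat-buffer allreduce over plain lists.

    per_rank_grads: one list of per-parameter gradient lists per rank.
    Returns the averaged gradients in the original per-parameter layout.
    """
    world_size = len(per_rank_grads)
    # Each rank flattens its gradients into one contiguous buffer.
    flat = [[g for param in rank for g in param] for rank in per_rank_grads]
    # Allreduce: element-wise sum across ranks, then average.
    reduced = [sum(vals) / world_size for vals in zip(*flat)]
    # Unflatten back into per-parameter slices.
    out, i = [], 0
    for param in per_rank_grads[0]:
        out.append(reduced[i:i + len(param)])
        i += len(param)
    return out
```

For two ranks holding gradients `[[1.0, 2.0], [3.0]]` and `[[3.0, 4.0], [5.0]]`, the result is `[[2.0, 3.0], [4.0]]` — one communication over a single buffer instead of one per parameter, which is the point of the flat bucket.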

**`mpi-runtime.yaml`**
@@ -6,7 +6,7 @@
 apiVersion: trainer.kubeflow.org/v1alpha1
 kind: ClusterTrainingRuntime
 metadata:
-  name: mpi-cuda-openmpi-benchmark
+  name: openmpi-cuda-benchmark
   labels:
     trainer.kubeflow.org/framework: openmpi
 spec:
@@ -33,7 +33,7 @@ spec:
   template:
     spec:
       containers:
-      - image: quay.io/ksuta/odh-mpi-cuda:0.0.14
+      - image: quay.io/opendatahub/odh-training-cuda130-torch210-py312-openmpi41:odh-stable
         name: node
         resources:
           limits:
@@ -55,7 +55,7 @@ spec:
         command:
         - /usr/local/bin/uid_entrypoint.sh
         - /usr/sbin/sshd
-        image: quay.io/ksuta/odh-mpi-cuda:0.0.14
+        image: quay.io/opendatahub/odh-training-cuda130-torch210-py312-openmpi41:odh-stable
         name: node
         readinessProbe:
           initialDelaySeconds: 3
**`trainjob.yaml`**
@@ -11,7 +11,7 @@ spec:
   runtimeRef:
     apiGroup: trainer.kubeflow.org
     kind: ClusterTrainingRuntime
-    name: mpi-cuda-openmpi-benchmark
+    name: openmpi-cuda-benchmark
   trainer:
     command:
     - /usr/local/bin/uid_entrypoint.sh