[doc] Add KubeRay ecosystem

wlai2 · seanlaii · commit f3dc97aed51a · 2025-12-22T23:00:57.000-05:00
Signed-off-by: seanlaii <qazwsx0939059006@gmail.com>
diff --git a/content/en/docs/kuberay_on_volcano.md b/content/en/docs/kuberay_on_volcano.md
@@ -0,0 +1,124 @@
++++
+title =  "KubeRay on Volcano"
+
+date = 2025-12-22
+lastmod = 2025-12-22
+
+draft = false  # Is this a draft? true/false
+toc = true  # Show table of contents? true/false
+type = "docs"  # Do not modify.
+
+# Add menu entry to sidebar.
+linktitle = "KubeRay"
+[menu.docs]
+  parent = "ecosystem"
+  weight = 9
+
++++
+
+
+
+### KubeRay Introduction
+
+[Ray](https://docs.ray.io/en/latest/ray-overview/getting-started.html) is a unified distributed computing framework designed for AI/ML applications. Ray provides:
+
+- **Distributed Training**: Scale machine learning workloads from a single machine to thousands of nodes
+- **Hyperparameter Tuning**: Run parallel experiments with Ray Tune for efficient model optimization
+- **Distributed Data Processing**: Process large datasets with Ray Data for batch inference and data preprocessing
+- **Reinforcement Learning**: Train RL models at scale with Ray RLlib
+- **Serving**: Deploy and scale ML models in production with Ray Serve
+- **General Purpose Distributed Computing**: Build any distributed application with Ray Core APIs
+
+[KubeRay](https://docs.ray.io/en/latest/cluster/kubernetes/index.html) is an open-source Kubernetes operator that simplifies running Ray on Kubernetes. It provides automated deployment, scaling, and management of Ray clusters through Kubernetes-native tools and APIs.
+
+### KubeRay Integration with Volcano
+
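+Starting with KubeRay v1.5.1, RayJob and RayCluster resources integrate with Volcano to support gang scheduling and network topology-aware scheduling. Gang scheduling starts a workload's pods only when all required pods can be placed together, so partially scheduled clusters do not sit idle while holding resources. Network topology-aware scheduling places those pods close to one another in the network, which reduces communication overhead for distributed AI/ML workloads.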
+
+#### Supported Labels
+
+To schedule RayJob and RayCluster resources with Volcano, set any of the following labels under `metadata.labels`:
+
+| Label | Description | Required |
+|-------|-------------|----------|
+| `ray.io/priority-class-name` | Assigns a Kubernetes [PriorityClass](https://kubernetes.io/docs/concepts/scheduling-eviction/pod-priority-preemption/#priorityclass) for pod scheduling | No |
+| `volcano.sh/queue-name` | Specifies the Volcano queue the workload is submitted to | No |
+| `volcano.sh/network-topology-mode` | Configures network topology-aware scheduling mode | No |
+| `volcano.sh/network-topology-highest-tier-allowed` | Sets the highest network tier allowed for scheduling | No |
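+
+As a rough illustration of where these labels go, the sketch below writes a minimal RayCluster manifest and applies it. The resource name, Ray version, image tag, and resource sizes are assumptions chosen for the example, and the commented-out label values (`high-priority`, `hard`, `1`) are hypothetical placeholders; the sample files shipped with KubeRay remain the authoritative reference.
+
+```bash
+# Write a minimal RayCluster manifest that carries the Volcano labels
+$ cat > raycluster-volcano-labels.yaml <<'EOF'
+apiVersion: ray.io/v1
+kind: RayCluster
+metadata:
+  name: volcano-labels-demo          # hypothetical name
+  labels:
+    volcano.sh/queue-name: default   # Volcano's built-in queue; replace with your own
+    # Optional labels; values are placeholders, adjust to your cluster:
+    # ray.io/priority-class-name: high-priority   # the PriorityClass must already exist
+    # volcano.sh/network-topology-mode: "hard"
+    # volcano.sh/network-topology-highest-tier-allowed: "1"
+spec:
+  rayVersion: "2.46.0"               # assumed Ray version
+  headGroupSpec:
+    rayStartParams: {}
+    template:
+      spec:
+        containers:
+          - name: ray-head
+            image: rayproject/ray:2.46.0
+            resources:
+              requests: {cpu: "1", memory: 2Gi}
+              limits: {cpu: "1", memory: 2Gi}
+  workerGroupSpecs:
+    - groupName: workers
+      replicas: 2
+      minReplicas: 2
+      maxReplicas: 2
+      rayStartParams: {}
+      template:
+        spec:
+          containers:
+            - name: ray-worker
+              image: rayproject/ray:2.46.0
+              resources:
+                requests: {cpu: "1", memory: 1Gi}
+                limits: {cpu: "1", memory: 1Gi}
+EOF
+
+# Apply it (requires a cluster with Volcano and the KubeRay operator installed)
+$ kubectl apply -f raycluster-volcano-labels.yaml
+```
+
+Label values must be strings, which is why the numeric tier is quoted.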
+
+Setup examples follow in the Setup Guide below. For the full set of configuration options, see the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano).
+
+#### Autoscaling Behavior
+
+KubeRay's integration with Volcano handles gang scheduling differently based on whether autoscaling is enabled:
+
+- **When autoscaling is enabled**: `minReplicas` is used for gang scheduling
+- **When autoscaling is disabled**: The desired replica count is used for gang scheduling
+
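+In both cases, the chosen count determines how many pods Volcano must be able to schedule together before the cluster starts. With autoscaling enabled, basing the gang on `minReplicas` lets the cluster start at its minimum size and scale up afterwards, rather than waiting for capacity it may never need.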
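+
+One way to check which count was applied is to inspect the Volcano PodGroup associated with a cluster. The command below is a sketch that assumes a RayCluster or RayJob has already been created with Volcano scheduling enabled; it lists every PodGroup in the current namespace with its gang size (`spec.minMember`), queue, and phase.
+
+```bash
+# List PodGroups with their gang size, target queue, and scheduling phase
+$ kubectl get podgroups.scheduling.volcano.sh -o custom-columns=NAME:.metadata.name,MIN_MEMBER:.spec.minMember,QUEUE:.spec.queue,PHASE:.status.phase
+```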
+
+
+### Setup Guide
+
+#### Prerequisites
+
+##### 1. Create a Kubernetes Cluster
+```bash
+$ kind create cluster
+```
+
+##### 2. Install Volcano
+Follow the instructions in the [Volcano installation guide](https://volcano.sh/en/docs/installation/) to install Volcano on your Kubernetes cluster.
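+
+For reference, one possible Helm-based path is sketched below. The release name `volcano` and the `volcano-system` namespace are choices made for this example; the installation guide linked above is authoritative if the steps differ for your Volcano version.
+
+```bash
+# Add the Volcano chart repository and install into the volcano-system namespace
+$ helm repo add volcano-sh https://volcano-sh.github.io/helm-charts
+$ helm repo update
+$ helm install volcano volcano-sh/volcano -n volcano-system --create-namespace
+
+# Check that the Volcano pods are running before installing KubeRay
+$ kubectl get pods -n volcano-system
+```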
+
+##### 3. Install KubeRay Operator
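+The KubeRay Operator enables Volcano batch scheduling support through its `--batch-scheduler=volcano` flag. When installing with Helm, set the chart value `batchScheduler.name=volcano`, which configures that flag for you. Add the KubeRay Helm repository first if it is not already configured:
+```bash
+$ helm repo add kuberay https://ray-project.github.io/kuberay-helm/
+$ helm install kuberay-operator kuberay/kuberay-operator --version 1.5.1 --set batchScheduler.name=volcano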
+```
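+
+To confirm the operator was installed with the Volcano setting, the following checks may help. They assume the Helm release is named `kuberay-operator` as in the command above, and that the chart labels the operator pod with `app.kubernetes.io/name=kuberay-operator`; if the selector returns nothing, a plain `kubectl get pods` shows the pod as well.
+
+```bash
+# Show the values supplied to the release; batchScheduler.name should be volcano
+$ helm get values kuberay-operator
+
+# Confirm the operator pod is running
+$ kubectl get pods -l app.kubernetes.io/name=kuberay-operator
+```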
+
+#### Example Deployments
+
+##### RayCluster Example
+
+Deploy a RayCluster with Volcano scheduling:
+
+```bash
+# Download the sample RayCluster configuration with Volcano labels
+$ curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-cluster.volcano-scheduler.yaml
+
+# Apply the configuration
+$ kubectl apply -f ray-cluster.volcano-scheduler.yaml
+
+# Verify the RayCluster deployment
+$ kubectl get pod -l ray.io/cluster=test-cluster-0
+
+# Expected output:
+# NAME                                 READY   STATUS    RESTARTS   AGE
+# test-cluster-0-head-jj9bg            1/1     Running   0          36s
+```
+
+##### RayJob Example
+
+RayJob support with Volcano has been available since KubeRay v1.5.1. Deploy a RayJob that is submitted to a Volcano queue:
+
+```bash
+# Download the sample RayJob configuration with Volcano queue integration
+$ curl -LO https://raw.githubusercontent.com/ray-project/kuberay/v1.5.1/ray-operator/config/samples/ray-job.volcano-scheduler-queue.yaml
+
+# Apply the configuration
+$ kubectl apply -f ray-job.volcano-scheduler-queue.yaml
+
+# Monitor the job execution
+$ kubectl get pod
+
+# Expected output:
+# NAME                                             READY   STATUS      RESTARTS   AGE
+# rayjob-sample-0-k449j-head-rlgxj                 1/1     Running     0          93s
+# rayjob-sample-0-k449j-small-group-worker-c6dt8   1/1     Running     0          93s
+# rayjob-sample-0-k449j-small-group-worker-cq6xn   1/1     Running     0          93s
+# rayjob-sample-0-qmm8s                            0/1     Completed   0          32s
+```
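+
+Beyond watching pods, the RayJob resource itself reports job status. The commands below are a sketch that assumes the RayJob is named `rayjob-sample-0`, as the pod names in the expected output suggest; adjust the name if your sample differs.
+
+```bash
+# Show the RayJob resources and their status columns
+$ kubectl get rayjob
+
+# Read the job status field directly
+$ kubectl get rayjob rayjob-sample-0 -o jsonpath='{.status.jobStatus}'
+
+# View the logs of the submitter pod
+$ kubectl logs -l job-name=rayjob-sample-0
+
+# Clean up the sample resources when finished
+$ kubectl delete -f ray-job.volcano-scheduler-queue.yaml
+```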
+
+### Learn More
+
+For detailed configuration options, advanced scheduling strategies, network topology configurations, and best practices, visit the [KubeRay Volcano Scheduler Documentation](https://docs.ray.io/en/latest/cluster/kubernetes/k8s-ecosystem/volcano.html#kuberay-integration-with-volcano).