This example demonstrates how to launch a Ray cluster on OSMO. Ray is a unified framework for scaling AI and Python applications, and OSMO provides native support for running Ray clusters, making it easy to leverage Ray's distributed computing capabilities.
The workflow launches a Ray cluster with one master node and one or more worker nodes. The master node runs the Ray head process that coordinates the cluster, while worker nodes connect to the master to form a distributed compute cluster.
```shell
curl -O https://raw.githubusercontent.com/NVIDIA/OSMO/main/cookbook/integration_and_tools/ray/ray.yaml
osmo workflow submit ray.yaml
```

You can customize the cluster size and resources:
```shell
osmo workflow submit ray.yaml --set num_nodes=4 --set gpu=2 --set cpu=16 --set memory=128Gi
```

Port-forward the dashboard ports to access the Ray dashboard and Prometheus metrics:
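The `--set` parameters above are per-node requests, so they multiply across the cluster. A quick plain-Python sketch (illustrative only — actual allocation is handled by the OSMO scheduler) to sanity-check what a request adds up to:

```python
# Illustrative sketch: cluster-wide totals implied by the per-node
# --set parameters above. Parameter names mirror the command line.
def cluster_totals(num_nodes: int, gpu: int, cpu: int, memory_gi: int) -> dict:
    """Multiply per-node resource requests across all nodes."""
    return {
        "gpus": num_nodes * gpu,
        "cpus": num_nodes * cpu,
        "memory_gi": num_nodes * memory_gi,
    }

totals = cluster_totals(num_nodes=4, gpu=2, cpu=16, memory_gi=128)
print(totals)  # {'gpus': 8, 'cpus': 64, 'memory_gi': 512}
```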
```shell
# Get the workflow ID from the submit command output
osmo workflow port-forward <workflow-id> master --port 8265,9090
```

The Ray dashboard will be available at http://localhost:8265.
The Prometheus dashboard will be available at http://localhost:9090.
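Besides the web UI, the forwarded dashboard port also serves Ray's Jobs REST API. A minimal standard-library sketch (assumes the port-forward above is active; the `/api/jobs/` endpoint is part of Ray's Jobs API):

```python
from urllib import request

# The forwarded Ray dashboard address (see the port-forward step above).
RAY_ADDRESS = "http://localhost:8265"

def list_jobs_request(address: str = RAY_ADDRESS) -> request.Request:
    """Build a GET request against Ray's Jobs REST API."""
    return request.Request(f"{address}/api/jobs/", method="GET")

# With the port-forward active, this prints the cluster's submitted jobs:
# with request.urlopen(list_jobs_request()) as resp:
#     print(resp.read().decode())
```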
Set the `RAY_ADDRESS` environment variable so the Ray CLI can reach the cluster:

```shell
export RAY_ADDRESS="http://localhost:8265"
```

Best practices:

- Resource Allocation: Ensure your resource requests match your workload requirements. Ray works best when it has accurate information about available resources.
- Monitoring: Use the Ray dashboard to monitor cluster health, task progress, and resource utilization.
- Port Configuration: The default Ray port (6376) can be customized using the `ray_port` parameter if needed.
- Timeouts: Consider setting appropriate timeouts to manage the cluster lifecycle and avoid unnecessary resource consumption.