Commit 880544a

Add documentation for testing on Kubernetes (#23)
* add Dockerfile and docs, modify example
* doc fixes
1 parent a84eb1c commit 880544a

3 files changed, +106 -4 lines changed

docs/testing-on-k8s.md

+71
@@ -0,0 +1,71 @@
# Testing in Kubernetes

This guide explains how to test DataFusion Ray on Kubernetes during development. It assumes you have an existing Kubernetes cluster.

## 1. Deploy the KubeRay Operator

To manage Ray clusters, deploy the KubeRay operator using Helm. You only need to do this once per Kubernetes cluster.

```shell
helm repo add kuberay https://ray-project.github.io/kuberay-helm/
helm repo update

# Install the Custom Resource Definitions (CRDs) and KubeRay operator
helm install kuberay-operator kuberay/kuberay-operator

# Verify that the operator is running in the `default` namespace.
kubectl get pods

# Example output:
# NAME                                READY   STATUS    RESTARTS   AGE
# kuberay-operator-7fbdbf8c89-pt8bk   1/1     Running   0          27s
```

You can customize the operator's settings (e.g., resource limits and requests); for basic testing, the default configuration should suffice.
For more details and customization options, refer to the [KubeRay Helm Chart documentation](https://github.com/ray-project/kuberay-helm/tree/main/helm-chart/kuberay-operator).
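If you script this setup (for example, in CI), you may prefer a machine-readable check over reading the `kubectl get pods` table by eye. The following is a minimal sketch of a hypothetical helper, `check_operator.py`, that parses the JSON printed by `kubectl get pods -o json`. It assumes the operator pod's name starts with `kuberay-operator-`, as in the example output, and treats a pod as ready when it is `Running` and all of its containers report ready.

```python
"""Check whether the KubeRay operator pod is ready (hypothetical helper)."""
import json
import sys


def pod_is_ready(pod: dict) -> bool:
    """Return True if the pod is Running and every container reports ready."""
    status = pod.get("status", {})
    containers = status.get("containerStatuses", [])
    return (
        status.get("phase") == "Running"
        and bool(containers)
        and all(c.get("ready", False) for c in containers)
    )


def operator_ready(pods_json: dict, prefix: str = "kuberay-operator-") -> bool:
    """Return True if at least one pod whose name starts with `prefix` is ready."""
    return any(
        pod_is_ready(pod)
        for pod in pods_json.get("items", [])
        if pod.get("metadata", {}).get("name", "").startswith(prefix)
    )


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 check_operator.py pods.json
    with open(sys.argv[1]) as f:
        ready = operator_ready(json.load(f))
    print("KubeRay operator is ready" if ready else "KubeRay operator is not ready")
    sys.exit(0 if ready else 1)
```

For example: `kubectl get pods -o json > pods.json && python3 check_operator.py pods.json`. The script exits with status 0 only when a ready operator pod is found, so it can gate later steps in a shell script.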

## 2. Build a Custom Docker Image

Build a custom Docker image that contains your local development copy of DataFusion Ray rather than the default PyPI release.

Run the following command from the root of the repository to build your Docker image:

```shell
docker build -t [YOUR_IMAGE_NAME]:[YOUR_TAG] -f k8s/Dockerfile .
```

After building the image, push it to a container registry that your Kubernetes cluster can pull from, for example with `docker push [YOUR_IMAGE_NAME]:[YOUR_TAG]`. For the push to work, `[YOUR_IMAGE_NAME]` must include the registry host (for example, `registry.example.com/datafusion-ray`).
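The build and push steps can also be wrapped in a small script so the image reference is written once and reused for both commands. This is a hypothetical helper (`push_image.py`, `build_and_push_commands`, and `run` are illustrative names, not part of the repository); it assumes `docker` is on your `PATH` and that you run it from the repository root.

```python
"""Build and push the development image (hypothetical helper script)."""
import shlex
import subprocess
import sys


def image_ref(repository: str, tag: str) -> str:
    """Join a repository (including its registry host) and a tag."""
    if not repository or not tag:
        raise ValueError("repository and tag must be non-empty")
    return f"{repository}:{tag}"


def build_and_push_commands(repository: str, tag: str) -> list[list[str]]:
    """Return the docker build and push commands as argv lists."""
    ref = image_ref(repository, tag)
    return [
        ["docker", "build", "-t", ref, "-f", "k8s/Dockerfile", "."],
        ["docker", "push", ref],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it only when dry_run is False."""
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # check=True stops at the first failing command.
            subprocess.run(argv, check=True)


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 push_image.py <repository> <tag>
    run(build_and_push_commands(sys.argv[1], sys.argv[2]), dry_run=False)
```

For example, `python3 push_image.py registry.example.com/datafusion-ray dev` builds and pushes `registry.example.com/datafusion-ray:dev`; pass the same repository and tag to the Helm command in the next step.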

## 3. Deploy a RayCluster

Next, deploy a RayCluster that uses the custom image.

```shell
helm repo update
helm install datafusion-ray kuberay/ray-cluster \
    --set 'image.repository=[YOUR_REPOSITORY]' \
    --set 'image.tag=[YOUR_TAG]' \
    --set 'image.pullPolicy=Always'
```

Replace *[YOUR_REPOSITORY]* and *[YOUR_TAG]* with the image name and tag you pushed in the previous step. With `image.pullPolicy=Always`, Kubernetes pulls the image every time a pod starts, so newly started pods pick up a rebuilt image pushed under the same tag.

You can further customize RayCluster settings, such as resource allocations and autoscaling.
For full configuration options, refer to the [RayCluster Helm Chart documentation](https://github.com/ray-project/kuberay-helm/tree/main/helm-chart/ray-cluster).
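If you accumulate more overrides, it can help to keep them in one nested mapping and generate the `--set` flags from it, so each dotted path mirrors the nesting in the chart's `values.yaml`. This is a minimal sketch with hypothetical helper names. The override paths are assumptions to check against the chart's `values.yaml` (the pull policy, for instance, is expected under `image.pullPolicy`), and the sketch does not escape commas or other characters that `--set` treats specially.

```python
"""Generate `helm install` arguments from nested values (hypothetical helper)."""
import shlex


def _format(value):
    # Helm expects lowercase booleans; everything else is passed through str().
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def flatten(values, prefix=""):
    """Flatten nested dicts into dotted keys: {"image": {"tag": "x"}} -> {"image.tag": "x"}."""
    flat = {}
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def helm_install_args(release, chart, values):
    """Return argv for `helm install RELEASE CHART --set path=value ...`."""
    args = ["helm", "install", release, chart]
    for path, value in flatten(values).items():
        args += ["--set", f"{path}={_format(value)}"]
    return args


if __name__ == "__main__":
    overrides = {
        "image": {
            "repository": "registry.example.com/datafusion-ray",  # placeholder
            "tag": "dev",  # placeholder
            "pullPolicy": "Always",
        }
    }
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(helm_install_args("datafusion-ray", "kuberay/ray-cluster", overrides)))
```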
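
## 4. Port Forwarding

To access Ray's dashboard, set up port forwarding between your local machine and the Ray cluster's head node service. The service name is derived from the Helm release name, so if the command below cannot find the service, list the services with `kubectl get services` and use the one ending in `-head-svc`.

```shell
kubectl port-forward service/raycluster-kuberay-head-svc 8265:8265
```

This makes Ray’s dashboard and API available at `http://127.0.0.1:8265`. `kubectl port-forward` runs in the foreground, so leave it running in a separate terminal while you work through the next step.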
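Before submitting jobs, you can confirm that the tunnel works. The sketch below (a hypothetical `check_dashboard.py`, standard library only) treats any HTTP response from the forwarded port as success, because it only checks connectivity, not the health of the Ray cluster itself.

```python
"""Check that the port-forwarded Ray dashboard answers HTTP (hypothetical helper)."""
import urllib.error
import urllib.request


def dashboard_reachable(url: str = "http://127.0.0.1:8265", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET to `url` receives any HTTP response."""
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered with a non-2xx status; the tunnel itself works.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timed out, or the name did not resolve.
        return False


if __name__ == "__main__":
    print("dashboard reachable" if dashboard_reachable() else "dashboard not reachable")
```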

## 5. Run an Example

From the root of your project, submit a sample job with the following commands:

```shell
export RAY_ADDRESS="http://127.0.0.1:8265"
ray job submit --working-dir ./examples/ -- python3 tips.py
```
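As an alternative to the `ray job submit` CLI, the Ray Jobs Python SDK provides `JobSubmissionClient` in `ray.job_submission`. The sketch below submits the same `tips.py` job with the same working directory and polls until the job reaches a terminal status. The script name `submit_tips.py` and the `wait_for_job` helper are hypothetical; the sketch assumes an installed Ray version compatible with the cluster and that you run it from the project root so `./examples/` resolves.

```python
"""Submit examples/tips.py via the Ray Jobs Python SDK (sketch)."""
import sys
import time

# Terminal values of Ray's JobStatus enum.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_job(client, job_id, timeout_s=600.0, poll_s=2.0):
    """Poll client.get_job_status(job_id) until the job is terminal.

    Returns the final status as a plain string, e.g. "SUCCEEDED".
    Raises TimeoutError if the job is still running after timeout_s seconds.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        status = client.get_job_status(job_id)
        # JobStatus is a string-valued enum; plain strings pass through unchanged.
        name = getattr(status, "value", status)
        if name in TERMINAL_STATUSES:
            return name
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {name} after {timeout_s}s")
        time.sleep(poll_s)


def submit_tips(address):
    # Imported here so wait_for_job can be reused without Ray installed.
    from ray.job_submission import JobSubmissionClient

    client = JobSubmissionClient(address)
    job_id = client.submit_job(
        entrypoint="python3 tips.py",
        runtime_env={"working_dir": "./examples/"},
    )
    status = wait_for_job(client, job_id)
    print(f"{job_id}: {status}")
    print(client.get_job_logs(job_id))
    return status


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 submit_tips.py http://127.0.0.1:8265
    sys.exit(0 if submit_tips(sys.argv[1]) == "SUCCEEDED" else 1)
```

Run it as `python3 submit_tips.py http://127.0.0.1:8265` while the port forward is active; the exit status reflects whether the job succeeded.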

### Expected output:

examples/tips.py

+1-4
@@ -22,11 +22,8 @@
 
 SCRIPT_DIR = os.path.dirname(os.path.realpath(__file__))
 
-# Start a local cluster
-ray.init(resources={"worker": 1})
-
 # Connect to a cluster
-# ray.init()
+ray.init()
 
 # Create a context and register a table
 ctx = DatafusionRayContext(2)

k8s/Dockerfile

+34
@@ -0,0 +1,34 @@
FROM rayproject/ray:2.37.0.cabc24-py312

RUN sudo apt update && \
    sudo apt install -y curl build-essential

# Install Rust
RUN curl https://sh.rustup.rs -sSf | sh -s -- --default-toolchain stable -y

WORKDIR /home/ray

# install dependencies
COPY requirements-in.txt /home/ray/
RUN python3 -m venv venv && \
    source venv/bin/activate && \
    pip3 install -r requirements-in.txt

# add sources
RUN mkdir /home/ray/src
RUN mkdir /home/ray/datafusion_ray
COPY src /home/ray/src/
COPY datafusion_ray /home/ray/datafusion_ray/
COPY pyproject.toml /home/ray/
COPY Cargo.* /home/ray/
COPY build.rs /home/ray/
COPY README.md /home/ray/

# build datafusion_ray
RUN source venv/bin/activate && \
    source /home/ray/.cargo/env && \
    maturin build --release

FROM rayproject/ray:2.37.0.cabc24-py312
COPY --from=0 /home/ray/target/wheels/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl /home/ray/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl
RUN pip3 install /home/ray/datafusion_ray-0.6.0-cp38-abi3-manylinux_2_35_x86_64.whl
