Commit 0c9e540 (1 parent: 8bee5ae)

Allow assign memory to RayDPSparkMaster actor (#448)

* only set spark master memory instead
* add test
* revert changes not needed
* update ghaction
* change parent project

File tree: 11 files changed (+2614 -11 lines)

.github/workflows/pypi.yml — 1 addition, 1 deletion

```diff
@@ -29,7 +29,7 @@ permissions: # added using https://github.com/step-security/secure-repo
 jobs:
   build-and-publish:
     # do not run in forks
-    if: ${{ github.repository_owner == 'oap-project' }}
+    if: ${{ github.repository_owner == 'ray-project' }}
     name: build wheel and upload
     runs-on: ubuntu-latest
     steps:
```

.github/workflows/ray_nightly_test.yml — 2 additions, 2 deletions

```diff
@@ -84,10 +84,10 @@ jobs:
           fi
           case $PYTHON_VERSION in
             3.9)
-              pip install "ray[train] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-manylinux2014_x86_64.whl"
+              pip install "ray[train,default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp39-cp39-manylinux2014_x86_64.whl"
               ;;
             3.10.14)
-              pip install "ray[train] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
+              pip install "ray[train,default] @ https://s3-us-west-2.amazonaws.com/ray-wheels/latest/ray-3.0.0.dev0-cp310-cp310-manylinux2014_x86_64.whl"
               ;;
           esac
           pip install pyarrow tqdm pytest tensorflow==2.13.1 tabulate grpcio-tools wget
```

.github/workflows/raydp.yml — 1 addition, 1 deletion

```diff
@@ -83,7 +83,7 @@ jobs:
           else
             pip install torch
           fi
-          pip install pyarrow "ray[train]==${{ matrix.ray-version }}" tqdm pytest tensorflow==2.13.1 tabulate grpcio-tools wget
+          pip install pyarrow "ray[train,default]==${{ matrix.ray-version }}" tqdm pytest tensorflow==2.13.1 tabulate grpcio-tools wget
           pip install "xgboost_ray[default]<=0.1.13"
           pip install "xgboost<=2.0.3"
           pip install torchmetrics
```

.github/workflows/raydp_nightly.yml — 1 addition, 1 deletion

```diff
@@ -29,7 +29,7 @@ permissions: # added using https://github.com/step-security/secure-repo
 jobs:
   build-and-publish:
     # do not run in forks
-    if: ${{ github.repository_owner == 'oap-project' }}
+    if: ${{ github.repository_owner == 'ray-project' }}
     name: build wheel and upload
     runs-on: ubuntu-latest
     steps:
```

README.md — 2 additions, 2 deletions

```diff
@@ -91,8 +91,8 @@ Spark features such as dynamic resource allocation, spark-submit script, etc are
 ## Spark + AI Pipeline on Ray

 RayDP provides APIs for converting a Spark DataFrame to a Ray Dataset which can be consumed by XGBoost, Ray Train, Horovod on Ray, etc. RayDP also provides high level scikit-learn style Estimator APIs for distributed training with PyTorch or Tensorflow. To get started with end-to-end Spark + AI pipeline, the easiest way is to run the following tutorials on Google Collab. More examples are also available in the `examples` folder.
-* [Spark + Ray Train Tutorial on Google Collab](https://colab.research.google.com/github/oap-project/raydp/blob/master/tutorials/raytrain_example.ipynb)
-* [Spark + TorchEstimator Tutorial on Google Collab](https://colab.research.google.com/github/oap-project/raydp/blob/master/tutorials/pytorch_example.ipynb)
+* [Spark + Ray Train Tutorial on Google Collab](https://colab.research.google.com/github/ray-project/raydp/blob/master/tutorials/raytrain_example.ipynb)
+* [Spark + TorchEstimator Tutorial on Google Collab](https://colab.research.google.com/github/ray-project/raydp/blob/master/tutorials/pytorch_example.ipynb)


 ***Spark DataFrame & Ray Dataset conversion***
```

core/pom.xml — 1 addition, 1 deletion

```diff
@@ -10,7 +10,7 @@
   <packaging>pom</packaging>

   <name>RayDP Parent Pom</name>
-  <url>https://github.com/oap-project/raydp.git</url>
+  <url>https://github.com/ray-project/raydp.git</url>

   <properties>
     <spark.version>3.3.3</spark.version>
```

python/raydp/spark/ray_cluster.py — 7 additions, 0 deletions

```diff
@@ -61,8 +61,15 @@ def _set_up_master(self, resources: Dict[str, float], kwargs: Dict[Any, Any]):
         if "CPU" in resources:
             num_cpu = resources["CPU"]
             resources.pop("CPU", None)
+
+            memory = None
+            if "memory" in resources:
+                memory = resources["memory"]
+                resources.pop("memory", None)
+
             self._spark_master_handle = RayDPSparkMaster.options(name=spark_master_name,
                                                                  num_cpus=num_cpu,
+                                                                 memory=memory,
                                                                  resources=resources) \
                 .remote(self._configs)
         else:
```
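The core of this patch is a resource-splitting step: RayDP pops the logical `CPU` and `memory` keys out of the resource dict (they map onto Ray's first-class `num_cpus` and `memory` actor options) and passes whatever remains as custom resources. A minimal standalone sketch of that logic — the helper name `split_master_resources` is ours for illustration, not a RayDP API:

```python
from typing import Dict, Optional, Tuple


def split_master_resources(
    resources: Dict[str, float],
) -> Tuple[Optional[float], Optional[float], Dict[str, float]]:
    """Mirror the patched _set_up_master logic: CPU and memory become
    first-class actor options; everything else stays a custom resource."""
    remaining = dict(resources)  # work on a copy, don't mutate the caller's dict
    num_cpus = remaining.pop("CPU", None)
    memory = remaining.pop("memory", None)
    return num_cpus, memory, remaining


# Example: a spec like the one built from the
# spark.ray.raydp_spark_master.actor.resource.* configs
num_cpus, memory, custom = split_master_resources(
    {"CPU": 2, "memory": 100 * 1024 * 1024, "master": 1}
)
```

With the diff applied, `RayDPSparkMaster.options(...)` would then receive `num_cpus=2`, `memory=104857600`, and `resources={"master": 1}` for this input, so Ray reserves the memory for the master actor instead of treating it as an unschedulable custom resource.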
New test file — 65 additions, 0 deletions

```python
import sys

import pytest
import ray
import raydp
from ray.cluster_utils import Cluster
from ray.util.state import list_actors


def test_spark_master_memory_custom(jdk17_extra_spark_configs):
    cluster = Cluster(
        initialize_head=True,
        head_node_args={
            "num_cpus": 2,
            "resources": {"master": 10},
            "include_dashboard": True,
            "dashboard_port": 8270,
        },
    )
    ray.init(address=cluster.address,
             dashboard_port=cluster.head_node.dashboard_grpc_port,
             include_dashboard=True)

    custom_memory = 100 * 1024 * 1024  # 100MB in bytes
    configs = jdk17_extra_spark_configs.copy()
    # Config under test: set Spark Master actor memory via RayDP config
    configs["spark.ray.raydp_spark_master.actor.resource.memory"] = str(custom_memory)
    # Also require the master custom resource so the actor is scheduled on the head
    configs["spark.ray.raydp_spark_master.actor.resource.master"] = "1"

    app_name = "test_spark_master_memory_custom"

    spark = raydp.init_spark(
        app_name=app_name,
        num_executors=1,
        executor_cores=1,
        executor_memory="500M",
        configs=configs,
    )

    # Trigger the Spark master / RayDPSparkMaster startup
    spark.createDataFrame([(1, 2)], ["a", "b"]).count()

    # RayDPSparkMaster name is app_name + RAYDP_SPARK_MASTER_SUFFIX
    master_actor_name = f"{app_name}_SPARK_MASTER"

    actor = ray.get_actor(master_actor_name)
    assert actor is not None

    # Query Ray state for this actor
    actor_state = list_actors(filters=[("actor_id", "=", actor._actor_id.hex())],
                              detail=True)[0]
    resources = actor_state.required_resources

    assert resources["memory"] == custom_memory
    assert resources["master"] == 1

    spark.stop()
    raydp.stop_spark()
    ray.shutdown()
    cluster.shutdown()


if __name__ == "__main__":
    sys.exit(pytest.main(["-v", __file__]))
```
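Note that the `...actor.resource.memory` config takes a raw byte count as a string, which is why the test above hardcodes `100 * 1024 * 1024`. A small convenience helper for turning Spark-style size strings into that byte count — our own sketch, not a RayDP API:

```python
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def to_bytes(size: str) -> int:
    """Convert a Spark-style memory string like '500M' or '2G' to bytes."""
    size = size.strip().upper().rstrip("B")  # accept '500m', '500M', '500MB'
    if size[-1].isdigit():
        return int(size)  # bare byte count, e.g. '104857600'
    return int(float(size[:-1]) * _UNITS[size[-1]])


# Equivalent to the hardcoded value in the test:
custom_memory = to_bytes("100M")
configs = {
    # RayDP expects the byte count serialized as a string
    "spark.ray.raydp_spark_master.actor.resource.memory": str(custom_memory),
}
```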

python/setup.py — 1 addition, 1 deletion

```diff
@@ -108,7 +108,7 @@ def run(self):
     author="RayDP Developers",
     author_email="raydp-dev@googlegroups.com",
     license="Apache 2.0",
-    url="https://github.com/oap-project/raydp",
+    url="https://github.com/ray-project/raydp",
     keywords="raydp spark ray distributed data-processing",
     description="RayDP: Distributed Data Processing on Ray",
     long_description=io.open(
```

tutorials/pytorch_example.ipynb — 1156 additions, 1 deletion (large diff not rendered)
