Commit 73f38cb

fix gpu assignment in local-cluster mode and bash cli's (#973)

Fixes GPU assignment in local-cluster mode (used in tests of the no-code-change CLIs) to match what is done in local mode. This GPU assignment error was failing silently in the CLI runs in run_test.sh during CI, because the CLIs were not propagating Spark errors to their exit codes. This PR also fixes that issue in the CLIs by setting `check=True` in the Python `subprocess.run` invocations, and adds a test to check that Spark errors are propagated correctly.

Signed-off-by: Erik Ordentlich <[email protected]>
1 parent dcb55da commit 73f38cb

File tree: 4 files changed (+13, -3 lines)


python/run_test.sh

Lines changed: 10 additions & 0 deletions
```diff
@@ -28,10 +28,20 @@ fi
 python -m spark_rapids_ml tests_no_import_change/test_no_import_change.py 0.2
 # runs on cpu
 python tests_no_import_change/test_no_import_change.py 0.2
+
 # runs on gpu with spark-submit (note: local[1] and pyspark<3.5.6 for spark-rapids-submit hangs probably due to barrier rdd timer threads. TBD root cause)
 pip install pyspark==3.5.6
 spark-rapids-submit --master local-cluster[1,1,1024] tests_no_import_change/test_no_import_change.py 0.2
+# test that failure mode returns non-zero exit code
+set +e
+spark-rapids-submit --master local-cluster[1,1,1024] tests_no_import_change/test_no_import_change.py -0.2
+if [ $? -eq 0 ]; then
+    echo "test should have returned non-zero exit code"
+    exit 1
+fi
+set -e
 pip install -r requirements_dev.txt
+
 # runs on cpu with spark-submit
 spark-submit --master local-cluster[1,1,1024] tests_no_import_change/test_no_import_change.py 0.2
```
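The new test in run_test.sh toggles `set +e`/`set -e` so that the intentionally failing invocation does not abort the script, then asserts that the exit code was non-zero. The same "expected failure must surface as a non-zero exit code" pattern can be sketched in Python; the failing child command below is an illustrative stand-in, not the real spark-rapids-submit call:

```python
import subprocess
import sys

# Run a command that is expected to fail (illustrative stand-in for the
# spark-rapids-submit invocation with the invalid -0.2 argument).
proc = subprocess.run([sys.executable, "-c", "raise SystemExit(1)"])

# Mirror the run_test.sh check: a zero exit code here would mean the
# child's failure was swallowed instead of propagated.
if proc.returncode == 0:
    print("test should have returned non-zero exit code")
    raise SystemExit(1)
print("failure propagated as non-zero exit code")
```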

python/src/spark_rapids_ml/pyspark_rapids.py

Lines changed: 1 addition & 1 deletion

```diff
@@ -41,4 +41,4 @@ def main_cli() -> None:
     command_line = "pyspark " + " ".join(sys.argv[1:])
     env = dict(os.environ)
     env["PYTHONSTARTUP"] = f"{spark_rapids_ml.__path__[0]}/install.py"
-    subprocess.run(command_line, shell=True, env=env)
+    subprocess.run(command_line, shell=True, env=env, check=True)
```
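The effect of the `check=True` fix can be seen in a small self-contained sketch (assuming a POSIX shell, so `exit 3` is a valid `shell=True` command): without it, a failing child process is invisible unless the caller inspects the returned object; with it, the failure raises and so terminates the wrapper with an error.

```python
import subprocess

# Without check=True, a failing child exits silently from the wrapper's
# point of view: the error only lives in the returned CompletedProcess.
quiet = subprocess.run("exit 3", shell=True)
print(quiet.returncode)

# With check=True (the fix in this commit), the same failure raises
# CalledProcessError, so a CLI wrapper would itself exit non-zero.
try:
    subprocess.run("exit 3", shell=True, check=True)
except subprocess.CalledProcessError as err:
    print("raised with return code", err.returncode)
```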

python/src/spark_rapids_ml/spark_rapids_submit.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -46,4 +46,4 @@ def main_cli() -> None:
         + " ".join(sys.argv[i:])
     )
 
-    subprocess.run(command_line, shell=True)
+    subprocess.run(command_line, shell=True, check=True)
```

python/src/spark_rapids_ml/utils.py

Lines changed: 1 addition & 1 deletion
```diff
@@ -96,7 +96,7 @@ def _get_spark_session() -> SparkSession:
 
 def _is_local(sc: SparkContext) -> bool:
     """Whether it is Spark local mode"""
-    return sc._jsc.sc().isLocal()  # type: ignore
+    return sc._jsc.sc().isLocal() or sc.getConf().get("spark.master").startswith("local-cluster")  # type: ignore
 
 
 def _is_standalone_or_localcluster(conf: SparkConf) -> bool:
```
