
Foundry transform fails on Mac #107

@Waschenbacher

Description


Issue checklist

  • This is not a bug or a feature/enhancement request.
  • I searched through the GitHub issues and this issue has not been opened before.

Issue

How to reproduce:

Running the dummy transform on my local machine fails with a Spark driver binding error; see the traceback below.

java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
	at java.base/sun.nio.ch.Net.bind0(Native Method)
...
...
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/foundry_dev_tools/utils/caches/spark_caches.py", line 272, in _read_parquet
    return get_spark_session().read.format("parquet").load(os.fspath(path.joinpath("*")))
           ^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/foundry_dev_tools/utils/spark.py", line 26, in get_spark_session
    .getOrCreate()
     ^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/pyspark/sql/session.py", line 497, in getOrCreate
    sc = SparkContext.getOrCreate(sparkConf)
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/pyspark/context.py", line 515, in getOrCreate
    SparkContext(conf=conf or SparkConf())
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/pyspark/context.py", line 203, in __init__
    self._do_init(
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/pyspark/context.py", line 296, in _do_init
    self._jsc = jsc or self._initialize_context(self._conf._jconf)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/pyspark/context.py", line 421, in _initialize_context
    return self._jvm.JavaSparkContext(jconf)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/py4j/java_gateway.py", line 1587, in __call__
    return_value = get_return_value(
                   ^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Caskroom/mambaforge/base/envs/.../lib/python3.11/site-packages/py4j/protocol.py", line 326, in get_return_value
    raise Py4JJavaError(
py4j.protocol.Py4JJavaError: An error occurred while calling None.org.apache.spark.api.java.JavaSparkContext.
: java.net.BindException: Can't assign requested address: Service 'sparkDriver' failed after 16 retries (on a random free port)! Consider explicitly setting the appropriate binding address for the service 'sparkDriver' (for example spark.driver.bindAddress for SparkDriver) to the correct binding address.
	at java.base/sun.nio.ch.Net.bind0(Native Method)

Pseudo-code:

# import os
from pyspark.sql import DataFrame
from transforms.api import Input, Output, transform_df

from myproject.datasets.utils import code_runs_windows_or_macOS

# Fix Spark driver binding issue on Mac
# os.environ.setdefault("SPARK_LOCAL_IP", "127.0.0.1")
# os.environ.setdefault("SPARK_DRIVER_BIND_ADDRESS", "127.0.0.1")

@transform_df(
    Output("[OUTPUT_PATH]"),
    input_df=Input("[INPUT_PATH]"),
)
def dummy_transform(input_df: DataFrame) -> DataFrame:
    """Return the input dataset unchanged."""
    return input_df


if __name__ == "__main__":
    df = dummy_transform.compute()
    print("done")

How I solved it for now:

I would like to report it in case others encounter the same issue. Setting the environment variables (i.e. uncommenting the lines in the pseudo-code above) solves the problem for me. There are probably more elegant solutions, and I would be glad to hear opinions from experts.
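A possibly more self-contained alternative (a sketch only, assuming that foundry_dev_tools.utils.spark.get_spark_session() ends in a plain getOrCreate() call, as the traceback suggests, and therefore reuses an already-active session) would be to create the Spark session yourself with an explicit loopback bind address before the transform runs:

from pyspark.sql import SparkSession

# Create the session up front with an explicit loopback address so that the
# later getOrCreate() inside foundry-dev-tools reuses it instead of retrying
# to bind the driver to an unreachable address.
spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.bindAddress", "127.0.0.1")
    .config("spark.driver.host", "127.0.0.1")
    .getOrCreate()
)

Whether this interacts well with the configuration foundry-dev-tools applies itself is something I have not verified.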

System:
macOS Sequoia 15.6.1
Chip: Apple M1 Pro
