Description
I am trying to run a multi-node, multi-host SSH cluster on Windows. To simplify things for now, I am running both the scheduler and the workers on localhost. Following the Dask documentation, I set up public-key SSH access, in this case from localhost to localhost. I encountered this issue and resolved it with the fix recommended in the same link. I then hit the next issue: the command being run exceeds the command-line character limit imposed by Windows.
set_env = "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(
    dask.config.serialize(dask.config.global_config)
)
The above line, from "distributed\deploy\ssh.py", generates a string of more than 9,000 characters, which seems to be the problem.
The next line of code creates the command "cmd", and the following line starts the process:
self.proc = await self.connection.create_process(cmd)
and the line below reads this error from stderr, 'The command line is too long.\r\n':
line = await self.proc.stderr.readline()
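To check in advance whether a config would trigger this error, the length of the generated prefix can be measured before anything is launched. The following is a minimal sketch, not the library's code: it assumes the serialization is URL-safe base64 over JSON (my understanding of what dask.config.serialize does), and the helper names and sample dicts are hypothetical. The 8191-character figure is the documented maximum command-line length for cmd.exe.

```python
import base64
import json

# Documented maximum command-line length for cmd.exe.
CMD_EXE_LIMIT = 8191


def serialize_config(config: dict) -> str:
    """Stand-in for dask.config.serialize (assumed: URL-safe base64 of JSON)."""
    return base64.urlsafe_b64encode(json.dumps(config).encode()).decode()


def env_prefix(config: dict) -> str:
    """Build the same prefix string that the quoted ssh.py line formats."""
    return "set DASK_INTERNAL_INHERIT_CONFIG={} &&".format(serialize_config(config))


def fits_in_cmd(config: dict, rest_of_command: str = "") -> bool:
    """Return True if the prefix plus the rest of the command is within the limit."""
    return len(env_prefix(config)) + len(rest_of_command) <= CMD_EXE_LIMIT


# Hypothetical configs, for illustration only.
small = {"distributed": {"scheduler": {"port": 0}}}
large = {"padding": "x" * 9000}

print(fits_in_cmd(small))
print(fits_in_cmd(large))
```

With Dask installed, passing dask.config.global_config in place of the sample dicts should give the real figure for a given environment.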
In an attempt to reduce the size of the serialized config, I removed the Kubernetes key from dask.config.global_config and re-added it with an empty dict as its value, on the reasoning that I should not need Kubernetes since I am using SSHCluster and not KubeCluster. The serialized config is then shorter than the limit, and sure enough I get past the 'The command line is too long' error, but I get stuck on the error below instead:
2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - raise JSONDecodeError("Expecting value", s, err.value) from None
2023-08-28 21:10:06,883 - distributed.deploy.ssh - INFO - json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
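The message in the log is the one json.loads raises when its input does not begin with a valid JSON value, for example an empty string or text still wrapped in single quotes that the shell did not strip. The sketch below reproduces the message with both kinds of input. Which of these, if either, the spawned process actually receives is an assumption, and the sample payloads are hypothetical.

```python
import json


def decode_error(text: str) -> str:
    """Return the JSONDecodeError message for text, or "" if it parses."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return str(exc)
    return ""


# Hypothetical payloads, for illustration only.
empty_payload = ""
single_quoted_payload = "'{\"cls\": \"example\"}'"

print(decode_error(empty_payload))
print(decode_error(single_quoted_payload))
```

Both calls report the failure at character 0, so the log line alone does not distinguish an empty argument from a mis-quoted one; logging the raw argument on the remote side would be one way to tell them apart.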
I am using Windows right now and am considering installing a Linux VM to try this out. Has anyone else had this issue on Windows, and what can be done to work around it?
This is the code I am using in the main module:
import dask
from dask.distributed import Client, SSHCluster

cluster = SSHCluster(
    ["localhost", "localhost"],
    connect_options={"known_hosts": None},
    worker_options={"n_workers": 10},
    scheduler_options={"port": 0, "dashboard_address": ":8797"},
)
client = Client(cluster)
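The trimming step described earlier, replacing a top-level section of the config with an empty dict before the cluster is created, can be sketched as a small helper. This is an illustration under assumptions: the helper name is hypothetical, the sample config is made up, and the sketch does not check whether emptying a section is safe for a given deployment.

```python
import copy
from typing import Iterable


def with_sections_emptied(config: dict, sections: Iterable[str]) -> dict:
    """Return a deep copy of config with the named top-level sections set to {}."""
    trimmed = copy.deepcopy(config)
    for name in sections:
        if name in trimmed:
            trimmed[name] = {}
    return trimmed


# Made-up config for illustration; a real one would come from
# dask.config.global_config.
sample = {
    "kubernetes": {"worker-template": {"spec": "placeholder"}},
    "distributed": {"scheduler": {"port": 0}},
}

trimmed = with_sections_emptied(sample, ["kubernetes"])
print(trimmed)
```

Working on a copy leaves the original config untouched, which makes it easy to compare serialized lengths before and after trimming.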
Environment:
- Dask version 2023.8.1
- Python version 3.11.2
- OS: Windows 10
- Installed via Pip