Open
Description
I'm not sure if I'm doing something wrong (and I certainly don't understand the relation between ProxyCluster and LocalCluster), but running
daskctl cluster create -f dask_ctl/tests/specs/simple.yaml
daskctl cluster list
does not yield anything, and I also can't find any trace of the cluster process on my system. For reference, this works (even though there are some KeyError: 'register-client' errors):
In [2]: with distributed.LocalCluster(name="testcluster", scheduler_port=8786) as _:
   ...:     !daskctl cluster list
   ...:
Name         Address               Type                         Discovery     Workers  Threads  Memory     Created   Status
──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
testcluster  tcp://localhost:8786  dask_ctl.proxy.ProxyCluster  proxycluster  4        12       30.57 GiB  Just now  Running
So I suspect shutdown_on_close = False does not work the way I thought it would.
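One way to double-check whether a scheduler is still around after the CLI returns is to probe the scheduler port directly. This is only a rough sketch using the standard library: the helper name scheduler_is_listening is made up for illustration, and port 8786 is borrowed from the LocalCluster example, so it is an assumption that the cluster created from the spec file would use the same port.

```python
import socket


def scheduler_is_listening(host="localhost", port=8786, timeout=1.0):
    """Return True if something accepts TCP connections on host:port.

    Hypothetical helper; 8786 is assumed from the LocalCluster example,
    not read from the spec file.
    """
    try:
        # create_connection raises OSError (refused, timeout, ...) on failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling scheduler_is_listening() right after daskctl cluster create would show whether anything is bound to the port. It only proves that some process accepts connections there, not that the process is a Dask scheduler.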
Even worse, creating the cluster using the CLI multiple times in a row (usually 3 to 5 times within a short period) results in daskctl crashing with a Python core dump (first time I've seen that). I'm not sure what went wrong, or whether it would be better to report this to distributed instead.
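For anyone trying to reproduce the crash, the repeated invocation could be scripted roughly as below. This is a sketch, not a verified reproducer: the function names create_repeatedly and crashed are hypothetical, and the only command it runs by default is the daskctl cluster create -f call quoted earlier. It relies on the documented subprocess behaviour that, on POSIX, a child killed by signal N is reported with return code -N, so an abort should show up as a negative value.

```python
import subprocess


def create_repeatedly(spec_path, attempts=5,
                      base_cmd=("daskctl", "cluster", "create", "-f")):
    """Run the create command `attempts` times and collect the exit codes.

    Hypothetical helper; base_cmd defaults to the command from this report.
    """
    codes = []
    for _ in range(attempts):
        result = subprocess.run([*base_cmd, spec_path])
        codes.append(result.returncode)
    return codes


def crashed(returncode):
    """True if the child was terminated by a signal (POSIX: negative code)."""
    return returncode < 0
```

Something like create_repeatedly("dask_ctl/tests/specs/simple.yaml") would then return one exit code per attempt, and any entry for which crashed(...) is true marks a run that died from a signal rather than exiting normally.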
log of the crash
distributed.diskutils - INFO - Found stale lock file and directory '$PWD/dask-worker-space/worker-hdmzw82z', purging
distributed.diskutils - INFO - Found stale lock file and directory '$PWD/dask-worker-space/worker-0snl7k9p', purging
distributed.diskutils - INFO - Found stale lock file and directory '$PWD/dask-worker-space/worker-nb9_z2rd', purging
distributed.diskutils - INFO - Found stale lock file and directory '$PWD/dask-worker-space/worker-kbdgg6l1', purging
Created cluster 5fc5d1cd.
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
  File ".../lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File ".../lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File ".../lib/python3.9/site-packages/distributed/process.py", line 218, in _watch_process
    assert exitcode is not None
AssertionError
Exception in thread AsyncProcess Dask Worker process (from Nanny) watch process join:
Traceback (most recent call last):
Fatal Python error: _enter_buffered_busy: could not acquire lock for <_io.BufferedWriter name='<stderr>'> at interpreter shutdown, possibly due to daemon threads
Python runtime state: finalizing (tstate=0x55d38b388740)
Current thread 0x00007f92d5b95740 (most recent call first):
<no Python frame>
Aborted (core dumped)
Edit: versions:
python: 3.9
dask: 2022.2.1
distributed: 2022.2.1
dask-ctl: 2022.2.1+1.g749a65a
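In case it helps others compare environments, the installed versions can be collected with the standard library. This is a small sketch: installed_versions is a made-up helper name, and the distribution names are assumed to match what pip reports for these packages.

```python
from importlib import metadata


def installed_versions(packages=("dask", "distributed", "dask-ctl")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment
            versions[name] = None
    return versions
```

Printing installed_versions() gives a dictionary that can be pasted into a report as-is.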