Closed
Description
Describe the issue:
Hi
We use a Flyte task to deploy a DaskJob, and we ran into an issue where the runner was sometimes missing while the operator reported an 'already exists' error for the cluster.
It looks like the await cluster.create() call raised an 'already exists' exception at this line.
Since the Dask cluster already exists, why not handle the 'already exists' exception when creating the Dask cluster, and then continue on to create the runner?
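To make the suggestion concrete, here is a minimal sketch of the idea, not the operator's actual code. `ServerError` and `FakeResource` below are hypothetical stand-ins for the kr8s exception and objects seen in the traceback, and `create_if_missing` / `create_components` are names made up for illustration. The assumption is that a 409 Conflict on create means an earlier attempt of the handler already created the resource, so a retry can skip it and move on to the runner.

```python
import asyncio


class ServerError(Exception):
    """Hypothetical stand-in for the ServerError raised in the traceback."""

    def __init__(self, message, status_code=None):
        super().__init__(message)
        self.status_code = status_code


class FakeResource:
    """Hypothetical stand-in for a Kubernetes object with an async create()."""

    def __init__(self, name, store):
        self.name = name
        self.store = store  # set of names that already exist on the fake server

    async def create(self):
        if self.name in self.store:
            raise ServerError(f'"{self.name}" already exists', status_code=409)
        self.store.add(self.name)


async def create_if_missing(resource):
    """Create the resource, treating 409 Conflict as 'already created'.

    Returns True if this call created it, False if it already existed.
    Any other server error is re-raised unchanged.
    """
    try:
        await resource.create()
        return True
    except ServerError as e:
        if e.status_code == 409:
            return False
        raise


async def create_components(cluster, runner):
    # On a retry after a partial first attempt, the cluster may already
    # exist; the runner still has to be created.
    cluster_created = await create_if_missing(cluster)
    runner_created = await create_if_missing(runner)
    return cluster_created, runner_created


if __name__ == "__main__":
    store = {"my-cluster"}  # simulate a cluster left over from a failed attempt
    cluster = FakeResource("my-cluster", store)
    runner = FakeResource("my-runner", store)
    print(asyncio.run(create_components(cluster, runner)))
```

With this shape, a retried handler would no longer stop at the cluster step, so the runner gets created even when the first attempt failed partway through. Whether silently accepting an existing cluster is always safe (for example, if it was created with a different spec) is a separate question this sketch does not address.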
Error message:
    Handler 'daskjob_create_components/status.jobStatus' failed with an exception. Will retry.
    Traceback (most recent call last):
      File "/usr/local/lib/python3.10/site-packages/kr8s/_api.py", line 168, in call_api
        response.raise_for_status()
      File "/usr/local/lib/python3.10/site-packages/httpx/_models.py", line 763, in raise_for_status
        raise HTTPStatusError(message, request=request, response=self)
    httpx.HTTPStatusError: Client error '409 Conflict' for url 'https://10.0.0.1/apis/kubernetes.dask.org/v1...", line 774, in daskjob_create_components
        await cluster.create()
      File "/usr/local/lib/python3.10/site-packages/kr8s/_objects.py", line 320, in create
        async with self.api.call_api(
      File "/usr/local/lib/python3.10/contextlib.py", line 199, in __aenter__
        return await anext(self.gen)
      File "/usr/local/lib/python3.10/site-packages/kr8s/_api.py", line 186, in call_api
        raise ServerError(
    kr8s._exceptions.ServerError: daskclusters.kubernetes.dask.org "fn2oaqa4432x5o-n0-0-dn7-0" already exists
Environment:
- Dask version: 2024.10.0
- Python version: 3.12
- Operating System:
- Install method (conda, pip, source):