
Current namespace not used when creating daskcluster in k8s #921

Open
@john-jam


Describe the issue:

When the Dask operator is deployed in Kubernetes with the Role/RoleBinding defined at the namespace level (rbac.cluster: false), creating a daskclusters.kubernetes.dask.org resource from a pod in a namespace (myns in the example), using a service account (dask in the example), fails with the following error. The request is sent to the "default" namespace instead of the pod's own namespace:

Short Error Message:

User "system:serviceaccount:myns:dask" cannot create resource "daskclusters" in API group "kubernetes.dask.org" in the namespace "default"
...
User "system:serviceaccount:myns:dask" cannot list resource "daskclusters" in API group "kubernetes.dask.org" in the namespace "default"
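Both namespaces involved are visible in these messages: the caller is identified by the Kubernetes service account username convention system:serviceaccount:<namespace>:<name>, while the rejected request targets "default". A namespace-scoped RoleBinding in myns cannot authorize requests against "default", which is why the call is forbidden. The helper below is a hypothetical diagnostic sketch (parse_forbidden and namespace_mismatch are illustrative names, not part of dask-kubernetes or kr8s) that extracts both namespaces from such a message and flags the mismatch.

```python
import re

# Kubernetes names service account users "system:serviceaccount:<namespace>:<name>".
# The target namespace appears at the end of the RBAC denial message.
_FORBIDDEN = re.compile(
    r'User "system:serviceaccount:(?P<sa_ns>[^:"]+):(?P<sa_name>[^"]+)"'
    r'.*in the namespace "(?P<target_ns>[^"]+)"'
)


def parse_forbidden(message: str) -> tuple[str, str, str]:
    """Return (service account namespace, service account name, target namespace)."""
    match = _FORBIDDEN.search(message)
    if match is None:
        raise ValueError("not a service-account RBAC denial message")
    return match.group("sa_ns"), match.group("sa_name"), match.group("target_ns")


def namespace_mismatch(message: str) -> bool:
    """True when the request targeted a namespace other than the caller's own."""
    sa_ns, _, target_ns = parse_forbidden(message)
    return sa_ns != target_ns
```

Applied to the first message above, parse_forbidden returns ("myns", "dask", "default"), so namespace_mismatch reports True.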

Full Stacktrace:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/kr8s/_api.py", line 168, in call_api
    response.raise_for_status()
  File "/usr/local/lib/python3.11/site-packages/httpx/_models.py", line 829, in raise_for_status
    raise HTTPStatusError(message, request=request, response=self)
httpx.HTTPStatusError: Client error '403 Forbidden' for url 'https://.../apis/kubernetes.dask.org/v1/namespaces/default/daskclusters'
For more information check: https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/403

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/prefect/engine.py", line 42, in <module>
    run_flow(flow, flow_run=flow_run)
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 1453, in run_flow
    return run_flow_sync(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 1333, in run_flow_sync
    return engine.state if return_type == "state" else engine.result()
                                                       ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 313, in result
    raise self._raised
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 721, in run_context
    yield self
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 1331, in run_flow_sync
    engine.call_flow_fn()
  File "/usr/local/lib/python3.11/site-packages/prefect/flow_engine.py", line 744, in call_flow_fn
    result = call_with_parameters(self.flow.fn, self.parameters)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/prefect/utilities/callables.py", line 206, in call_with_parameters
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/workdir/examples/cs/flows/misc/run_on_dask/flow.py", line 43, in run_on_dask
    cluster = KubeCluster(
              ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 282, in __init__
    self.sync(self._start)
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 363, in sync
    return sync(
           ^^^^^
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 439, in sync
    raise error
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 413, in f
    result = yield future
             ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tornado/gen.py", line 766, in run
    value = future.result()
            ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 322, in _start
    await self._create_cluster()
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 361, in _create_cluster
    await cluster.create()
  File "/usr/local/lib/python3.11/site-packages/kr8s/_objects.py", line 320, in create
    async with self.api.call_api(
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kr8s/_api.py", line 186, in call_api
    raise ServerError(
kr8s._exceptions.ServerError: daskclusters.kubernetes.dask.org is forbidden: User "system:serviceaccount:myns:dask" cannot create resource "daskclusters" in API group "kubernetes.dask.org" in the namespace "default"
Exception ignored in atexit callback: <function reap_clusters at 0x7afb85dafe20>
Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 1033, in reap_clusters
    asyncio.run(_reap_clusters())
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/asyncio/base_events.py", line 654, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 1031, in _reap_clusters
    cluster.close(timeout=10)
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 700, in close
    return self.sync(self._close, timeout=timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 363, in sync
    return sync(
           ^^^^^
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 439, in sync
    raise error
  File "/usr/local/lib/python3.11/site-packages/distributed/utils.py", line 413, in f
    result = yield future
             ^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/tornado/gen.py", line 766, in run
    value = future.result()
            ^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/dask_kubernetes/operator/kubecluster/kubecluster.py", line 706, in _close
    cluster = await DaskCluster.get(self.name, namespace=self.namespace)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kr8s/_objects.py", line 265, in get
    raise e
  File "/usr/local/lib/python3.11/site-packages/kr8s/_objects.py", line 255, in get
    resources = await api.async_get(
                ^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kr8s/_api.py", line 460, in async_get
    async with self.async_get_kind(
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kr8s/_api.py", line 396, in async_get_kind
    async with self.call_api(
  File "/usr/local/lib/python3.11/contextlib.py", line 210, in __aenter__
    return await anext(self.gen)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/kr8s/_api.py", line 186, in call_api
    raise ServerError(
kr8s._exceptions.ServerError: daskclusters.kubernetes.dask.org "test-cluster" is forbidden: User "system:serviceaccount:myns:dask" cannot list resource "daskclusters" in API group "kubernetes.dask.org" in the namespace "default"

Minimal Complete Verifiable Example:

Running this inside a pod:

from dask_kubernetes.operator.kubecluster.kubecluster import KubeCluster, make_cluster_spec

if __name__ == '__main__':
    spec = make_cluster_spec(
        name="test-cluster",
    )
    cluster = KubeCluster(
        custom_cluster_spec=spec,
    )

    cluster.adapt(minimum=0, maximum=2)

Anything else we need to know?:

The exact same test works with version 2024.5.0 but fails with 2024.8.0 and every later version, so I suspect the regression was introduced by a change in the 2024.8.0 release.

To make this work with 2024.8.0 or later, I have to pass the namespace option explicitly when instantiating KubeCluster, but in my use case I don't know the namespace in advance.
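When the namespace is not known in advance, one possible workaround is to discover it at runtime from inside the pod and pass it to KubeCluster. The sketch below is a hypothetical helper (detect_namespace is not part of dask-kubernetes). It assumes the service account token is automounted (the Kubernetes default), in which case Kubernetes writes the pod's namespace to /var/run/secrets/kubernetes.io/serviceaccount/namespace. It also honours an optional POD_NAMESPACE environment variable, an assumed name you would have to populate yourself, for example via the Downward API.

```python
import os
from pathlib import Path

# Standard path where Kubernetes mounts the pod's namespace alongside the
# service account token (only present when the token is automounted).
SA_NAMESPACE_FILE = "/var/run/secrets/kubernetes.io/serviceaccount/namespace"


def detect_namespace(
    path: str = SA_NAMESPACE_FILE,
    env_var: str = "POD_NAMESPACE",  # hypothetical variable, e.g. set via the Downward API
    fallback: str = "default",
) -> str:
    """Best-effort lookup of the namespace this pod runs in."""
    # An explicit environment variable takes precedence.
    ns = os.environ.get(env_var, "").strip()
    if ns:
        return ns
    # Otherwise read the namespace file mounted with the service account token.
    try:
        ns = Path(path).read_text().strip()
    except OSError:
        ns = ""
    # Fall back (e.g. when running outside a cluster).
    return ns or fallback
```

With this helper, the reproducer above would build the cluster with KubeCluster(custom_cluster_spec=spec, namespace=detect_namespace()), which corresponds to the explicit namespace workaround described in this issue.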

Environment:

  • Dask version: 2024.11.2
  • Python version: 3.11.9
  • Operating System: Ubuntu 22.04
  • Install method (conda, pip, source): pip

Metadata

  • Assignees: No one assigned
  • Labels: needs info (Needs further information from the user)
  • Type: No type
  • Projects: No projects
  • Milestone: No milestone
  • Relationships: None yet
  • Development: No branches or pull requests
