Controller loses API connection after token expiry on Azure Kubernetes Service (AKS) #964

@waqas-anonymco

dask-operator sometimes fails to create DaskJob resources on Azure Kubernetes Service (AKS) 1.33.5.

Related issues: #913

It looks like the issue is not completely fixed. We have seen two occurrences of this issue in the last two weeks.

Also, see the report in nolar/kopf#980 (comment) for details.

Logs:

[2025-12-01 22:20:46,800] kr8s._auth           [DEBUG   ] Reloading credentials
...
[2025-12-01 22:20:46,822] httpcore.http11      [DEBUG   ] send_request_headers.started request=<Request [b'GET']>
[2025-12-01 22:20:46,822] httpcore.http11      [DEBUG   ] send_request_headers.complete
...
[2025-12-01 22:20:46,901] httpx                [INFO    ] HTTP Request: GET https://10.0.0.1/apis/kubernetes.dask.org/v1/namespaces/join/daskclusters/job-d90c9c3c-0ae2-48aa-ba00-c80da3bce657 "HTTP/1.1 401 Unauthorized"

Minimal Complete Verifiable Example:

1. Install dask-operator on AKS 1.33.5.
2. Wait an hour for the authentication token to expire.
3. Create a DaskJob resource.

dask-operator will not create the DaskJob because its Kubernetes authentication token has expired and kopf's watchers are no longer connected to the Kubernetes API server. A bug in kopf prevents it from refreshing the authentication token.
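For illustration only, the behaviour one would expect from a client in this situation can be sketched as "re-read the token file and retry when the API answers 401". This is not kopf's or kr8s's actual code; `call_with_token_refresh`, `send`, and `load_token` are hypothetical names, and the token path is the conventional in-cluster mount location, assumed here rather than taken from the logs.

```python
from typing import Callable, Tuple

# Conventional in-cluster mount path for the projected service account token
# (assumption: the operator pod uses the default mount).
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def read_token(path: str = TOKEN_PATH) -> str:
    """Read the current token from disk; the kubelet rotates this file."""
    with open(path, encoding="utf-8") as f:
        return f.read().strip()


def call_with_token_refresh(
    send: Callable[[str], Tuple[int, str]],
    load_token: Callable[[], str] = read_token,
    max_attempts: int = 2,
) -> Tuple[int, str]:
    """Call send(token) and retry with a freshly loaded token on HTTP 401.

    `send` is a hypothetical callable that performs one API request with the
    given bearer token and returns (status_code, body).
    """
    status, body = 0, ""
    for _ in range(max_attempts):
        token = load_token()  # always re-read, never reuse a cached token
        status, body = send(token)
        if status != 401:
            break  # success, or an error unrelated to authentication
    return status, body
```

The point of the sketch is that the token is loaded inside the loop, so a rotated token file is picked up on the retry instead of the expired value being reused.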

This only occurs on AKS 1.30+ because that is the version from which AKS sets --service-account-extend-token-expiration to false.
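To confirm that the token lifetime is the trigger, one option is to decode the `exp` claim of the service account token mounted in the operator pod. The following is a minimal sketch, assuming the token is a standard JWT; `token_expiry` and `seconds_remaining` are hypothetical helper names, and the signature is not verified because only the expiry time is being inspected.

```python
import base64
import json
from datetime import datetime, timezone
from typing import Optional


def token_expiry(token: str) -> datetime:
    """Return the expiry time from the JWT's `exp` claim (signature not verified)."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return datetime.fromtimestamp(claims["exp"], tz=timezone.utc)


def seconds_remaining(token: str, now: Optional[datetime] = None) -> float:
    """Seconds until the token expires; negative if it has already expired."""
    now = now or datetime.now(timezone.utc)
    return (token_expiry(token) - now).total_seconds()
```

Running `seconds_remaining` against the token file inside the operator pod shortly before and after the failures start would show whether the 401 responses line up with the expiry time.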

Environment:
Dask operator version: 2025.7.0
