Open
Description
dask-operator sometimes fails to create Dask Jobs on Azure Kubernetes Service (AKS) 1.33.5:
Related issues: #913
It looks like that issue is not completely fixed. We have seen two occurrences of it in the last two weeks.
Also see the report in nolar/kopf#980 (comment) for details.
Logs:
[2025-12-01 22:20:46,800] kr8s._auth [DEBUG ] Reloading credentials
...
[2025-12-01 22:20:46,822] httpcore.http11 [DEBUG ] send_request_headers.started request=<Request [b'GET']>
[2025-12-01 22:20:46,822] httpcore.http11 [DEBUG ] send_request_headers.complete
...
[2025-12-01 22:20:46,901] httpx [INFO ] HTTP Request: GET https://10.0.0.1/apis/kubernetes.dask.org/v1/namespaces/join/daskclusters/job-d90c9c3c-0ae2-48aa-ba00-c80da3bce657 "HTTP/1.1 401 Unauthorized"
Minimal Complete Verifiable Example:
1. Install dask-operator on AKS 1.33.5.
2. Wait an hour for the authentication token to expire.
3. Create a DaskJob resource.
dask-operator will not create the DaskJob because its Kubernetes authentication token has expired and kopf's watchers are no longer connected to the Kubernetes API server. A bug in kopf prevents it from refreshing the authentication token.
This only occurs on AKS 1.30+ because, starting with that version, AKS sets --service-account-extend-token-expiration to false.
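To confirm how long a service account token is actually valid for, the `exp` claim can be read directly from the token. The following is a minimal sketch, not part of dask-operator or kopf. It assumes the token is a JWT with an `exp` claim and is mounted at the standard in-pod path; the helper names (`jwt_claims`, `seconds_until_expiry`, `mounted_token_remaining`) are hypothetical. It decodes the payload without verifying the signature, so it is only suitable for inspection.

```python
import base64
import json
import time
from pathlib import Path
from typing import Optional

# Standard in-pod location of the projected service account token (assumed here).
TOKEN_PATH = Path("/var/run/secrets/kubernetes.io/serviceaccount/token")


def jwt_claims(token: str) -> dict:
    """Decode the payload segment of a JWT without verifying its signature."""
    payload = token.strip().split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


def seconds_until_expiry(token: str, now: Optional[float] = None) -> float:
    """Return seconds until the token's `exp` claim (negative if already expired)."""
    now = time.time() if now is None else now
    return jwt_claims(token)["exp"] - now


def mounted_token_remaining() -> float:
    """Hypothetical helper: remaining lifetime of the token currently on disk."""
    return seconds_until_expiry(TOKEN_PATH.read_text())
```

Running `mounted_token_remaining()` inside the operator pod reports the lifetime of the token file on disk. Note that this reflects the file, which the kubelet is expected to rotate, and not the copy of the token that the operator's watchers already hold in memory, which is the one that matters for this bug.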
Environment:
Dask operator version: 2025.7.0
creste