Description
Environment
How do you deploy Kubeflow Pipelines (KFP)?
Local cluster using kind
KFP version:
- v2.5.0
- Also tested on the master branch and hit the same issue
KFP SDK version:
kfp 2.15.2
kfp-pipeline-spec 2.15.2
kfp-server-api 2.15.2
Kubernetes: Kind cluster v1.31.0 (Docker Desktop with WSL2 backend on Windows 11)
Steps to reproduce
Deploy Kubeflow Pipelines
export PIPELINE_VERSION=2.5.0
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/cluster-scoped-resources?ref=$PIPELINE_VERSION"
kubectl wait --for condition=established --timeout=60s crd/applications.app.k8s.io
kubectl apply -k "github.com/kubeflow/pipelines/manifests/kustomize/env/dev?ref=$PIPELINE_VERSION"
Observe multiple pods entering CrashLoopBackOff before eventually recovering after many restarts.
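For reference, the behavior can be observed with standard kubectl commands; the namespace and deployment names below assume the default standalone install and may differ slightly between manifest versions.
kubectl -n kubeflow get pods --watch
# inspect a previous crash of one of the affected components, e.g. metadata-writer
kubectl -n kubeflow logs deployment/metadata-writer --previous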
Expected result
All pods should start successfully on first attempt without entering CrashLoopBackOff.
Materials and reference
I found similar existing issues, such as #6277, #11355, and #12034.
Root cause analysis:
- Race conditions: Components like metadata-grpc start before MySQL is fully ready to accept connections. Similarly, metadata-writer starts before metadata-grpc, and ml-pipeline-persistenceagent / ml-pipeline-scheduledworkflow start before the ml-pipeline API server.
- Lack of explicit dependency ordering: The manifests do not enforce startup dependencies, which allows components to start before their required services are ready.
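A possible interim workaround (only a sketch; the deployment names assume the standard standalone manifests and may vary between versions) is to gate the dependent components on their backing services after applying the manifests, for example:
# wait for the backing stores, then restart the components that crashed while waiting for them
kubectl -n kubeflow wait --for=condition=available --timeout=300s deployment/mysql
kubectl -n kubeflow wait --for=condition=available --timeout=300s deployment/metadata-grpc-deployment
kubectl -n kubeflow rollout restart deployment/metadata-writer
kubectl -n kubeflow wait --for=condition=available --timeout=300s deployment/ml-pipeline
kubectl -n kubeflow rollout restart deployment/ml-pipeline-persistenceagent deployment/ml-pipeline-scheduledworkflow
This only papers over the ordering problem at install time; the underlying issue is still that the manifests do not express these dependencies.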
Impacted by this bug? Give it a 👍.