Edit: This happened due to Docker Hub rate limits (see #265 (comment))
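For anyone who wants to check whether a node is near the Docker Hub pull limit, here is a rough Python sketch. It assumes the rate-limit preview endpoints and the `ratelimit-limit` / `ratelimit-remaining` headers that Docker documents for this purpose. The function names are mine and purely illustrative. Run it from the affected node, since anonymous limits are counted per source IP.

```python
import json
import re
import urllib.request

# Endpoints described in Docker's rate-limit documentation (assumed unchanged).
TOKEN_URL = (
    "https://auth.docker.io/token"
    "?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull"
)
MANIFEST_URL = "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"


def parse_ratelimit(value):
    """Parse a header value such as '100;w=21600' into (count, window_seconds).

    The window part is optional; it is returned as None when absent.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(?:;\s*w=(\d+))?\s*", value)
    if match is None:
        raise ValueError(f"unexpected rate-limit header value: {value!r}")
    count = int(match.group(1))
    window = int(match.group(2)) if match.group(2) else None
    return count, window


def fetch_ratelimit():
    """Ask Docker Hub for the anonymous pull limits seen from this machine."""
    with urllib.request.urlopen(TOKEN_URL, timeout=10) as resp:
        token = json.load(resp)["token"]
    request = urllib.request.Request(
        MANIFEST_URL,
        method="HEAD",  # a HEAD request is meant to avoid consuming a pull
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as resp:
        return {
            name: parse_ratelimit(resp.headers[name])
            for name in ("ratelimit-limit", "ratelimit-remaining")
            if resp.headers.get(name)
        }
```

Calling `fetch_ratelimit()` needs network access. If `ratelimit-remaining` comes back as 0, further anonymous pulls from that IP are likely to fail until the window resets.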
Checks
- I have searched the existing issues.
- My issue is related to one of the components in the kubeflow/notebooks repository.
Kubeflow Notebooks Version
1.10
Kubeflow Platform
Charmed Kubeflow 1.10
Kubernetes Distribution
MicroK8s v1.32.3 revision 8148
Kubernetes Version
Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3
Description
I was following the official guide (link) to integrate Charmed Kubeflow and Charmed MLflow.
When I created a notebook using the image kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1, the pod failed to start with the following error:
ImagePullBackOff: Back-off pulling image "kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1": ErrImagePull: rpc error: code = FailedPrecondition desc = failed to pull and unpack image "docker.io/kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1": failed commit on ref "manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a": "manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a" failed size validation: 442244 != 4860: failed precondition
The error indicates that the image pull failed while committing the manifest, specifically "failed size validation: 442244 != 4860". It appears that the container runtime on the node (containerd) could not validate the manifest it downloaded.
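To make the mismatch easier to spot in kubelet events, here is a small Python sketch that pulls the two sizes out of such a message. The helper name is hypothetical. My reading that the first number is the byte count received and the second is the size expected for the manifest is an assumption about containerd's message format, not something this issue confirms.

```python
import re

# Matches the size pair in messages like the one quoted above.
SIZE_RE = re.compile(r"failed size validation: (\d+) != (\d+)")


def size_mismatch(message):
    """Return the two sizes from a 'failed size validation' error, or None.

    Assumption: the first value is what was received and the second is what
    the manifest descriptor declared.
    """
    match = SIZE_RE.search(message)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


excerpt = "failed size validation: 442244 != 4860: failed precondition"
print(size_mismatch(excerpt))  # (442244, 4860)
```

If that reading is right, far more data arrived than the manifest should contain. This would be consistent with the registry, or something in between, returning a different response body instead of the manifest.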
I suspect this might be an issue related to Kubeflow Notebooks, which is why I'm posting it here. However, if this issue should be reported elsewhere, please let me know the appropriate place.
Relevant Logs
microk8s kubectl describe pods test-notebook-0 -n admin
Name: test-notebook-0
Namespace: admin
Priority: 0
Service Account: default-editor
Node: calculon15/192.168.50.33
Start Time: Fri, 25 Apr 2025 08:05:18 +0000
Labels: access-minio=true
access-ml-pipeline=true
app=test-notebook
apps.kubernetes.io/pod-index=0
controller-revision-hash=test-notebook-fd97744f6
mlflow-server-minio=true
notebook-name=test-notebook
security.istio.io/tlsMode=istio
service.istio.io/canonical-name=test-notebook
service.istio.io/canonical-revision=latest
statefulset=test-notebook
statefulset.kubernetes.io/pod-name=test-notebook-0
Annotations: cni.projectcalico.org/containerID: f9d40694ca8f3585b55baf24ec33361b06be4d544995ccaae6a30bb281e2f2cc
cni.projectcalico.org/podIP: 10.1.45.203/32
cni.projectcalico.org/podIPs: 10.1.45.203/32
istio.io/rev: default
kubectl.kubernetes.io/default-container: test-notebook
kubectl.kubernetes.io/default-logs-container: test-notebook
poddefault.admission.kubeflow.org/poddefault-access-ml-pipeline: 6220594
poddefault.admission.kubeflow.org/poddefault-mlflow-server-access-minio: 6378115
poddefault.admission.kubeflow.org/poddefault-mlflow-server-minio: 6378116
prometheus.io/path: /stats/prometheus
prometheus.io/port: 15020
prometheus.io/scrape: true
sidecar.istio.io/status:
{"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status: Pending
IP: 10.1.45.203
IPs:
IP: 10.1.45.203
Controlled By: StatefulSet/test-notebook
Init Containers:
istio-init:
Container ID: containerd://34eabed85bd3e06ce2dd36546834d78dbc3962e7634c57f56f8994439b660339
Image: docker.io/istio/proxyv2:1.24.2
Image ID: docker.io/istio/proxyv2@sha256:445156b5f4a780242d079a47b7d88199cbbb5959c92358469b721af490eca1ae
Port: <none>
Host Port: <none>
Args:
istio-iptables
-p
15001
-z
15006
-u
1337
-m
REDIRECT
-i
*
-x
-b
*
-d
15090,15021,15020
--log_output_level=default:info
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 25 Apr 2025 08:05:20 +0000
Finished: Fri, 25 Apr 2025 08:05:20 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
Containers:
test-notebook:
Container ID:
Image: kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1
Image ID:
Port: 8888/TCP
Host Port: 0/TCP
State: Waiting
Reason: ImagePullBackOff
Ready: False
Restart Count: 0
Limits:
cpu: 600m
memory: 1288490188800m
Requests:
cpu: 500m
memory: 1Gi
Environment:
NB_PREFIX: /notebook/admin/test-notebook
KF_PIPELINES_SA_TOKEN_PATH: /var/run/secrets/kubeflow/pipelines/token
AWS_ACCESS_KEY_ID: <set to the key 'AWS_ACCESS_KEY_ID' in secret 'mlflow-server-minio-artifact'> Optional: false
AWS_SECRET_ACCESS_KEY: <set to the key 'AWS_SECRET_ACCESS_KEY' in secret 'mlflow-server-minio-artifact'> Optional: false
MINIO_ENDPOINT_URL: http://mlflow-minio.kubeflow:9000
MLFLOW_S3_ENDPOINT_URL: http://mlflow-minio.kubeflow:9000
MLFLOW_TRACKING_URI: http://mlflow-server.kubeflow.svc.cluster.local:5000
Mounts:
/dev/shm from dshm (rw)
/home/jovyan from test-notebook-workspace (rw)
/var/run/secrets/kubeflow/pipelines from volume-kf-pipeline-token (ro)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
istio-proxy:
Container ID: containerd://32a6fb9ea59127313112d4b28a311b305bfbf30c3196efbfa9e3a5f77e35ef9a
Image: docker.io/istio/proxyv2:1.24.2
Image ID: docker.io/istio/proxyv2@sha256:445156b5f4a780242d079a47b7d88199cbbb5959c92358469b721af490eca1ae
Port: 15090/TCP
Host Port: 0/TCP
Args:
proxy
sidecar
--domain
$(POD_NAMESPACE).svc.cluster.local
--proxyLogLevel=warning
--proxyComponentLogLevel=misc:error
--log_output_level=default:info
State: Running
Started: Fri, 25 Apr 2025 08:05:24 +0000
Ready: True
Restart Count: 0
Limits:
cpu: 2
memory: 1Gi
Requests:
cpu: 100m
memory: 128Mi
Readiness: http-get http://:15021/healthz/ready delay=0s timeout=3s period=15s #success=1 #failure=4
Startup: http-get http://:15021/healthz/ready delay=0s timeout=3s period=1s #success=1 #failure=600
Environment:
PILOT_CERT_PROVIDER: istiod
CA_ADDR: istiod.kubeflow.svc:15012
POD_NAME: test-notebook-0 (v1:metadata.name)
POD_NAMESPACE: admin (v1:metadata.namespace)
INSTANCE_IP: (v1:status.podIP)
SERVICE_ACCOUNT: (v1:spec.serviceAccountName)
HOST_IP: (v1:status.hostIP)
ISTIO_CPU_LIMIT: 2 (limits.cpu)
PROXY_CONFIG: {"discoveryAddress":"istiod.kubeflow.svc:15012"}
ISTIO_META_POD_PORTS: [
{"name":"notebook-port","containerPort":8888,"protocol":"TCP"}
]
ISTIO_META_APP_CONTAINERS: test-notebook
GOMEMLIMIT: 1073741824 (limits.memory)
GOMAXPROCS: 2 (limits.cpu)
ISTIO_META_CLUSTER_ID: Kubernetes
ISTIO_META_NODE_NAME: (v1:spec.nodeName)
ISTIO_META_INTERCEPTION_MODE: REDIRECT
ISTIO_META_WORKLOAD_NAME: test-notebook
ISTIO_META_OWNER: kubernetes://apis/apps/v1/namespaces/admin/statefulsets/test-notebook
ISTIO_META_MESH_ID: cluster.local
TRUST_DOMAIN: cluster.local
Mounts:
/etc/istio/pod from istio-podinfo (rw)
/etc/istio/proxy from istio-envoy (rw)
/var/lib/istio/data from istio-data (rw)
/var/run/secrets/credential-uds from credential-socket (rw)
/var/run/secrets/istio from istiod-ca-cert (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
/var/run/secrets/tokens from istio-token (rw)
/var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
/var/run/secrets/workload-spiffe-uds from workload-socket (rw)
Conditions:
Type Status
PodReadyToStartContainers True
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
workload-socket:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
credential-socket:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
workload-certs:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-envoy:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
istio-data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
istio-podinfo:
Type: DownwardAPI (a volume populated by information about the pod)
Items:
metadata.labels -> labels
metadata.annotations -> annotations
istio-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 43200
istiod-ca-cert:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: istio-ca-root-cert
Optional: false
dshm:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium: Memory
SizeLimit: <unset>
test-notebook-workspace:
Type: PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
ClaimName: test-notebook-workspace
ReadOnly: false
kube-api-access-pjqff:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
volume-kf-pipeline-token:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 7200
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal BackOff 112s (x66 over 16m) kubelet Back-off pulling image "kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1"
Warning Failed 100s (x67 over 16m) kubelet Error: ImagePullBackOff