Error pulling image kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1, failed size validation #265

Open
@deneriz-veridas

Edit: This happened due to Docker Hub rate limits (see #265 (comment))
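
For anyone who wants to confirm the rate-limit explanation on their own node, here is a minimal Python sketch. It assumes the `ratelimitpreview/test` repository and the `ratelimit-limit` / `ratelimit-remaining` response headers that Docker documents for checking pull limits; the function names are mine and purely illustrative. It has to run on the affected node (or behind the same egress IP) for the numbers to be meaningful.

```python
import json
import urllib.request

# Endpoints as described in Docker's pull-rate-limit documentation
# (assumed here; verify against the current docs before relying on them).
TOKEN_URL = (
    "https://auth.docker.io/token"
    "?service=registry.docker.io&scope=repository:ratelimitpreview/test:pull"
)
MANIFEST_URL = (
    "https://registry-1.docker.io/v2/ratelimitpreview/test/manifests/latest"
)


def parse_ratelimit(value):
    """Split a header value such as '100;w=21600' into (count, window_seconds)."""
    count, _, window = value.partition(";w=")
    return int(count), (int(window) if window else None)


def fetch_limits():
    """Return the anonymous pull limits reported for the caller's IP address."""
    # Step 1: get an anonymous bearer token for the test repository.
    with urllib.request.urlopen(TOKEN_URL) as resp:
        token = json.load(resp)["token"]
    # Step 2: issue a HEAD request and read the rate-limit headers.
    req = urllib.request.Request(
        MANIFEST_URL,
        method="HEAD",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req) as resp:
        return {
            name: parse_ratelimit(resp.headers[name])
            for name in ("ratelimit-limit", "ratelimit-remaining")
            if resp.headers.get(name)
        }


# Example (needs network access): print(fetch_limits())
```

If the limit is already exhausted, the second request may raise `urllib.error.HTTPError` with status 429 instead of returning headers, which in itself points to rate limiting.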

Kubeflow Notebooks Version

1.10

Kubeflow Platform

Charmed Kubeflow 1.10

Kubernetes Distribution

MicroK8s v1.32.3 revision 8148

Kubernetes Version

Client Version: v1.32.3
Kustomize Version: v5.5.0
Server Version: v1.32.3

Description

I was following the official guide (link) to integrate Charmed Kubeflow and Charmed MLflow.

When I created a notebook using the image kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1, the pod failed to start and encountered the following error:

ImagePullBackOff: Back-off pulling image "kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1": ErrImagePull: rpc error: code = FailedPrecondition desc = failed to pull and unpack image "docker.io/kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1": failed commit on ref "manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a": "manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a" failed size validation: 442244 != 4860: failed precondition

The error indicates a failure while pulling and unpacking the image: the manifest fails size validation ("failed size validation: 442244 != 4860"), i.e. the two byte counts compared for the manifest do not match. The container runtime therefore rejects the manifest, and the notebook container never starts.
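
To make the mismatch easier to spot in long kubelet messages, the manifest ref and the two sizes can be pulled out of the error string. This is only a sketch: the regular expression is written against the message quoted above, the names `left` and `right` are my own, and I am deliberately not assuming which of the two numbers is the expected size.

```python
import re

# Matches the containerd message quoted in this issue. The group names are
# illustrative and not part of any containerd API.
SIZE_ERROR = re.compile(
    r'failed commit on ref "(?P<ref>[^"]+)".*?'
    r"failed size validation: (?P<left>\d+) != (?P<right>\d+)"
)


def parse_size_validation(message):
    """Return (ref, left, right) from a size-validation error, or None."""
    match = SIZE_ERROR.search(message)
    if match is None:
        return None
    return match["ref"], int(match["left"]), int(match["right"])


# Excerpt of the error reported above.
message = (
    'failed commit on ref '
    '"manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a": '
    '"manifest-sha256:7851e1a90c95abd2a727cf2b71eaaf756c82291a58aa069c7e2c8e0fd83b5d6a" '
    "failed size validation: 442244 != 4860: failed precondition"
)

ref, left, right = parse_size_validation(message)
print(f"{ref}: {left} bytes vs {right} bytes")
```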

I suspect this might be an issue related to Kubeflow Notebooks, which is why I'm posting it here. However, if this issue should be reported elsewhere, please let me know the appropriate place.

Relevant Logs

microk8s kubectl describe pods test-notebook-0 -n admin
Name:             test-notebook-0
Namespace:        admin
Priority:         0
Service Account:  default-editor
Node:             calculon15/192.168.50.33
Start Time:       Fri, 25 Apr 2025 08:05:18 +0000
Labels:           access-minio=true
                  access-ml-pipeline=true
                  app=test-notebook
                  apps.kubernetes.io/pod-index=0
                  controller-revision-hash=test-notebook-fd97744f6
                  mlflow-server-minio=true
                  notebook-name=test-notebook
                  security.istio.io/tlsMode=istio
                  service.istio.io/canonical-name=test-notebook
                  service.istio.io/canonical-revision=latest
                  statefulset=test-notebook
                  statefulset.kubernetes.io/pod-name=test-notebook-0
Annotations:      cni.projectcalico.org/containerID: f9d40694ca8f3585b55baf24ec33361b06be4d544995ccaae6a30bb281e2f2cc
                  cni.projectcalico.org/podIP: 10.1.45.203/32
                  cni.projectcalico.org/podIPs: 10.1.45.203/32
                  istio.io/rev: default
                  kubectl.kubernetes.io/default-container: test-notebook
                  kubectl.kubernetes.io/default-logs-container: test-notebook
                  poddefault.admission.kubeflow.org/poddefault-access-ml-pipeline: 6220594
                  poddefault.admission.kubeflow.org/poddefault-mlflow-server-access-minio: 6378115
                  poddefault.admission.kubeflow.org/poddefault-mlflow-server-minio: 6378116
                  prometheus.io/path: /stats/prometheus
                  prometheus.io/port: 15020
                  prometheus.io/scrape: true
                  sidecar.istio.io/status:
                    {"initContainers":["istio-init"],"containers":["istio-proxy"],"volumes":["workload-socket","credential-socket","workload-certs","istio-env...
Status:           Pending
IP:               10.1.45.203
IPs:
  IP:           10.1.45.203
Controlled By:  StatefulSet/test-notebook
Init Containers:
  istio-init:
    Container ID:  containerd://34eabed85bd3e06ce2dd36546834d78dbc3962e7634c57f56f8994439b660339
    Image:         docker.io/istio/proxyv2:1.24.2
    Image ID:      docker.io/istio/proxyv2@sha256:445156b5f4a780242d079a47b7d88199cbbb5959c92358469b721af490eca1ae
    Port:          <none>
    Host Port:     <none>
    Args:
      istio-iptables
      -p
      15001
      -z
      15006
      -u
      1337
      -m
      REDIRECT
      -i
      *
      -x
      
      -b
      *
      -d
      15090,15021,15020
      --log_output_level=default:info
    State:          Terminated
      Reason:       Completed
      Exit Code:    0
      Started:      Fri, 25 Apr 2025 08:05:20 +0000
      Finished:     Fri, 25 Apr 2025 08:05:20 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:        100m
      memory:     128Mi
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
Containers:
  test-notebook:
    Container ID:   
    Image:          kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1
    Image ID:       
    Port:           8888/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ImagePullBackOff
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     600m
      memory:  1288490188800m
    Requests:
      cpu:     500m
      memory:  1Gi
    Environment:
      NB_PREFIX:                   /notebook/admin/test-notebook
      KF_PIPELINES_SA_TOKEN_PATH:  /var/run/secrets/kubeflow/pipelines/token
      AWS_ACCESS_KEY_ID:           <set to the key 'AWS_ACCESS_KEY_ID' in secret 'mlflow-server-minio-artifact'>      Optional: false
      AWS_SECRET_ACCESS_KEY:       <set to the key 'AWS_SECRET_ACCESS_KEY' in secret 'mlflow-server-minio-artifact'>  Optional: false
      MINIO_ENDPOINT_URL:          http://mlflow-minio.kubeflow:9000
      MLFLOW_S3_ENDPOINT_URL:      http://mlflow-minio.kubeflow:9000
      MLFLOW_TRACKING_URI:         http://mlflow-server.kubeflow.svc.cluster.local:5000
    Mounts:
      /dev/shm from dshm (rw)
      /home/jovyan from test-notebook-workspace (rw)
      /var/run/secrets/kubeflow/pipelines from volume-kf-pipeline-token (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
  istio-proxy:
    Container ID:  containerd://32a6fb9ea59127313112d4b28a311b305bfbf30c3196efbfa9e3a5f77e35ef9a
    Image:         docker.io/istio/proxyv2:1.24.2
    Image ID:      docker.io/istio/proxyv2@sha256:445156b5f4a780242d079a47b7d88199cbbb5959c92358469b721af490eca1ae
    Port:          15090/TCP
    Host Port:     0/TCP
    Args:
      proxy
      sidecar
      --domain
      $(POD_NAMESPACE).svc.cluster.local
      --proxyLogLevel=warning
      --proxyComponentLogLevel=misc:error
      --log_output_level=default:info
    State:          Running
      Started:      Fri, 25 Apr 2025 08:05:24 +0000
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     2
      memory:  1Gi
    Requests:
      cpu:      100m
      memory:   128Mi
    Readiness:  http-get http://:15021/healthz/ready delay=0s timeout=3s period=15s #success=1 #failure=4
    Startup:    http-get http://:15021/healthz/ready delay=0s timeout=3s period=1s #success=1 #failure=600
    Environment:
      PILOT_CERT_PROVIDER:           istiod
      CA_ADDR:                       istiod.kubeflow.svc:15012
      POD_NAME:                      test-notebook-0 (v1:metadata.name)
      POD_NAMESPACE:                 admin (v1:metadata.namespace)
      INSTANCE_IP:                    (v1:status.podIP)
      SERVICE_ACCOUNT:                (v1:spec.serviceAccountName)
      HOST_IP:                        (v1:status.hostIP)
      ISTIO_CPU_LIMIT:               2 (limits.cpu)
      PROXY_CONFIG:                  {"discoveryAddress":"istiod.kubeflow.svc:15012"}
                                     
      ISTIO_META_POD_PORTS:          [
                                         {"name":"notebook-port","containerPort":8888,"protocol":"TCP"}
                                     ]
      ISTIO_META_APP_CONTAINERS:     test-notebook
      GOMEMLIMIT:                    1073741824 (limits.memory)
      GOMAXPROCS:                    2 (limits.cpu)
      ISTIO_META_CLUSTER_ID:         Kubernetes
      ISTIO_META_NODE_NAME:           (v1:spec.nodeName)
      ISTIO_META_INTERCEPTION_MODE:  REDIRECT
      ISTIO_META_WORKLOAD_NAME:      test-notebook
      ISTIO_META_OWNER:              kubernetes://apis/apps/v1/namespaces/admin/statefulsets/test-notebook
      ISTIO_META_MESH_ID:            cluster.local
      TRUST_DOMAIN:                  cluster.local
    Mounts:
      /etc/istio/pod from istio-podinfo (rw)
      /etc/istio/proxy from istio-envoy (rw)
      /var/lib/istio/data from istio-data (rw)
      /var/run/secrets/credential-uds from credential-socket (rw)
      /var/run/secrets/istio from istiod-ca-cert (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-pjqff (ro)
      /var/run/secrets/tokens from istio-token (rw)
      /var/run/secrets/workload-spiffe-credentials from workload-certs (rw)
      /var/run/secrets/workload-spiffe-uds from workload-socket (rw)
Conditions:
  Type                        Status
  PodReadyToStartContainers   True 
  Initialized                 True 
  Ready                       False 
  ContainersReady             False 
  PodScheduled                True 
Volumes:
  workload-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  credential-socket:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  workload-certs:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-envoy:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  istio-data:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     
    SizeLimit:  <unset>
  istio-podinfo:
    Type:  DownwardAPI (a volume populated by information about the pod)
    Items:
      metadata.labels -> labels
      metadata.annotations -> annotations
  istio-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  43200
  istiod-ca-cert:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      istio-ca-root-cert
    Optional:  false
  dshm:
    Type:       EmptyDir (a temporary directory that shares a pod's lifetime)
    Medium:     Memory
    SizeLimit:  <unset>
  test-notebook-workspace:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  test-notebook-workspace
    ReadOnly:   false
  kube-api-access-pjqff:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
  volume-kf-pipeline-token:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  7200
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                  From     Message
  ----     ------   ----                 ----     -------
  Normal   BackOff  112s (x66 over 16m)  kubelet  Back-off pulling image "kubeflownotebookswg/jupyter-tensorflow-full:v1.10.0-rc.1"
  Warning  Failed   100s (x67 over 16m)  kubelet  Error: ImagePullBackOff

Labels

kind/bug, priority/needs-triage
