On a 3-node worker cluster, the LocalVolumeSet localblock created only two PersistentVolumes even though
discovery found an eligible disk on each worker.
What we observed
• LocalVolumeSet status: DiskMaker: 1/3 Unavailable, totalProvisionedDeviceCount: 2
• diskmaker-manager DaemonSet: 2/3 pods Ready
• On one worker (e.g., compute-1), diskmaker-manager stayed 0/2, ContainerCreating, for a long time
• oc describe pod showed a Pulling event for
registry.redhat.io/openshift4/ose-local-storage-diskmaker-rhel9@sha256:…
and the image never transitioned to Pulled / containers never started
• LocalVolumeDiscovery on all three nodes still showed /dev/sdb (or equivalent) as Available — so this was
not a “no disk on the node” problem
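The observations above can be reproduced with commands along these lines (resource names assume a default LSO install in openshift-local-storage; the DaemonSet name diskmaker-manager and the CR names are assumptions, adjust to your cluster):

```shell
# LocalVolumeSet status, including totalProvisionedDeviceCount and DiskMaker conditions
oc get localvolumeset localblock -n openshift-local-storage -o yaml

# DaemonSet readiness (expect DESIRED 3 / READY 2 in the broken state)
oc get daemonset diskmaker-manager -n openshift-local-storage

# Find the stuck pod and the node it is scheduled on, then check its events
oc get pods -n openshift-local-storage -o wide
oc describe pod <stuck-diskmaker-pod> -n openshift-local-storage   # look for a Pulling event with no Pulled

# Per-node disk discovery results (disks should show status Available)
oc get localvolumediscoveryresults -n openshift-local-storage -o yaml
```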
Impact
• Only two localblock PVs exist until DiskMaker runs on every node that should contribute disks
• Anything expecting one PV per worker (e.g., ODF / Ceph) can be short on storage
What fixed it (workaround)
1. Force-delete the stuck pod so the DaemonSet recreates it:
   oc delete pod <stuck-diskmaker-pod> -n openshift-local-storage --force --grace-period=0
2. After the new pod was 2/2 Running, LSO logged “found possible matching disk, waiting 1m0s to claim” on that
   node; after ~1 minute the third PV appeared.
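Recovery can be confirmed with something like the following (the DaemonSet name diskmaker-manager is assumed from a default LSO install):

```shell
# Wait until the DaemonSet reports all pods ready again
oc rollout status ds/diskmaker-manager -n openshift-local-storage

# Watch for the "found possible matching disk, waiting 1m0s to claim" message
oc logs -f ds/diskmaker-manager -n openshift-local-storage --all-containers | grep -i claim

# The third localblock PV should appear roughly a minute later
oc get pv -w
```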
Suggested follow-up / investigation
• Why CRI-O / kubelet on the affected node got stuck pulling the LSO images (network, registry auth, node disk,
CRI-O bug, etc.)
• Whether timeouts or retries for long pulls need tuning, or if this should be documented as a known recovery
step
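For the pull investigation, a sketch of the usual on-node checks (run from a debug shell on the affected node; node name compute-1 is taken from the example above):

```shell
oc debug node/compute-1
chroot /host

# Container and image state as CRI-O sees it
crictl ps -a | grep diskmaker
crictl images | grep local-storage

# CRI-O / kubelet logs around the stuck pull (network, auth, or registry errors)
journalctl -u crio -u kubelet --since "1 hour ago" | grep -i pull

# Rule out node disk pressure on the container storage mount
df -h /var/lib/containers
```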
Environment (fill in)
vSphere LSO 4.22
See the related vSphere LSO deployment: https://jenkins-csb-odf-qe-ocs4.dno.corp.redhat.com/job/qe-deploy-ocs-cluster/66778/