
[Bug]: test_storagecluster_not_ready_blocks_upgrade needs a fix for an inconsistent ResourceLeftoversException failure at teardown #14887

@suchita-g

Description


test_storagecluster_not_ready_blocks_upgrade fails inconsistently with a ResourceLeftoversException at teardown and needs a fix.

Root Cause
The test's fixture patches resourceProfile from None to balanced, then restores it to None. During the restore transition, the rook-ceph operator recreates the mon-d pod (old: 8xgjr, created 2026-04-04T18:18:39Z; new: pnvg2, created 2026-04-04T18:30:40Z). The environment_check teardown fixture then detects the new mon-d pod name as an added resource and raises ResourceLeftoversException. The test body itself passed: the StorageCluster correctly went Progressing → Error → Ready as expected.
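One way to address this, sketched below as a minimal standalone example (the function name `added_leftovers` and the exact diffing logic are assumptions for illustration, not the real ocs-ci environment_check API): treat a pod as a leftover only if its controller prefix (the pod name minus the random per-pod suffix) did not exist before the test. A recreated mon pod then matches its pre-test prefix and is ignored, while a genuinely new pod is still flagged.

```python
import re

# Random per-pod suffix appended by Kubernetes controllers, e.g. "-pnvg2".
POD_SUFFIX = re.compile(r"-[a-z0-9]{5}$")

def added_leftovers(pre_pods, post_pods):
    """Return pods present after the test that are genuinely new.

    A pod whose controller prefix (name minus the random suffix) was
    already present before the test is treated as a recreation, not a
    leftover.
    """
    pre_prefixes = {POD_SUFFIX.sub("", name) for name in pre_pods}
    return sorted(
        name
        for name in post_pods
        if name not in pre_pods and POD_SUFFIX.sub("", name) not in pre_prefixes
    )

pre = {"rook-ceph-mon-d-dbdd4677b-8xgjr"}
post = {"rook-ceph-mon-d-dbdd4677b-pnvg2",   # recreated mon-d: ignored
        "rook-ceph-tools-5f6c8d9b4-zzzzz"}   # genuinely new pod: flagged
print(added_leftovers(pre, post))  # -> ['rook-ceph-tools-5f6c8d9b4-zzzzz']
```

With a filter like this in the teardown comparison, the expected mon-d recreation during the resourceProfile restore would no longer raise ResourceLeftoversException.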

Steps to Reproduce

  1. Run test_storagecluster_not_ready_blocks_upgrade; see the ReportPortal log: https://reportportal-ocs4.apps.dno.ocp-hub.prod.psi.redhat.com/ui/#ocs/launches/994/47188/2106945/2107309/log

Actual Behavior

The run fails with ResourceLeftoversException at teardown, even though the test body passes.

Expected Behavior

The test should pass: the mon-d pod recreation during the resourceProfile restore is an expected side effect and should not be flagged as a leftover.

Impact (likelihood of reproduction, impact on the cluster and on other tests, etc.)

Screenshots (if applicable)

Environment

  • Test Suite(s): test_storagecluster_not_ready_blocks_upgrade
  • Platform(s): vSphere- OS
  • Version(s): ODF 4.21.2-2 / OCP 4.21.0-0.nightly-2026-04-02-002715

Additional Context

Evidence

  1. debug.log:23906 — PRE mon-d pod rook-ceph-mon-d-dbdd4677b-8xgjr, creationTimestamp: 2026-04-04T18:18:39Z
  2. debug.log:90672 — POST mon-d pod rook-ceph-mon-d-dbdd4677b-pnvg2, creationTimestamp: 2026-04-04T18:30:40Z (recreated during profile restore)
  3. debug.log:32576 — resourceProfile patched from None to "balanced" at 14:22:54
  4. debug.log:56890 — resourceProfile restored from "balanced" to None at 14:28:08
  5. debug.log:66756 — StorageCluster reached Ready at 14:33:27 after ~5min Error phase
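The evidence above shows the StorageCluster passing through Progressing → Error → Ready, which is exactly what the test body asserts. A minimal sketch of that assertion, with assumed names (`saw_phase_sequence` and the injected `get_phase` callable are illustrations, not the actual test code):

```python
import time

def saw_phase_sequence(get_phase, expected, timeout=900, interval=0.0):
    """Poll get_phase() until every phase in `expected` has been seen in
    order, or the timeout expires. Returns True on success.

    In a real test, interval would be something like 10 seconds; it
    defaults to 0 here so the demo runs instantly.
    """
    idx = 0
    deadline = time.monotonic() + timeout
    while idx < len(expected) and time.monotonic() < deadline:
        if get_phase() == expected[idx]:
            idx += 1
        time.sleep(interval)
    return idx == len(expected)

# Simulated phase stream standing in for the live StorageCluster status:
phases = iter(["Progressing", "Progressing", "Error", "Error", "Ready"])
print(saw_phase_sequence(lambda: next(phases),
                         ["Progressing", "Error", "Ready"]))  # -> True
```

Since this part of the test already works, the fix belongs entirely in the teardown leftover check, not in the phase-sequence verification.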

Cluster State at Failure Time

  • Pods: All Running. mon-d pod was recreated (expected during resourceProfile change). OSD temporarily down during Error phase but recovered.
  • Ceph: HEALTH_OK at test start (14:22:53). Transient HEALTH_WARN during Error phase (1 OSD down, 33% PGs degraded). Recovered to HEALTH_OK by end.
  • Events: CephOSDDown/CephClusterWarningState alerts fired during 5min Error phase — expected side effect of resourceProfile transition.

Labels

  • Medium Priority (fix this after High Priority tickets are fixed)
  • Squad/Brown
  • bug (something isn't working)
