On this 4.21 run- https://url.corp.redhat.com/9200662
This test failed tests/functional/disaster-recovery/regional-dr/test_failover.py::TestFailover::test_failover[primary_up-rbd-cli] because mirroring status did not recover after failover, which is because force promotion was not successful for the whole group, which in turn leads to maintenance mode not being deleted thus keeping rbd-mirror daemon scaled down on the failover cluster [ashakya-c]
Adding this check in the test setup fixture will ensure other tests are skipped when the rbd-mirror daemon is not running on both the managed clusters as part of peer DR relationship, and will also give clarity if it was up and running during the test setup but isn't running anymore during the test execution.
Ceph health on cluster [ashakya-c]* when the daemon is missing
ceph
cluster:
id: 1fd0ed80-f781-496f-8316-abf891d2eb65
health: HEALTH_OK
services:
mon: 3 daemons, quorum d,e,f (age 27h) [leader: d]
mgr: b(active, since 18h), standbys: a
mds: 1/1 daemons up, 1 hot standby
osd: 3 osds: 3 up (since 27h), 3 in (since 44h)
rgw: 1 daemon active (1 hosts, 1 zones)
data:
volumes: 1/1 healthy
pools: 12 pools, 185 pgs
objects: 40.19k objects, 26 GiB
usage: 81 GiB used, 2.5 TiB / 2.6 TiB avail
pgs: 185 active+clean
io:
client: 7.7 KiB/s rd, 477 KiB/s wr, 8 op/s rd, 4 op/s wr
oc get deploy -n openshift-storage|grep mirror
On this 4.21 run- https://url.corp.redhat.com/9200662
This test failed tests/functional/disaster-recovery/regional-dr/test_failover.py::TestFailover::test_failover[primary_up-rbd-cli] because mirroring status did not recover after failover, which is because force promotion was not successful for the whole group, which in turn leads to maintenance mode not being deleted thus keeping rbd-mirror daemon scaled down on the failover cluster [ashakya-c]
Adding this check in the test setup fixture will ensure other tests are skipped when the rbd-mirror daemon is not running on both the managed clusters as part of peer DR relationship, and will also give clarity if it was up and running during the test setup but isn't running anymore during the test execution.
Ceph health on cluster [ashakya-c]* when the daemon is missing