HA: Hosts persist in the Suspect state in an HA cluster with SharedMountPoint #10166

Open
@Luskan777

ISSUE TYPE
  • Bug Report
COMPONENT NAME
HA, KVM
CLOUDSTACK VERSION
4.20
CONFIGURATION

Zone type : Advanced Network
Primary Storage: SharedMountPoint

OS / ENVIRONMENT

Hosts OS: Ubuntu 22.04 (HPE ProLiant BL460c Gen10)
Management Server OS: Ubuntu 22.04
Out-of-band management driver: IPMI

SUMMARY

Hello. I configured out-of-band management on my hosts, but their HA state is always either Suspect or Degraded. I have already verified that IPMI communication works, and the servers are powered on and operational.

STEPS TO REPRODUCE
  1. Configure the KVM hosts.
  2. Configure the HA provider as KVMHAProvider.
  3. Configure out-of-band management with the IPMI driver.
  4. Enable HA and check the HA state.
EXPECTED RESULTS
Hosts report the Available HA state.
ACTUAL RESULTS

Management Server logs:

@MSLOG@:2025-01-07 00:29:25,698 DEBUG [o.a.c.h.HAManagerImpl] (pool-4-thread-21:[]) HA state post-transition:: new state=[Suspect], old state=[Checking], for resource id=[3], status=[true], ha config state=[Suspect].
@MSLOG@:2025-01-07 00:29:25,707 DEBUG [o.a.c.h.HAManagerImpl] (pool-4-thread-21:[]) Transitioned host HA state from:Checking to:Suspect due to event:TooFewActivityCheckSamples for the host id:3
@MSLOG@:2025-01-07 00:29:41,622 DEBUG [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-2:[ctx-28440d8d]) HA state post-transition:: new state=[Checking], old state=[Suspect], for resource id=[2], status=[true], ha config state=[Checking].
@MSLOG@:2025-01-07 00:29:41,629 DEBUG [o.a.c.h.HAManagerImpl] (BackgroundTaskPollManager-2:[ctx-28440d8d]) Transitioned host HA state from:Suspect to:Checking due to event:PerformActivityCheck for the host id:2

2025-01-07 15:44:06,928 DEBUG [o.a.c.u.p.ProcessRunner] (pool-2-thread-11:[]) Process standard output for command [/usr/bin/ipmitool -I lanplus -R 1 -v -H 10.16.20.21 -p 623 -U cloudstack -P ***** chassis power status]: [Chassis Power is on
].
2025-01-07 15:44:06,928 DEBUG [o.a.c.u.p.ProcessRunner] (pool-2-thread-11:[]) Process standard error output command [/usr/bin/ipmitool -I lanplus -R 1 -v -H 10.16.20.21 -p 623 -U cloudstack -P ***** chassis power status]: [Running Get PICMG Properties my_addr 0x20, transit 0, target 0x20
Error response 0xc1 from Get PICMG Properities
Running Get VSO Capabilities my_addr 0x20, transit 0, target 0x20
Invalid completion code received: Invalid command
Discovered IPMB address 0x0
].
2025-01-07 15:44:06,929 DEBUG [o.a.c.o.d.i.IpmitoolOutOfBandManagementDriver] (pool-2-thread-11:[]) The command [/usr/bin/ipmitool -I lanplus -R 1 -v -H 10.16.20.21 -p 623 -U cloudstack -P PASSWORD  chassis power status] was successful and got the result [Chassis Power is on].
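
To see at a glance which hosts keep cycling and why, the HAManagerImpl transition lines above can be tallied with a short script. This is only a minimal sketch: the regular expression is derived from the two "Transitioned host HA state" lines quoted in this report, and the function name `summarize_transitions` is made up for illustration, not part of CloudStack.

```python
import re
from collections import Counter

# Pattern derived from the "Transitioned host HA state" lines quoted in this
# report (assumption: other CloudStack versions may format them differently).
TRANSITION_RE = re.compile(
    r"Transitioned host HA state from:(?P<old>\w+) to:(?P<new>\w+) "
    r"due to event:(?P<event>\w+) for the host id:(?P<host>\d+)"
)

def summarize_transitions(lines):
    """Count (host id, old state, new state, event) tuples found in log lines."""
    counts = Counter()
    for line in lines:
        match = TRANSITION_RE.search(line)
        if match:
            key = (int(match["host"]), match["old"], match["new"], match["event"])
            counts[key] += 1
    return counts

# The two transition lines from the management server log in this report.
sample = [
    "@MSLOG@:2025-01-07 00:29:25,707 DEBUG [o.a.c.h.HAManagerImpl] (pool-4-thread-21:[]) "
    "Transitioned host HA state from:Checking to:Suspect due to "
    "event:TooFewActivityCheckSamples for the host id:3",
    "@MSLOG@:2025-01-07 00:29:41,629 DEBUG [o.a.c.h.HAManagerImpl] "
    "(BackgroundTaskPollManager-2:[ctx-28440d8d]) Transitioned host HA state "
    "from:Suspect to:Checking due to event:PerformActivityCheck for the host id:2",
]

for (host, old, new, event), n in sorted(summarize_transitions(sample).items()):
    print(f"host {host}: {old} -> {new} on {event} (x{n})")
```

Running it over a full management server log instead of `sample` would show how often each host moves between Checking and Suspect and which event triggers the move.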

KVM hosts logs:

2025-01-07 15:49:52,534 DEBUG [kvm.resource.KVMHAChecker] (pool-1067-thread-1:[]) (logid:) Checking heart beat with KVMHAChecker for host IP [IP_SERVER] in pools []
2025-01-07 15:49:52,534 WARN  [kvm.resource.KVMHAChecker] (pool-1067-thread-1:[]) (logid:) All checks with KVMHAChecker for host IP [IP_SERVER] in pools [] considered it as dead. It may cause a shutdown of the host.
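
One detail in the agent log above is that KVMHAChecker reports an empty pool list (`in pools []`). Whether that explains the Suspect state is not confirmed here, but the following sketch finds every such line in an agent log. The pattern is derived from the two lines quoted above, and `empty_pool_checks` is a hypothetical helper name, not a CloudStack function.

```python
import re

# Pattern derived from the KVMHAChecker lines quoted in this report
# (assumption: the wording may differ in other CloudStack versions).
POOLS_RE = re.compile(
    r"KVMHAChecker for host IP \[(?P<ip>[^\]]*)\] in pools \[(?P<pools>[^\]]*)\]"
)

def empty_pool_checks(lines):
    """Return the host IP of every KVMHAChecker line whose pool list is empty."""
    hits = []
    for line in lines:
        match = POOLS_RE.search(line)
        if match and not match["pools"].strip():
            hits.append(match["ip"])
    return hits

# The two agent log lines from this report (IP_SERVER is the reporter's redaction).
sample = [
    "2025-01-07 15:49:52,534 DEBUG [kvm.resource.KVMHAChecker] (pool-1067-thread-1:[]) "
    "(logid:) Checking heart beat with KVMHAChecker for host IP [IP_SERVER] in pools []",
    "2025-01-07 15:49:52,534 WARN  [kvm.resource.KVMHAChecker] (pool-1067-thread-1:[]) "
    "(logid:) All checks with KVMHAChecker for host IP [IP_SERVER] in pools [] "
    "considered it as dead. It may cause a shutdown of the host.",
]

print(empty_pool_checks(sample))
```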
