Skip to content

Host is stuck in Connecting state after update to 4.18.2.0 #9250

Open
@0sorkon

Description

@0sorkon
ISSUE TYPE
  • Bug Report
COMPONENT NAME
management server
CLOUDSTACK VERSION
4.18.2

Hello, I need help to resolve this.
After updating 4.18.1 -> 4.18.2, the first updated host is stuck in the "Connecting" state. The MS update was successful, but the problem occurred with the first host I tried to update immediately after MS. In the MS logs:

2024-06-13 16:12:30,872 DEBUG [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-12:null) (logid:) SeqA 60-328142: Sending Seq 60-328142: { Ans: , MgmtId: 20788631421694, via: 60, Ver: v1, Flags: 100010, [{"com.cloud.agent.api.AgentControlAnswer":{"result":"true","wait":"0","bypassHostMaintenance":"false"}}] }
2024-06-13 16:12:31,437 INFO [c.c.u.e.CSExceptionErrorCode] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Could not find exception: com.cloud.exception.ConnectionException in error code list for exceptions
2024-06-13 16:12:31,437 DEBUG [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Failed to handle host connection:
com.cloud.exception.ConnectionException: Unable to acquire lock on host 53f6efbf-5180-4475-8f24-b960a3e08db1
at com.cloud.agent.manager.AgentManagerImpl.sendReadyAndGetAttache(AgentManagerImpl.java:1153)
at com.cloud.agent.manager.AgentManagerImpl.handleConnectedAgent(AgentManagerImpl.java:1166)
at com.cloud.agent.manager.AgentManagerImpl$HandleAgentConnectTask.runInContext(AgentManagerImpl.java:1250)
at org.apache.cloudstack.managed.context.ManagedContextRunnable$1.run(ManagedContextRunnable.java:48)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext$1.call(DefaultManagedContext.java:55)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.callWithContext(DefaultManagedContext.java:102)
at org.apache.cloudstack.managed.context.impl.DefaultManagedContext.runWithContext(DefaultManagedContext.java:52)
at org.apache.cloudstack.managed.context.ManagedContextRunnable.run(ManagedContextRunnable.java:45)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-06-13 16:12:31,438 WARN [c.c.a.m.AgentManagerImpl] (AgentConnectTaskPool-149:ctx-31302c74) (logid:90640aee) Unable to create attache for agent: Seq 0-0: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 1, [{"com.cloud.agent.api.StartupRoutingCommand":{"cpuSockets":"1","cpus":"48","speed":"2794","memory":"268619120640","dom0MinMemory":"1073741824","poolSync":"false","supportsClonedVolumes":"false","caps":"hvm,snapshot","pool":"/root","hypervisorType":"KVM","hostDetails":{"Host.OS.Kernel.Version":"5.15.0-105-generic","com.cloud.network.Networks.RouterPrivateIpStrategy":"HostLocal","Host.OS.Version":"22.04","host.volume.encryption":"true","secured":"true","Host.OS":"Ubuntu"},"hostTags":[],"groupDetails":{},"type":"Routing","dataCenter":"1","pod":"1","cluster":"1","guid":"7f57af20-c5cf-3ec7-95bc-7bf6463cfa84-LibvirtComputingResource","name":"cloud.nl.introserv.net","id":"0","version":"4.18.2.0","iqn":"iqn.2004-10.com.ubuntu:01:e7f6625b59c","publicIpAddress":"185.191.1.138","publicNetmask":"255.255.255.248","publicMacAddress":"e2:c5:c6:ce:3b:69","privateIpAddress":"172.16.0.1","privateMacAddress":"b2:38:22:80:59:7d","privateNetmask":"255.255.255.0","storageIpAddress":"172.16.0.1","storageNetmask":"255.255.255.0","storageMacAddress":"b2:38:22:80:59:7d","resourceName":"LibvirtComputingResource","gatewayIpAddress":"185.191.1.137","msHostList":"178.132.3.175@static","wait":"0","bypassHostMaintenance":"false"}},{"com.cloud.agent.api.StartupStorageCommand":{"totalSize":"(0 bytes) 0","poolInfo":{"uuid":"8a76c6b2-f791-4f6b-a09e-85a581b8189f","host":"172.16.0.1","localPath":"/Storage/images","hostPath":"/Storage/images","poolType":"Filesystem","capacityBytes":"(97.87 GB) 105089261568","availableBytes":"(74.69 GB) 80196403200"},"resourceType":"STORAGE_POOL","hostDetails":{},"type":"Storage","dataCenter":"1","pod":"1","guid":"7f57af20-c5cf-3ec7-95bc-7bf6463cfa84-LibvirtComputingResource","name":"cloud.nl.introserv.net","id":"0","version":"4.18.2.0","resourceName":"LibvirtComputingResource","msHostList":"178.132.3.175@static","wait":"0","bypassHostMaintenance":"false"}}] }
2024-06-13 16:12:31,446 WARN [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-5:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0--1: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 111, [{"com.cloud.agent.api.ShutdownCommand":{"reason":"sig.kill","removeHost":"false","wait":"0","bypassHostMaintenance":"false"}}] }
2024-06-13 16:12:31,539 DEBUG [c.c.s.StatsCollector] (StatsCollector-5:ctx-7eb2d626) (logid:ac80bd51) ManagementServerCollector is running...
2024-06-13 16:12:31,540 WARN [c.c.a.m.AgentManagerImpl] (AgentManager-Handler-13:null) (logid:) Throwing away a request because it came through as the first command on a connect: Seq 0-3: { Cmd , MgmtId: -1, via: 0, Ver: v1, Flags: 11, [{"com.cloud.agent.api.PingRoutingWithNwGroupsCommand":{"newGroupStates":{},"_hostVmStateReport":{},"_gatewayAccessible":"true","_vnetAccessible":"true","hostType":"Routing","hostId":"0","wait":"0","bypassHostMaintenance":"false"}}] }

In the agent log:

2024-06-13 16:17:09,751 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0
2024-06-13 16:19:09,746 INFO [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill
2024-06-13 16:19:21,239 INFO [cloud.agent.AgentShell] (main:null) (logid:) Agent started
2024-06-13 16:19:21,243 INFO [cloud.agent.AgentShell] (main:null) (logid:) Implementation Version is 4.18.2.0
2024-06-13 16:19:21,244 INFO [cloud.agent.AgentShell] (main:null) (logid:) agent.properties found at /etc/cloudstack/agent/agent.properties
2024-06-13 16:19:21,283 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to using properties file for storage
2024-06-13 16:19:21,284 INFO [cloud.agent.AgentShell] (main:null) (logid:) Defaulting to the constant time backoff algorithm
2024-06-13 16:19:21,321 INFO [cloud.utils.LogUtils] (main:null) (logid:) log4j configuration found at /etc/cloudstack/agent/log4j-cloud.xml
2024-06-13 16:19:21,321 INFO [cloud.agent.AgentShell] (main:null) (logid:) Using default Java settings for IPv6 preference for agent connection
2024-06-13 16:19:21,387 INFO [cloud.agent.Agent] (main:null) (logid:) id is 0
2024-06-13 16:19:21,396 ERROR [kvm.resource.LibvirtComputingResource] (main:null) (logid:) uefi properties file not found due to: Unable to find file uefi.properties.
2024-06-13 16:19:21,435 INFO [kvm.resource.LibvirtConnection] (main:null) (logid:) No existing libvirtd connection found. Opening a new one
2024-06-13 16:19:21,652 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) IO uring driver for Qemu: enabled
2024-06-13 16:19:21,757 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.LinstorStorageAdaptor
2024-06-13 16:19:21,760 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.StorPoolStorageAdaptor
2024-06-13 16:19:21,761 WARN [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) Duplicate StorageAdaptor type PowerFlex, not loading com.cloud.hypervisor.kvm.storage.ScaleIOStorageAdaptor
2024-06-13 16:19:21,761 INFO [kvm.storage.KVMStoragePoolManager] (main:null) (logid:) adding storage adaptor for com.cloud.hypervisor.kvm.storage.IscsiAdmStorageAdaptor
2024-06-13 16:19:21,762 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) No libvirt.vif.driver specified. Defaults to BridgeVifDriver.
2024-06-13 16:19:21,888 INFO [cloud.serializer.GsonHelper] (main:null) (logid:) Default Builder inited.
2024-06-13 16:19:21,888 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) iscsi session clean up is disabled
2024-06-13 16:19:21,890 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) Skipping the memory balloon stats period setting, since there are no VMs (active Libvirt domains) on this host.
2024-06-13 16:19:21,891 INFO [kvm.resource.LibvirtComputingResource] (main:null) (logid:) The [vm.memballoon.stats.period] property is set to '0', this prevents memory statistics from being displayed correctly. Adjust (increase) the value of this parameter to correct this.
2024-06-13 16:19:21,899 INFO [cloud.agent.Agent] (main:null) (logid:) Agent [id = 0 : type = LibvirtComputingResource : zone = 1 : pod = 1 : workers = 5 : host = 178.132.3.175 : port = 8250
2024-06-13 16:19:21,905 INFO [utils.nio.NioClient] (main:null) (logid:) Connecting to 178.132.3.175:8250
2024-06-13 16:19:21,907 INFO [utils.nio.Link] (main:null) (logid:) Conf file found: /etc/cloudstack/agent/agent.properties
2024-06-13 16:19:22,163 INFO [utils.nio.NioClient] (main:null) (logid:) SSL: Handshake done
2024-06-13 16:19:22,163 INFO [utils.nio.NioClient] (main:null) (logid:) Connected to 178.132.3.175:8250
2024-06-13 16:19:22,236 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from command "lscpu".
2024-06-13 16:19:22,279 ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Unable to retrieve the CPU speed from lscpu.
java.lang.NullPointerException
at java.base/jdk.internal.math.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1838)
at java.base/jdk.internal.math.FloatingDecimal.parseFloat(FloatingDecimal.java:122)
at java.base/java.lang.Float.parseFloat(Float.java:455)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromCommandLscpu(KVMHostInfo.java:124)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:99)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:179)
at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:65)
at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3522)
at com.cloud.agent.Agent.sendStartup(Agent.java:444)
at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1105)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-06-13 16:19:22,281 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency].
2024-06-13 16:19:22,281 ERROR [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Unable to retrieve the CPU speed from file [/sys/devices/system/cpu/cpu0/cpufreq/base_frequency]
java.io.FileNotFoundException: /sys/devices/system/cpu/cpu0/cpufreq/base_frequency (No such file or directory)
at java.base/java.io.FileInputStream.open0(Native Method)
at java.base/java.io.FileInputStream.open(FileInputStream.java:219)
at java.base/java.io.FileInputStream.(FileInputStream.java:157)
at java.base/java.io.FileInputStream.(FileInputStream.java:112)
at java.base/java.io.FileReader.(FileReader.java:60)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeedFromFile(KVMHostInfo.java:135)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getCpuSpeed(KVMHostInfo.java:104)
at org.apache.cloudstack.utils.linux.KVMHostInfo.getHostInfoFromLibvirt(KVMHostInfo.java:179)
at org.apache.cloudstack.utils.linux.KVMHostInfo.(KVMHostInfo.java:65)
at com.cloud.hypervisor.kvm.resource.LibvirtComputingResource.initialize(LibvirtComputingResource.java:3522)
at com.cloud.agent.Agent.sendStartup(Agent.java:444)
at com.cloud.agent.Agent$ServerHandler.doTask(Agent.java:1105)
at com.cloud.utils.nio.Task.call(Task.java:83)
at com.cloud.utils.nio.Task.call(Task.java:29)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
2024-06-13 16:19:22,281 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Fetching CPU speed from "host capabilities"
2024-06-13 16:19:22,293 INFO [utils.linux.KVMHostInfo] (Agent-Handler-1:null) (logid:) Retrieved value [2794749000] from "host capabilities". This corresponds to a CPU speed of [2794] MHz.
2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Host uses control group [cgroup2fs].
2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) Calculating the max shares of the host.
2024-06-13 16:19:22,302 INFO [kvm.resource.LibvirtComputingResource] (Agent-Handler-1:null) (logid:) The max shares of the host is [134112].
2024-06-13 16:19:22,344 ERROR [kvm.resource.LibvirtConnection] (Agent-Handler-1:null) (logid:) Connection with libvirtd is broken: invalid connection pointer in virConnectGetVersion
2024-06-13 16:19:22,360 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Attempting to create storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f (Filesystem) in libvirt
2024-06-13 16:19:22,382 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Found existing defined storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f, using it.
2024-06-13 16:19:22,383 INFO [kvm.storage.LibvirtStorageAdaptor] (Agent-Handler-1:null) (logid:) Trying to fetch storage pool 8a76c6b2-f791-4f6b-a09e-85a581b8189f from libvirt
2024-06-13 16:19:22,473 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Process agent startup answer, agent id = 0
2024-06-13 16:19:22,473 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Set agent id 0
2024-06-13 16:19:22,486 INFO [cloud.agent.Agent] (Agent-Handler-2:null) (logid:) Startup Response Received: agent id = 0
2024-06-13 16:21:22,481 INFO [cloud.agent.Agent] (AgentShutdownThread:null) (logid:) Stopping the agent: Reason = sig.kill

Metadata

Metadata

Assignees

Type

No type

Projects

Status

Todo

Milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions