Description
Hi,
We are observing an issue with the pmm-agent status after temporary network disconnects. The agent stays in the Disconnected state, and the PMM Server UI shows the node as Failed, with the agent in Unknown status. According to the pmm-agent trace logs, the agent reconnects to the server and is able to send data after the reconnect, but the reported state remains broken.
Restarting either the agent or the server restores the correct statuses.
We have just 9 clients connected, and all of them, except the local one on the PMM Server itself, have this issue.
Expected Results
After the reconnect, the agent is in the Connected state, and the UI shows that everything is OK.
Actual Results
sudo pmm-admin list
Service type    Service name    Address and port  Service ID
MongoDB         dbhost01-mongo  127.0.0.1:27017   /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf

Agent type              Status        Metrics Mode  Agent ID                                        Service ID                                        Port
pmm_agent               Disconnected                /agent_id/06698602-553b-4dcc-a879-09236136c734                                                    0
node_exporter           Running       push          /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037                                                    42001
mongodb_exporter        Running       push          /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d  /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf  42002
mongodb_profiler_agent  Running                     /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1  /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf  0
vmagent                 Running       push          /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250                                                    42000
Version
Agent ID : /agent_id/06698602-553b-4dcc-a879-09236136c734
Node ID  : /node_id/3d6aa28c-9d71-424d-a3c6-e7bb9c736b33
Node name: dbhost01

PMM Server:
    URL    : https://pmm-test.domain.com:443/
    Version: 2.43.2

PMM Client:
    Connected        : true
    Time drift       : 688.4µs
    Latency          : 609.81µs
    Connection uptime: 100
    pmm-admin version: 2.44.0
    pmm-agent version: 2.44.0

Agents:
    /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250 vmagent Running 42000
    /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037 node_exporter Running 42001
    /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1 mongodb_profiler_agent Running 0
    /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d mongodb_exporter Running 42002
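
The two outputs above disagree: the agent list reports pmm_agent as Disconnected, while the client section reports Connected : true. A minimal Python sketch for spotting this mismatch automatically on each client is shown below. It assumes the second output comes from `pmm-admin status` and that both outputs have already been captured as text; the function names are hypothetical and the parsing relies only on the layout shown in this report.

```python
import re
from typing import Optional


def pmm_agent_status(list_output: str) -> Optional[str]:
    """Return the Status column of the pmm_agent row, or None if the row is missing."""
    for line in list_output.splitlines():
        fields = line.split()
        # The row of interest starts with the agent type "pmm_agent";
        # the next whitespace-separated field is its status.
        if len(fields) >= 2 and fields[0] == "pmm_agent":
            return fields[1]
    return None


def client_connected(status_output: str) -> Optional[bool]:
    """Return the 'Connected' flag from the client section, or None if absent."""
    match = re.search(
        r"^\s*Connected\s*:\s*(true|false)\s*$", status_output, re.MULTILINE
    )
    if match is None:
        return None
    return match.group(1) == "true"


def statuses_disagree(list_output: str, status_output: str) -> bool:
    """True when the client reports a live connection but pmm_agent is listed as Disconnected."""
    return (
        client_connected(status_output) is True
        and pmm_agent_status(list_output) == "Disconnected"
    )
```

Feeding the outputs from this report into `statuses_disagree` flags the node as inconsistent, which could serve as a quick check across all 9 clients after a network interruption.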
Steps to reproduce
- Deploy PMM Server via the Helm chart.
- Deploy PMM Client on a node running MongoDB.
- Connect the Client to the Server and confirm that everything is green.
- Add a temporary firewall rule to block traffic between the Client and the Server.
- Observe in the Client log that the connection is broken and that reconnect attempts are running.
- Remove the firewall rule and check that the Client reconnects to the Server.
- Check the Client status in pmm-admin and in the Server UI; note that it is broken.
Relevant logs
The pmm-agent trace log will be attached in a separate comment.
Code of Conduct
- I agree to follow Percona Community Code of Conduct