pmm-agent shown as Disconnected/Unknown/Failed after temporary network disconnect #3441

Open
@ValeriiVozniuk

Description

Hi,

We are seeing an issue with pmm-agent status after temporary network disconnects. The agent stays in the Disconnected state, and the PMM Server UI shows the node as Failed with the agent in Unknown status. According to the pmm-agent trace logs, the agent reconnects to the server and is able to send data after the reconnect, but the reported state remains broken.
Restarting either the agent or the server restores the correct statuses.
We have only 9 clients connected, and all of them except the local one on the PMM Server have this issue.

Expected Results

After reconnecting, the agent is in the Connected state, and the UI shows that everything is OK.

Actual Results

sudo pmm-admin list
Service type        Service name                                     Address and port        Service ID
MongoDB             dbhost01-mongo        127.0.0.1:27017         /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf

Agent type                    Status              Metrics Mode        Agent ID                                              Service ID                                              Port
pmm_agent                     Disconnected                            /agent_id/06698602-553b-4dcc-a879-09236136c734                                                                0
node_exporter                 Running             push                /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037                                                                42001
mongodb_exporter              Running             push                /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d        /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf        42002
mongodb_profiler_agent        Running                                 /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1        /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf        0
vmagent                       Running             push                /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250                                                                42000
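To detect this state from a script rather than by eye, the agent table above can be parsed directly. The following is only a sketch: the function name `unhealthy_agents` and the set of statuses treated as healthy are assumptions for illustration, and it relies on the column layout shown in the listing above (agent type and status as the first two whitespace-separated fields).

```python
# Sketch: flag agents in `pmm-admin list` output whose status is not healthy.
# `unhealthy_agents` and the `ok` tuple are illustrative assumptions, not part of PMM.

SAMPLE = """\
Service type        Service name          Address and port        Service ID
MongoDB             dbhost01-mongo        127.0.0.1:27017         /service_id/00d622c0-0504-4d60-8d5d-b7a4e85ec2bf

Agent type                    Status              Metrics Mode        Agent ID                                              Service ID                                              Port
pmm_agent                     Disconnected                            /agent_id/06698602-553b-4dcc-a879-09236136c734                                                                0
node_exporter                 Running             push                /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037                                                                42001
"""


def unhealthy_agents(listing, ok=("Running", "Connected")):
    """Return (agent_type, status) pairs whose status is not in `ok`."""
    bad = []
    in_agents = False
    for line in listing.splitlines():
        if line.startswith("Agent type"):
            # Header row of the agent table; data rows follow.
            in_agents = True
            continue
        if not in_agents:
            continue
        if not line.strip():
            # A blank line ends the agent table.
            break
        fields = line.split()
        if len(fields) >= 2 and fields[1] not in ok:
            bad.append((fields[0], fields[1]))
    return bad


print(unhealthy_agents(SAMPLE))
```

On a client, the same function could be fed the captured output of `pmm-admin list` after the network is restored; a non-empty result for `pmm_agent` would indicate the stale status described here.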

[Screenshot: UI1]
[Screenshot: UI2]

Version

Agent ID : /agent_id/06698602-553b-4dcc-a879-09236136c734
Node ID  : /node_id/3d6aa28c-9d71-424d-a3c6-e7bb9c736b33
Node name: dbhost01

PMM Server:
        URL    : https://pmm-test.domain.com:443/
        Version: 2.43.2

PMM Client:
        Connected        : true
        Time drift       : 688.4µs
        Latency          : 609.81µs
        Connection uptime: 100
        pmm-admin version: 2.44.0
        pmm-agent version: 2.44.0
Agents:
        /agent_id/31ec8f09-13a1-4c9e-9fca-39c59d9e4250 vmagent Running 42000
        /agent_id/3e0d31cb-ccfb-4160-a1d9-e6daaf036037 node_exporter Running 42001
        /agent_id/a222e840-2145-4743-b4fe-9d65e2822eb1 mongodb_profiler_agent Running 0
        /agent_id/cf82eadb-cbef-4a39-a74d-1a0c63d6ed8d mongodb_exporter Running 42002

Steps to reproduce

  1. Deploy PMM Server via the Helm chart.
  2. Deploy PMM Client on a node with MongoDB.
  3. Connect the Client to the Server and confirm that everything is green.
  4. Add a temporary firewall rule to block traffic between the Client and the Server.
  5. Observe in the Client log that the connection is broken and that reconnect attempts are running.
  6. Remove the firewall rule and check that the Client reconnects to the Server.
  7. Check the Client status in pmm-admin and in the Server UI, and note that it is broken.

Relevant logs

The pmm-agent trace log will be attached in a separate comment.
