Agent validation #9526

till · 2019-08-12T11:03:40Z

till
Aug 12, 2019

Is your feature request related to a problem? Please describe.

#3077: Swarm cluster setup is fairly complicated and, I guess, error-prone. On top of that, there is currently no validation that everything is set up and working, so you have to use trial and error in Portainer to find out whether, e.g., all your agents are connected.

Describe the solution you'd like

It would be great to provide a screen/overview that gives people a ✅ for cluster setup, with regard to portainer-agent setups (including firewalls, etc.).

For example:

We could verify that each node has a portainer/agent running.
Firewall checks (for inter-cluster comm):

required ports (though with UDP that's a little more tricky)
small TCP checks to see if portainer/portainer can be reached?

Provide a checklist on the agent's network
Suggest best practices (via an automated checklist):

managers
workers
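
The "small TCP checks" idea above could look roughly like the following. This is only a sketch under stated assumptions: the function names `tcp_reachable` and `check_node` are hypothetical, 2377/tcp and 7946/tcp are the standard Docker Swarm ports, and 9001 is the Portainer agent's default port (adjust it if your deployment publishes the agent elsewhere). The UDP ports (7946/udp, 4789/udp) are left out on purpose because, as noted above, they are harder to probe.

```python
import socket

# Standard Docker Swarm TCP ports plus the Portainer agent's default port.
# 2377/tcp is only expected to be open on manager nodes.
# UDP ports (7946/udp, 4789/udp) are deliberately not covered here.
SWARM_TCP_PORTS = {
    2377: "cluster management (managers only)",
    7946: "node-to-node communication",
    9001: "Portainer agent (default port, assumption for this sketch)",
}


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def check_node(host: str, ports=SWARM_TCP_PORTS, timeout: float = 2.0) -> dict:
    """Map each port to True/False for a single node."""
    return {port: tcp_reachable(host, port, timeout) for port in ports}
```

Running `check_node` from one node against every other node's address would give a simple per-port checklist. A successful TCP connect only shows that the port is open, not that the service behind it is healthy.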

Most of the information can be retrieved using the agent itself, or through Docker Swarm calls on a manager node.
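
As an illustration of the "each node has an agent running" check, here is a minimal sketch that works on data shaped like the Docker Engine API responses for `GET /nodes` and `GET /tasks`. It assumes the task list has already been filtered down to the agent service; the function name `nodes_missing_agent` and the sample data are hypothetical.

```python
def nodes_missing_agent(nodes, agent_tasks):
    """Return hostnames of ready nodes that have no running agent task.

    nodes:       list shaped like the Docker Engine API GET /nodes response
    agent_tasks: list shaped like GET /tasks, pre-filtered to the agent service
    """
    # IDs of nodes that currently host a running agent task.
    covered = {
        task["NodeID"]
        for task in agent_tasks
        if task.get("Status", {}).get("State") == "running"
    }
    return sorted(
        node["Description"]["Hostname"]
        for node in nodes
        if node["Status"]["State"] == "ready" and node["ID"] not in covered
    )


# Made-up sample data: three ready nodes, one agent task has failed.
sample_nodes = [
    {"ID": "n1", "Description": {"Hostname": "manager-1"}, "Status": {"State": "ready"}},
    {"ID": "n2", "Description": {"Hostname": "worker-1"}, "Status": {"State": "ready"}},
    {"ID": "n3", "Description": {"Hostname": "worker-2"}, "Status": {"State": "ready"}},
]
sample_tasks = [
    {"NodeID": "n1", "Status": {"State": "running"}},
    {"NodeID": "n2", "Status": {"State": "failed"}},
    {"NodeID": "n3", "Status": {"State": "running"}},
]

missing = nodes_missing_agent(sample_nodes, sample_tasks)
print(missing)
```

This only checks that a task is scheduled and running on each node. It would not catch the case where the agents are running but cannot talk to each other.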

Alternative solutions

If a screen is too much, then maybe a Docker image that can be run to validate that all agents and firewalls are set up correctly would be a good first step to ensure everything is working.

Additional context

I had a Swarm cluster setup where all nodes reported in (in the cluster overview), but I was unable to, e.g., retrieve volumes and containers from one node in the cluster. My guess was that the network had to be attachable, but that does not seem to be the case, and something else fixed the issue. I also restarted the agents in the process, so maybe that was the "real" fix. Or maybe something else in the Swarm network was not set up correctly.

ghost · 2019-08-12T11:08:06Z

ghost
Aug 12, 2019

Sounds promising! 👍


till · 2019-08-25T07:22:50Z

till
Aug 25, 2019
Author

I tested this again: every time we reboot/recycle a node in a Docker Swarm cluster, one of them is unavailable.

The quick fix is to run docker service update --force agent


ghost · 2019-08-28T01:56:14Z

ghost
Aug 28, 2019

I discussed this with deviantony; he believes this is related to #2938, where the agent takes a long time to acknowledge that another agent went down.


deviantony · 2019-09-17T19:17:47Z

deviantony
Sep 17, 2019
Maintainer

Hey @till

Can you give us more information about this case? We'd like to try and reproduce the problem.

> I tested this again, every time we reboot/recycle a node in a Docker Swarm cluster, one of them is unavailable.


till · 2019-09-18T09:01:49Z

till
Sep 18, 2019
Author

@deviantony what would you like to know? We're using the latest of everything Portainer, and Docker version 19.03.2, build 6a30dfc.


deviantony · 2019-09-18T19:00:00Z

deviantony
Sep 18, 2019
Maintainer

@till thanks for the update. I would like to know which steps you use in the UI to reproduce this endpoint-unavailable issue. E.g., I reboot a node, then go to containers in the UI...


till · 2019-09-19T10:37:49Z

till
Sep 19, 2019
Author

@deviantony I check for running containers and volumes. When I don't see containers/volumes from the other node, I know something is wrong. We first noticed this when we tried to open a console into a service; that is also "broken" in that state.

I then checked the agent logs, which led me to relaunch the service.

