Replies: 7 comments
-
Sounds promising! 👍
-
I tested this again: every time we reboot/recycle a node in a Docker Swarm cluster, one of them is unavailable. Quickfix is to
-
I discussed this with deviantony; he believes this is related to #2938, where the agent takes a long time to acknowledge that another agent went down.
-
Hey @till, can you give us more information about this case? We'd like to try to reproduce the problem.
-
@deviantony what would you like to know? We're using the latest of everything Portainer and
-
@till thanks for the update. I would like to know which steps you follow in the UI to reproduce this "endpoint unavailable" issue. E.g., I reboot a node, then go to containers in the UI...
-
@deviantony I check for running containers and volumes. When I don't see containers/volumes from the other node, I know something is wrong. Another way we noticed this at first was when we tried to console into a service; that is also "broken" in this state. I then checked the agent logs, which led me to relaunch the service.
-
Is your feature request related to a problem? Please describe.
#3077: Swarm cluster setup is fairly complex and, I guess, prone to errors. On top of that, there is currently no validation that everything is set up and working, so you have to use trial and error in Portainer to find out whether, e.g., all your agents are connected.
Describe the solution you'd like
It would be great to provide a screen/overview that gives people a ✅ for cluster setup, with regard to portainer-agent setups (including firewalls, etc.).
For example:
- portainer/agent running.
- portainer/portainer can be reached?

Most of the information can be retrieved using the agent itself, or docker swarm calls on the master.

Alternative solutions
If a screen is too much, then maybe a Docker image that can be run to validate that all agents and firewalls are set up correctly would be a good first step to ensure everything is working.
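As a rough illustration of what such a validation step could check, here is a minimal Python sketch (not part of Portainer) that compares the Swarm's ready nodes with the nodes that currently have a running agent task. It assumes it is run on a manager node with the `docker` CLI available, and that the agent is deployed as a service named `portainer_agent`; that service name is an assumption and depends on how the stack was deployed. It only verifies that an agent task is running on each node; it does not test firewall rules or whether agents can reach each other.

```python
import json
import subprocess
import sys

# Assumption: the agent service name. Adjust to match your own deployment.
AGENT_SERVICE = "portainer_agent"


def parse_json_lines(output: str) -> list:
    """Parse output produced with --format '{{json .}}' (one JSON object per line)."""
    return [json.loads(line) for line in output.splitlines() if line.strip()]


def nodes_missing_agent(nodes: list, tasks: list) -> list:
    """Return hostnames of Ready nodes that have no running agent task."""
    ready = {n["Hostname"] for n in nodes if n.get("Status") == "Ready"}
    running = {
        t["Node"]
        for t in tasks
        if t.get("CurrentState", "").startswith("Running")
    }
    return sorted(ready - running)


def docker_json(*args: str) -> list:
    """Run a docker CLI command and return its JSON-formatted rows."""
    result = subprocess.run(
        ["docker", *args, "--format", "{{json .}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_json_lines(result.stdout)


def main() -> int:
    nodes = docker_json("node", "ls")
    tasks = docker_json("service", "ps", AGENT_SERVICE)
    missing = nodes_missing_agent(nodes, tasks)
    for hostname in missing:
        print(f"no running agent task on node: {hostname}")
    if not missing:
        print("every ready node has a running agent task")
    # A non-zero exit code lets the check be used in scripts or health checks.
    return 1 if missing else 0


if __name__ == "__main__":
    sys.exit(main())
```

A check like this would flag a node whose agent did not come back after a reboot, but it would not catch the case described under "Additional context" below, where every node reports in and data from one node is still unavailable.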
Additional context
I had a Swarm cluster set up where all nodes reported in (in the cluster overview), but I was unable to, e.g., retrieve volumes and containers from a node in the cluster. My guess was that the network had to be attachable, but that doesn't seem to be the case, and something else fixed the issue. I also restarted the agents in the process, so maybe that was the "real" fix. Or maybe there was something else with the Swarm network that was not set up correctly.