If the connection to a service fails through the first dmsg server a dmsg-direct client is configured with, the client does not automatically fail over to another server. A regular dmsg client does fail over correctly.
Services use dmsg-direct to avoid depending on dmsg-discovery. We may want to reconsider this approach, as it limits the ability of regular dmsg and dmsghttp clients to connect to services.
Failover to other dmsg servers must be fixed in the dmsg codebase for dmsg-direct clients.
I've already attempted to fix this in several PRs, but there is no good client-side fix for deployment issues. My changes have been reverted, except for shuffling the dmsg servers the direct client uses to bootstrap its connection to dmsg.
The issues go away when the production deployment is restarted; specifically, the problem seems to lie more with the dmsg servers than with the services.