Trouble registering agents to master/worker "never connected" in UI and 'Transport endpoint is not connected' in ossec.log #1091

@DrewTittle

Description

I have encountered an issue that several others have run into. It's worth noting this advice is for AWS, but it could probably be adapted to whatever other platform you're deploying Wazuh to with some creativity.

See: #961 #308 #547 probably others...

By default the install creates a separate classic load balancer for the manager, the workers, and the dashboard. To make life easier you can add a Route53 DNS entry pointing to each of these load balancers, or you can grab the default Amazon hostname for each load balancer from the console to use in your agent join commands.

It's important to note the ports that are open on each of these load balancers. In particular, tcp/1515 is only open on the manager and tcp/1514 is only open on the workers. To make the docs easier to follow, note that workers ARE managers; they just can't perform the registration step for a new agent. Much of what you read about managers also applies to workers.

This doc explains the differences in a comprehensive way

tcp/1515 is the registration port. Agents reach out to the manager pod here to register with Wazuh, and when this step is complete you see "never connected" until...

tcp/1514 is the communication port for agents. They reach out here to send events to the Wazuh cluster. If they are unable to connect to this port they will never show as active in the dashboard UI, and you will see errors like the following in /var/ossec/logs/ossec.log:

```
2025/06/05 23:05:29 wazuh-agentd: INFO: Trying to connect to server ([wazuh-manager.yourdomain.com]:1514/tcp).
2025/06/05 23:07:40 wazuh-agentd: ERROR: (1216): Unable to connect to '[10.10.10.10]:1514/tcp': 'Transport endpoint is not connected'.
```
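Before digging further, you can confirm which ports each load balancer actually accepts with a quick TCP probe. A minimal sketch (the hostnames are placeholders for your own Route53 entries; `check_port` is a hypothetical helper, not a Wazuh tool):

```shell
# check_port HOST PORT -> exit 0 if a TCP connection succeeds within 3 seconds
check_port() {
  timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Substitute your own load balancer hostnames here.
# Expected with the default install: 1515 open on the manager LB, 1514 on the workers LB.
for target in "wazuh-manager.yourdomain.com 1515" "wazuh-workers.yourdomain.com 1514"; do
  set -- $target
  if check_port "$1" "$2"; then
    echo "$1:$2 reachable"
  else
    echo "$1:$2 NOT reachable"
  fi
done
```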

See here for some great troubleshooting steps for getting agents joined

When you use an agent join command like the one generated by the Wazuh UI for the Kubernetes installation method, like this, you will see those errors in the logs and the "never connected" result in the UI:

```shell
curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
  sudo WAZUH_MANAGER='wazuh-manager.yourdomain.com' \
  WAZUH_REGISTRATION_PASSWORD=$'YourSuperSecurePassword' \
  WAZUH_AGENT_GROUP='default' \
  WAZUH_AGENT_NAME='ARandomAgent' \
  rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
```

This happens because the agent registers with the wazuh-manager on the open tcp/1515 port on the load balancer and shows up in the dashboard, but the status stays "never connected". The reason can be found in the documentation here

When you only set the WAZUH_MANAGER environment variable, the agent assumes that's where registration on tcp/1515 AND future communication on tcp/1514 will occur. Since tcp/1514 is not open on the manager load balancer, the agent registers and then fails to connect.
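For context, with only WAZUH_MANAGER set, the installer ends up with a single server address in the agent's /var/ossec/etc/ossec.conf, so the same host is used for both registration and event traffic. A sketch with a placeholder hostname:

```xml
<client>
  <server>
    <address>wazuh-manager.yourdomain.com</address>
    <port>1514</port>
    <protocol>tcp</protocol>
  </server>
</client>
```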

Now if we look a little further down on the deployment variables page, we see a different variable, WAZUH_REGISTRATION_SERVER.

If we slightly modify our agent registration command, we can use that variable to split registration and future communication between the manager and worker load balancer hostnames:

```shell
AGENT_GROUP='default'

# this sets the worker load balancer as the target for events
WAZUH_MANAGER='wazuh-workers.yourdomain.com'

# this tells the agent to register with the manager load balancer
WAZUH_REGISTRATION_SERVER='wazuh-manager.yourdomain.com'
WAZUH_REGISTRATION_PASSWORD=$'YOUR_AUTHD_PASSWORD'

curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
  sudo WAZUH_MANAGER=${WAZUH_MANAGER} \
  WAZUH_REGISTRATION_SERVER=${WAZUH_REGISTRATION_SERVER} \
  WAZUH_REGISTRATION_PASSWORD=${WAZUH_REGISTRATION_PASSWORD} \
  WAZUH_AGENT_GROUP=${AGENT_GROUP} rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
```
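Note the rpm install alone doesn't start the agent. After installing, start it and watch the log for a successful connection. A minimal sketch, where `agent_connected` is a hypothetical helper and the paths are the Wazuh defaults:

```shell
# hypothetical helper: scan an ossec.log for the agent's successful connection message
agent_connected() {
  grep -q "Connected to the server" "$1"
}

# Typical usage on the agent host after the rpm install (uncomment on a real box):
# sudo systemctl daemon-reload
# sudo systemctl enable --now wazuh-agent
# agent_connected /var/ossec/logs/ossec.log && echo "agent is sending events on tcp/1514"
```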

The downside here is that the manager pod will not receive incoming agent events; only the workers will. If we want all manager and worker pods to be able to receive agent events, we can combine one of the suggested workarounds with a slightly different agent registration command.

First, add this port to your wazuh-manager-svc.yaml or override it with a kustomization patch:

```yaml
- name: agents-events
  port: 1514
  targetPort: 1514
```

Delete your manager service and then redeploy with the new port; this will force the manager load balancer to be replaced with one that has tcp/1514 open.
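If you'd rather not edit wazuh-manager-svc.yaml directly, a kustomization patch along these lines should work. This is a sketch: the Service name and the rest of your kustomization.yaml are assumptions based on the default wazuh-kubernetes manifests, so adjust to match yours:

```yaml
# kustomization.yaml (excerpt) -- appends tcp/1514 to the manager Service's port list
patches:
  - target:
      kind: Service
      name: wazuh          # assumed name of the manager service; check your manifests
    patch: |-
      - op: add
        path: /spec/ports/-
        value:
          name: agents-events
          port: 1514
          targetPort: 1514
```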

If you already created a Route53 DNS entry for wazuh-manager.yourdomain.com, go update it with the new load balancer DNS name; if you're using the LB DNS name directly, be sure to grab the new one for your command.

Once that is all deployed, go to an agent you need to register and use this command to register it with the cluster:

```shell
AGENT_GROUP='default'

# this sets the worker load balancer as the first target for events on tcp/1514; failing that, the agent tries the manager
WAZUH_MANAGER='wazuh-workers.yourdomain.com,wazuh-manager.yourdomain.com'

# this tells the agent to register with the manager load balancer
WAZUH_REGISTRATION_SERVER='wazuh-manager.yourdomain.com'
WAZUH_REGISTRATION_PASSWORD=$'YOUR_AUTHD_PASSWORD'

curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
  sudo WAZUH_MANAGER=${WAZUH_MANAGER} \
  WAZUH_REGISTRATION_SERVER=${WAZUH_REGISTRATION_SERVER} \
  WAZUH_REGISTRATION_PASSWORD=${WAZUH_REGISTRATION_PASSWORD} \
  WAZUH_AGENT_GROUP=${AGENT_GROUP} rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
```

As the comments mention, the WAZUH_MANAGER environment variable accepts a comma-separated list of hostnames or IP addresses. The agent tries each entry in order, so by listing the workers first it will always prefer them and only connect to the manager when the workers are unavailable.
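For reference, the comma-separated list ends up as an ordered set of `<server>` blocks in the agent's /var/ossec/etc/ossec.conf, which is also where you can adjust the ordering after install (placeholder hostnames):

```xml
<client>
  <server>
    <address>wazuh-workers.yourdomain.com</address>
    <port>1514</port>
    <protocol>tcp</protocol>
  </server>
  <server>
    <address>wazuh-manager.yourdomain.com</address>
    <port>1514</port>
    <protocol>tcp</protocol>
  </server>
</client>
```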

Here's a reference in the docs.

That's a long-winded explanation of my process of working this out. The first method of resolving this is also covered here in the docs; I for one completely missed it, so it might be worthwhile to find a way to draw attention to it and possibly offer the alternate second option I posed above.

A third option I won't go too far into would be using an ALB Ingress instead of classic load balancers, with listener rules pointing requests for the manager/worker hostnames to the correct target group and the right ports open for registration and events. Add in ExternalDNS using Route53 and it'd be a fully automated solution.

If there's interest I could explore that third option and make up a PR with the alternate service configuration for ALBs and ExternalDNS updating Route53.
