I have encountered an issue I see several others have run into. It's worth noting this advice is for AWS but could probably be adapted to whatever other platform you're deploying Wazuh to with some creativity.
See: #961 #308 #547 probably others...
By default the install creates a separate classic load balancer for the manager, the workers, and the dashboard. To make life easier you can add a Route53 DNS entry in the console pointing at each of these load balancers, or you can grab the default Amazon hostnames of the load balancers to use in your agent join commands.
It's important to note which ports are open on each of these load balancers: tcp/1515 is only open on the manager, and tcp/1514 is only open on the workers. To make the docs easier to follow, note that workers ARE managers; they just can't perform the registration step for a new agent. Much of what you read about managers also applies to workers.
This doc explains the differences in a comprehensive way
tcp/1515 is the registration port. Agents reach out to this port on the manager pod to register with Wazuh; once this step completes, the agent shows as "never connected" in the dashboard until it also connects on tcp/1514.
tcp/1514 is the communication port for agents. They reach out to this port to send events up to the Wazuh cluster. If they are unable to connect to this port they will never show as active in the dashboard UI, and you will see errors like the following in /var/ossec/logs/ossec.log:
2025/06/05 23:05:29 wazuh-agentd: INFO: Trying to connect to server ([wazuh-manager.yourdomain.com]:1514/tcp).
2025/06/05 23:07:40 wazuh-agentd: ERROR: (1216): Unable to connect to '[10.10.10.10]:1514/tcp': 'Transport endpoint is not connected'.
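A quick way to check from an agent host whether tcp/1514 is actually reachable, as a sketch using bash's /dev/tcp (the hostname is a placeholder; substitute your worker load balancer's DNS name):

```shell
# Probe the agent communication port using bash's /dev/tcp redirection.
# "wazuh-workers.yourdomain.com" is a placeholder hostname.
host="wazuh-workers.yourdomain.com"
if timeout 5 bash -c "cat < /dev/null > /dev/tcp/${host}/1514" 2>/dev/null; then
  echo "tcp/1514 reachable"
else
  echo "tcp/1514 unreachable"
fi
```

If this prints "tcp/1514 unreachable" from the agent host, the errors above are a connectivity problem, not a registration problem.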
See here for some great troubleshooting steps for getting agents joined
When you use an agent join command like the one generated by the Wazuh UI against the Kubernetes installation, like this one, you will see those errors in the logs and that result in the UI:
curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
sudo WAZUH_MANAGER='wazuh-manager.yourdomain.com' \
WAZUH_REGISTRATION_PASSWORD=$'YourSuperSecurePassword' \
WAZUH_AGENT_GROUP='default' \
WAZUH_AGENT_NAME='ARandomAgent' \
rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
This happens because the agent registers with the wazuh-manager over the open tcp/1515 port on the load balancer and shows up in the dashboard, but its status stays "never connected". The explanation can be found in the documentation here: when you only set the WAZUH_MANAGER environment variable, the agent assumes that host handles both registration on tcp/1515 AND future communication on tcp/1514. Since tcp/1514 isn't open on the manager load balancer, the agent registers and then fails to connect.
Now if we look a little further down on the deployment variables page we see a different variable, WAZUH_REGISTRATION_SERVER.
If we slightly modify our agent registration command we can use that to split registration and future communication up between the manager and the worker load balancer hostnames:
AGENT_GROUP='default'
# this sets the worker loadbalancer as the target for events
WAZUH_MANAGER='wazuh-workers.yourdomain.com'
# this tells the agent to register with the manager load balancer
WAZUH_REGISTRATION_SERVER='wazuh-manager.yourdomain.com'
WAZUH_REGISTRATION_PASSWORD=$'YOUR_AUTHD_PASSWORD'
curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
sudo WAZUH_MANAGER=${WAZUH_MANAGER} \
WAZUH_REGISTRATION_SERVER=${WAZUH_REGISTRATION_SERVER} \
WAZUH_REGISTRATION_PASSWORD=${WAZUH_REGISTRATION_PASSWORD} \
WAZUH_AGENT_GROUP=${AGENT_GROUP} rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
The downside here is that the manager pod will not receive incoming agent events; only the workers will. If we want all manager and worker pods to receive agent events, we can combine one of the suggested workarounds with a slightly different agent registration command.
First, add this port to your wazuh-manager-svc.yaml, or override it with a kustomization patch:
- name: agents-events
port: 1514
targetPort: 1514
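A sketch of the kustomization-patch variant, assuming the upstream manifests where the manager Service is named `wazuh` in the `wazuh` namespace (verify both names against your overlay before applying):

```yaml
# kustomization.yaml -- appends tcp/1514 to the manager Service.
# Service name "wazuh" and namespace "wazuh" are assumptions; check your overlay.
patches:
  - target:
      kind: Service
      name: wazuh
      namespace: wazuh
    patch: |-
      - op: add
        path: /spec/ports/-
        value:
          name: agents-events
          port: 1514
          targetPort: 1514
```

The patch route avoids editing the upstream wazuh-manager-svc.yaml directly, which keeps future upstream updates cleaner.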
Delete your manager service and then redeploy with the new port; this forces the manager load balancer to be replaced with one that has tcp/1514 open.
If you already created a Route53 DNS entry for wazuh-manager.yourdomain.com, update it with the new load balancer DNS name; or, if you're using the LB DNS name directly, be sure to grab the new one for your command.
Once that is all deployed go to an agent you need to register and use this command to register it with the cluster:
AGENT_GROUP='default'
# this sets the worker load balancer as the first target for events on tcp/1514; failing that, the agent tries the manager
WAZUH_MANAGER='wazuh-workers.yourdomain.com,wazuh-manager.yourdomain.com'
# this tells the agent to register with the manager load balancer
WAZUH_REGISTRATION_SERVER='wazuh-manager.yourdomain.com'
WAZUH_REGISTRATION_PASSWORD=$'YOUR_AUTHD_PASSWORD'
curl -o wazuh-agent-4.12.0-1.x86_64.rpm https://packages.wazuh.com/4.x/yum/wazuh-agent-4.12.0-1.x86_64.rpm && \
sudo WAZUH_MANAGER=${WAZUH_MANAGER} \
WAZUH_REGISTRATION_SERVER=${WAZUH_REGISTRATION_SERVER} \
WAZUH_REGISTRATION_PASSWORD=${WAZUH_REGISTRATION_PASSWORD} \
WAZUH_AGENT_GROUP=${AGENT_GROUP} rpm -ihv wazuh-agent-4.12.0-1.x86_64.rpm
As the comments mention, the WAZUH_MANAGER environment variable accepts a comma-separated list of hostnames or IP addresses, and the agent tries each entry in the list in order. With the workers listed first, the agent always tries them first and falls back to the manager only when the workers are unavailable.
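For reference, the comma-separated list ends up as one `<server>` block per entry in the agent's /var/ossec/etc/ossec.conf, tried top to bottom (hostnames here match the example above; port and protocol shown are the defaults):

```xml
<client>
  <server>
    <address>wazuh-workers.yourdomain.com</address>
    <port>1514</port>
    <protocol>tcp</protocol>
  </server>
  <server>
    <address>wazuh-manager.yourdomain.com</address>
    <port>1514</port>
    <protocol>tcp</protocol>
  </server>
</client>
```

If you ever need to change the failover order after enrollment, you can reorder these blocks directly and restart the agent rather than re-registering.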
Here's a reference in the docs.
Long-winded explanation of my process of working this out... The first method of resolving this is also covered here in the docs. I, for one, completely missed it there, so it might be worthwhile to find a way to draw attention to it, and possibly to offer the alternate second option I posed above.
A third option I won't go too far into would be using the ALB Ingress instead of classic load balancers, with listener rules pointing requests for the manager/worker hostnames at the correct target group with the right ports open for registration and events. Add in ExternalDNS with Route53 and it'd be a fully automated solution.
If there's interest I could explore that third option and make up a PR with the alternate service configuration for ALBs and ExternalDNS updating Route53.