address already in use #1956

@pete911

Describe the bug
Deployment on an EKS (Kubernetes) cluster intermittently fails with this error:

failed to serve and listen","error":"listen tcp :4311: bind: address already in use

This port is used by the Fluent Bit Kubernetes filter (it is the default value of the aws_pod_association_port field).

The error comes from here - https://github.com/aws/amazon-cloudwatch-agent/blob/main/extension/server/extension.go#L127
It is most likely caused by this line - https://github.com/aws/amazon-cloudwatch-agent/blob/main/extension/server/extension.go#L110, which does not seem to wait for the port to be freed.
That call should be replaced by server.Shutdown(...), but the whole approach seems heavyweight for reloading certs (shutting down and starting the server).

Steps to reproduce
EKS cluster with the CloudWatch agent. There is a similar issue (with errors coming from AWS Fluent Bit) here - aws/amazon-cloudwatch-agent-operator#269

[filter:kubernetes:kubernetes.1] no upstream connections available to cloudwatch-agent.amazon-cloudwatch:4311

What did you expect to see?
No errors. If the server is not reloaded, do not just log the error inside the goroutine (no error is returned from the method) and pretend everything is OK - https://github.com/aws/amazon-cloudwatch-agent/blob/main/extension/server/extension.go#L127
At the very least, there should be a retry that checks the port is available before starting the server in the goroutine.

What did you see instead?
Expected either the server to be restarted when reloadServer is called, or an error to be returned. Neither currently happens.

What version did you use?
latest

What config did you use?
default

Environment
OS: linux
