Cloudflared does not unregister lb origins on termination when running as sidecar since named tunnels were introduced #247

Open
@ffilippopoulos

Description

Environment:
Kubernetes: v1.19.2
OS: Flatcar Container Linux by Kinvolk 2605.6.0
Kernel: 5.4.67-flatcar
Container runtime: docker://19.3.12
cloudflared version: >= 2020.7.0

We have a Kubernetes Deployment with 3 replicas, each running cloudflared tunnel as a sidecar with the following arguments:

        - --no-autoupdate                                                   
        - --url=http://127.0.0.1:80                                         
        - --hostname=<my-hostname>               
        - --lb-pool=<my-lb-pool>                                
        - --origincert=/etc/cloudflared/cert.pem

Here, the hostname is the name of a load balancer created under our Cloudflare account and the lb-pool is a pool defined for it. This way the containers join the pool as origins and the load balancer routes traffic down to our pods successfully.
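
For context, here is a minimal sketch of how the sidecar is wired into the Deployment; the app name, image tags and the cloudflared-cert Secret are placeholders rather than our exact manifest:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app                                 # placeholder
    spec:
      replicas: 3
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: app
              image: my-app:latest                 # the service cloudflared proxies to on 127.0.0.1:80
              ports:
                - containerPort: 80
            - name: cloudflared
              image: cloudflare/cloudflared:2020.7.0    # tag illustrative
              command: ["cloudflared", "tunnel"]        # shown explicitly; flags below are the ones listed above
              args:
                - --no-autoupdate
                - --url=http://127.0.0.1:80
                - --hostname=<my-hostname>
                - --lb-pool=<my-lb-pool>
                - --origincert=/etc/cloudflared/cert.pem
              volumeMounts:
                - name: cloudflared-cert
                  mountPath: /etc/cloudflared
                  readOnly: true
          volumes:
            - name: cloudflared-cert
              secret:
                secretName: cloudflared-cert       # Secret containing cert.pem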

The issue:
When rolling the pods we observe that the corresponding origins enter a Critical state for some time until Cloudflare cleans them up. From the cloudflared sidecar's point of view we see the following logs:

time="2020-10-14T09:01:34Z" level=error msg="Register tunnel error from server side" connectionID=0 error="Server error: the origin list length must be in range [1, 5]: validation failed"
time="2020-10-14T09:01:34Z" level=info msg="Retrying in 1s seconds" connectionID=0

This means that, because we are already using the full capacity of allowed origins, the new pod cannot register until that cleanup happens.

It looks like this behaviour was first introduced with commit 8cc69f2, and one could work around it by setting the flag:
--use-quick-reconnects=false
This worked up until version 2020.6.6.
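
Concretely, that workaround was one extra argument on the sidecar (same args as above; this only applies to releases up to 2020.6.6):

        - --no-autoupdate
        - --url=http://127.0.0.1:80
        - --hostname=<my-hostname>
        - --lb-pool=<my-lb-pool>
        - --origincert=/etc/cloudflared/cert.pem
        - --use-quick-reconnects=false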

The behaviour changed again in 2a3d486, which implemented named tunnels. As far as I can tell, we cannot use named tunnels with a load balancer at the moment(?).

Changing our Deployment to a StatefulSet, in order to set a stable --name flag per pod and test the above, results in the pods crashing after logging:

INFO[2020-10-14T10:36:51Z] Tunnel already created with ID 817524d6-4c5e-4faa-8cbb-dd1e98d5d5fc
INFO[2020-10-14T10:36:52Z] Load balancer <my-hostname> already uses pool <my-lb-pool> which has this tunnel as an origin
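
For reference, this is roughly how the per-pod --name was set in that StatefulSet test; the POD_NAME environment variable and its expansion in args are standard Kubernetes mechanics rather than anything cloudflared-specific, and the names are placeholders:

    apiVersion: apps/v1
    kind: StatefulSet
    metadata:
      name: my-app                                 # placeholder
    spec:
      replicas: 3
      serviceName: my-app
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
            - name: cloudflared
              image: cloudflare/cloudflared:2020.7.0    # tag illustrative
              env:
                - name: POD_NAME
                  valueFrom:
                    fieldRef:
                      fieldPath: metadata.name     # stable per pod: my-app-0, my-app-1, ...
              args:
                - --no-autoupdate
                - --url=http://127.0.0.1:80
                - --hostname=<my-hostname>
                - --lb-pool=<my-lb-pool>
                - --origincert=/etc/cloudflared/cert.pem
                - --name=$(POD_NAME)               # kubelet expands $(POD_NAME) from the env var above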

Is there a way to use the newer cloudflared images with a load balancer and preserve the old behaviour (quickly removing critical origins so that we can successfully roll our deployment's pods)?
