k8s-worker remove-unit does not remove node from cluster #723

@selcem-artan

Bug Description

Hello,

Initially I had a k8s cluster of 3 control-plane nodes and 8 workers.

ubuntu@app2:~$ sudo k8s kubectl get nodes
NAME   STATUS   ROLES                  AGE    VERSION
app1   Ready    control-plane,worker   4d6h   v1.32.8
app2   Ready    control-plane,worker   4d5h   v1.32.8
app3   Ready    control-plane,worker   4d5h   v1.32.8
app4   Ready    worker                 4d5h   v1.32.8
db1    Ready    worker                 4d5h   v1.32.8
db2    Ready    worker                 4d5h   v1.32.8
db3    Ready    worker                 4d5h   v1.32.8
db4    Ready    worker                 4d5h   v1.32.8
rt1    Ready    worker                 18h    v1.32.8
rt2    Ready    worker                 18h    v1.32.8
rt3    Ready    worker                 18h    v1.32.8

I needed to move 3 workers to a different k8s-worker charm deployment so I could configure node labels. So I ran "juju remove-unit k8s-worker/<>" for the 3 nodes rt1, rt2, and rt3. But after the new deployment, the 3 units are stuck in a waiting state for the cluster.
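The stale registration can be spotted by comparing the cluster's node list against the units Juju still manages. Below is a minimal, illustrative sketch (not charm code) that parses "k8s kubectl get nodes" output and reports nodes left behind after remove-unit; the sample data is taken from the node listing above:

```python
# Illustrative sketch: find nodes still registered in the cluster that
# Juju no longer manages. The parsing of the kubectl table is naive
# (whitespace-split, header skipped) but matches the output shown above.

def stale_nodes(kubectl_output: str, juju_managed: set[str]) -> list[str]:
    """Return node names present in the cluster but absent from Juju."""
    rows = kubectl_output.strip().splitlines()[1:]  # skip the header row
    cluster_nodes = {row.split()[0] for row in rows}
    return sorted(cluster_nodes - juju_managed)

KUBECTL_OUTPUT = """\
NAME   STATUS   ROLES                  AGE    VERSION
app1   Ready    control-plane,worker   4d6h   v1.32.8
rt1    Ready    worker                 18h    v1.32.8
rt2    Ready    worker                 18h    v1.32.8
rt3    Ready    worker                 18h    v1.32.8
"""

# app1 still has a Juju unit; rt1-rt3 had their units removed.
print(stale_nodes(KUBECTL_OUTPUT, {"app1"}))  # ['rt1', 'rt2', 'rt3']
```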

To Reproduce

juju deploy ch:k8s-worker \
  --channel=1.32/stable --revision=1376 \
  --to 0,1,2,3,4,5,6

juju remove-unit k8s-worker/0
juju remove-unit k8s-worker/1
juju remove-unit k8s-worker/2

Then re-deploy the charm with a different name to these 3 nodes:

juju deploy ch:k8s-worker k8s-worker-nonmagnetic \
  --channel=1.32/stable --revision=1376 \
  --config node-labels="storage=localdir" \
  --to 0,1,2

Environment

App                     Version  Status  Scale  Charm       Channel      Rev   Exposed  Message
k8s                     1.32.8   active  3      k8s         1.32/stable  1382  no       Ready
k8s-worker              1.32.8   active  5      k8s-worker  1.32/stable  1376  no       Ready
k8s-worker-nonmagnetic  1.32.8   active  3      k8s-worker  1.32/stable  1376  no       Ready

Relevant log output

From the workers:

-k8s-worker-nonmagnetic-3: 17:55:51 ERROR unit.k8s-worker-nonmagnetic/3.juju-log Caught ReconcilerError
Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/venv/lib/python3.10/site-packages/charms/contextual_status.py", line 101, in on_error
    yield
  File "/usr/lib/python3.10/contextlib.py", line 79, in inner
    return func(*args, **kwds)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/src/charm.py", line 957, in _join_cluster
    self._join_with_token(token, remote_cluster)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/src/charm.py", line 983, in _join_with_token
    self.api_manager.join_cluster(request)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 972, in join_cluster
    self._send_request(endpoint, "POST", EmptyResponse, request)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/lib/charms/k8s/v0/k8sd_api_manager.py", line 927, in _send_request
    raise InvalidResponseError(
charms.k8s.v0.k8sd_api_manager.InvalidResponseError: Error status 521
    method=POST
    endpoint=/1.0/k8sd/cluster/join
    reason=status code 521
    body={"type":"error","status":"","status_code":0,"operation":"","error_code":521,"error":"node \"rt3\" is part of the cluster","metadata":null}



The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/venv/lib/python3.10/site-packages/charms/reconciler.py", line 35, in reconcile
    self.reconcile_function(event)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/src/charm.py", line 1036, in _reconcile
    self._join_cluster(event)
  File "/usr/lib/python3.10/contextlib.py", line 78, in inner
    with self._recreate_cm():
  File "/usr/lib/python3.10/contextlib.py", line 153, in __exit__
    self.gen.throw(typ, value, traceback)
  File "/var/lib/juju/agents/unit-k8s-worker-nonmagnetic-3/charm/venv/lib/python3.10/site-packages/charms/contextual_status.py", line 106, in on_error
    raise ReconcilerError(msg) from e
charms.contextual_status.ReconcilerError: Found expected exception: Error status 521
    method=POST
    endpoint=/1.0/k8sd/cluster/join
    reason=status code 521
    body={"type":"error","status":"","status_code":0,"operation":"","error_code":521,"error":"node \"rt3\" is part of the cluster","metadata":null}
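
The k8sd API rejects the join with error_code 521 because the node name is still registered. As an illustrative sketch (not the charm's actual handling), the error body in the log above can be parsed to distinguish this "already part of the cluster" condition from other join failures:

```python
import json

# Illustrative sketch: classify the k8sd join error shown in the log above.
# 521 is the error_code observed in this report's traceback; treating it as
# a distinct condition would let a caller react to a stale node registration
# instead of failing with a generic ReconcilerError.

ALREADY_IN_CLUSTER_CODE = 521  # from the log output above

def is_already_in_cluster(body: str) -> bool:
    """True when the k8sd error body says the node is already a member."""
    payload = json.loads(body)
    return (payload.get("error_code") == ALREADY_IN_CLUSTER_CODE
            and "is part of the cluster" in payload.get("error", ""))

# The exact body from the log output above.
BODY = ('{"type":"error","status":"","status_code":0,"operation":"",'
        '"error_code":521,"error":"node \\"rt3\\" is part of the cluster",'
        '"metadata":null}')

print(is_already_in_cluster(BODY))  # True
```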

Additional context

No response
