[bug] Etcd cluster status reported incorrectly. #742

Open
@TimJones

Description

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

In the Omni Cluster Overview window, the 'Control Plane' status pane on the right-hand side reports Etcd as 'OK' even when etcd is unhealthy. Below, the etcd health check is failing on all three control plane nodes.

image

❯ talosctl -n ip-10-0-0-148 services    
NODE         SERVICE      STATE     HEALTH   LAST CHANGE     LAST EVENT
10.0.0.148   apid         Running   OK       2h36m52s ago    Health check successful
10.0.0.148   containerd   Running   OK       23h42m55s ago   Health check successful
10.0.0.148   cri          Running   OK       2h36m52s ago    Health check successful
10.0.0.148   dashboard    Running   ?        23h42m53s ago   Process Process(["/sbin/dashboard"]) started with PID 4094
10.0.0.148   etcd         Running   Fail     2h35m8s ago     Health check failed: context deadline exceeded
10.0.0.148   kubelet      Running   OK       2h36m46s ago    Health check successful
10.0.0.148   machined     Running   OK       23h42m55s ago   Health check successful
10.0.0.148   syslogd      Running   OK       23h42m54s ago   Health check successful
10.0.0.148   trustd       Running   OK       2h36m51s ago    Health check successful
10.0.0.148   udevd        Running   OK       23h42m55s ago   Health check successful

❯ talosctl -n ip-10-0-1-152 services           
NODE         SERVICE      STATE     HEALTH   LAST CHANGE   LAST EVENT
10.0.1.152   apid         Running   OK       5m24s ago     Health check successful
10.0.1.152   containerd   Running   OK       5m25s ago     Health check successful
10.0.1.152   cri          Running   OK       5m23s ago     Health check successful
10.0.1.152   dashboard    Running   ?        5m25s ago     Process Process(["/sbin/dashboard"]) started with PID 4103
10.0.1.152   etcd         Running   Fail     5m3s ago      Health check failed: context deadline exceeded
10.0.1.152   kubelet      Running   OK       5m21s ago     Health check successful
10.0.1.152   machined     Running   OK       5m25s ago     Health check successful
10.0.1.152   syslogd      Running   OK       5m25s ago     Health check successful
10.0.1.152   trustd       Running   OK       5m23s ago     Health check successful
10.0.1.152   udevd        Running   OK       5m25s ago     Health check successful

❯ talosctl -n ip-10-0-2-176 services  
NODE         SERVICE      STATE     HEALTH   LAST CHANGE     LAST EVENT
10.0.2.176   apid         Running   OK       2h37m17s ago    Health check successful
10.0.2.176   containerd   Running   OK       23h43m20s ago   Health check successful
10.0.2.176   cri          Running   OK       2h37m17s ago    Health check successful
10.0.2.176   dashboard    Running   ?        23h43m18s ago   Process Process(["/sbin/dashboard"]) started with PID 4093
10.0.2.176   etcd         Running   Fail     2h35m14s ago    Health check failed: context deadline exceeded
10.0.2.176   kubelet      Running   OK       2h37m11s ago    Health check successful
10.0.2.176   machined     Running   OK       23h43m20s ago   Health check successful
10.0.2.176   syslogd      Running   OK       23h43m19s ago   Health check successful
10.0.2.176   trustd       Running   OK       2h37m17s ago    Health check successful
10.0.2.176   udevd        Running   OK       23h43m19s ago   Health check successful

Expected Behavior

The Etcd status in the 'Control Plane' pane should reflect the actual health of the etcd cluster.
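
To make the expectation concrete, here is a minimal sketch of the kind of aggregation I would expect. It is not how Omni computes the status; the function names `etcd_health` and `cluster_etcd_status` are hypothetical, it only parses the text output of `talosctl services` as pasted above, and the rule "OK only if every node reports OK" is an assumption (a quorum-based rule would also be reasonable).

```python
def etcd_health(services_output: str) -> dict[str, str]:
    """Map node address -> etcd HEALTH column from `talosctl services` text output."""
    health = {}
    for line in services_output.splitlines():
        cols = line.split()
        # Data rows start with: NODE SERVICE STATE HEALTH ...
        # Header and prompt lines are skipped because their second column is not "etcd".
        if len(cols) >= 4 and cols[1] == "etcd":
            health[cols[0]] = cols[3]
    return health


def cluster_etcd_status(per_node: dict[str, str]) -> str:
    """Assumed rule: OK only when every node reports OK; Unknown with no data."""
    if not per_node:
        return "Unknown"
    return "OK" if all(h == "OK" for h in per_node.values()) else "Fail"


# Rows copied from the output in this issue.
SAMPLE = """\
10.0.0.148   etcd         Running   Fail     2h35m8s ago     Health check failed: context deadline exceeded
10.0.1.152   etcd         Running   Fail     5m3s ago      Health check failed: context deadline exceeded
10.0.2.176   etcd         Running   Fail     2h35m14s ago    Health check failed: context deadline exceeded
"""

if __name__ == "__main__":
    nodes = etcd_health(SAMPLE)
    print(nodes)
    print(cluster_etcd_status(nodes))
```

With the rows from this issue, every node maps to `Fail`, so the aggregated status would not be 'OK'.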

Steps To Reproduce

  1. Register 3 nodes in Omni that can connect to each other
  2. Form them into a single cluster control plane
  3. Review the Etcd status in the 'Control Plane' pane of the Cluster Overview

What browsers are you seeing the problem on?

Firefox

Anything else?

No response

Labels

bug (Something isn't working)
