mtest: wait for update of kubernetes eps #851

Merged
masa213f merged 1 commit into main from fix-flaky on Feb 12, 2026

@masa213f masa213f commented Feb 10, 2026

This PR fixes a flaky case in the robustness test.

After #824, the robustness test started failing intermittently with the following error.

CI log

https://github.com/cybozu-go/cke/actions/runs/21621901325/job/62312619840

Tue, 03 Feb 2026 08:03:18 GMT Test CKE robustness should stop control plane nodes
Tue, 03 Feb 2026 08:03:18 GMT /root/go/src/github.com/cybozu-go/cke/mtest/robustness_test.go:16
Tue, 03 Feb 2026 08:03:54 GMT • [36.263 seconds]
Tue, 03 Feb 2026 08:03:54 GMT ------------------------------
Tue, 03 Feb 2026 08:03:54 GMT Test CKE robustness should update endpoints
Tue, 03 Feb 2026 08:03:54 GMT /root/go/src/github.com/cybozu-go/cke/mtest/robustness_test.go:31
Tue, 03 Feb 2026 08:03:54 GMT   [FAILED] in [It] - /root/go/src/github.com/cybozu-go/cke/mtest/robustness_test.go:39 @ 02/03/26 08:03:54.58
Tue, 03 Feb 2026 08:03:54 GMT • [FAILED] [0.102 seconds]
Tue, 03 Feb 2026 08:03:54 GMT Test CKE robustness [It] should update endpoints
Tue, 03 Feb 2026 08:03:54 GMT /root/go/src/github.com/cybozu-go/cke/mtest/robustness_test.go:31
Tue, 03 Feb 2026 08:03:54 GMT 
Tue, 03 Feb 2026 08:03:54 GMT   [FAILED] Expected
Tue, 03 Feb 2026 08:03:54 GMT       <[]v1.Endpoint | len:3, cap:4>: [
...
Tue, 03 Feb 2026 08:03:54 GMT       ]
Tue, 03 Feb 2026 08:03:54 GMT   to consist of
Tue, 03 Feb 2026 08:03:54 GMT       <[]v1.Endpoint | len:3, cap:3>: [
...
Tue, 03 Feb 2026 08:03:54 GMT       ]
Tue, 03 Feb 2026 08:03:54 GMT   the missing elements were
Tue, 03 Feb 2026 08:03:54 GMT       <[]v1.Endpoint | len:1, cap:1>: [
Tue, 03 Feb 2026 08:03:54 GMT           ***
Tue, 03 Feb 2026 08:03:54 GMT               Addresses: ["10.0.0.102"],
Tue, 03 Feb 2026 08:03:54 GMT               Conditions: ***Ready: false, Serving: nil, Terminating: nil***,
...
Tue, 03 Feb 2026 08:03:54 GMT               Hints: nil,
Tue, 03 Feb 2026 08:03:54 GMT           ***,
Tue, 03 Feb 2026 08:03:54 GMT       ]
Tue, 03 Feb 2026 08:03:54 GMT   the extra elements were
Tue, 03 Feb 2026 08:03:54 GMT       <[]v1.Endpoint | len:1, cap:1>: [
Tue, 03 Feb 2026 08:03:54 GMT           ***
Tue, 03 Feb 2026 08:03:54 GMT               Addresses: ["10.0.0.102"],
Tue, 03 Feb 2026 08:03:54 GMT               Conditions: ***Ready: true, Serving: nil, Terminating: nil***,
...
Tue, 03 Feb 2026 08:03:54 GMT           ***,
Tue, 03 Feb 2026 08:03:54 GMT       ]
Tue, 03 Feb 2026 08:03:54 GMT   In [It] at: /root/go/src/github.com/cybozu-go/cke/mtest/robustness_test.go:39 @ 02/03/26 08:03:54.58

Upon investigation, I found that when a CP node is shut down in the test, all kube-apiservers sometimes become unavailable at the same time.

Feb 03 08:03:30 host2 docker[1925]: 2026-02-03T08:03:30.698663Z host2 cke error: "failed to dial: " address="10.0.0.102" error="dial tcp 10.0.0.102:22: connect: no route to host"
Feb 03 08:03:30 host2 docker[1925]: 2026-02-03T08:03:30.698668Z host2 cke warning: "failed to create SSHAgent for 10.0.0.102" error="dial tcp 10.0.0.102:22: connect: no route to host"
Feb 03 08:03:30 host2 docker[1925]: 2026-02-03T08:03:30.700019Z host2 cke debug: "well: waiting for all goroutines to complete"
Feb 03 08:03:35 host2 docker[1925]: 2026-02-03T08:03:35.836582Z host2 cke warning: "failed to check API server health" error="Get \"https://10.0.0.103:6443/api/v1/namespaces\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)" node="10.0.0.103"
Feb 03 08:03:35 host2 docker[1925]: 2026-02-03T08:03:35.838291Z host2 cke warning: "failed to check API server health" error="Get \"https://10.0.0.101:6443/api/v1/namespaces\": net/http: request canceled (Client.Timeout exceeded while awaiting headers)" node="10.0.0.101"

In this case, CKE skips updating the Kubernetes EndpointSlice and still transitions to the completed state.
As a result, waiting for CKE to complete does not guarantee that the EndpointSlice has been updated, so the update check that follows fails.

I tried to change the CKE logic instead, but that turned out to be difficult, so this PR wraps the EndpointSlice update check in Eventually.

Signed-off-by: Masayuki Ishii <masa213f@gmail.com>

@YZ775 YZ775 left a comment

LGTM

@masa213f masa213f merged commit c21a732 into main Feb 12, 2026
51 of 56 checks passed
@masa213f masa213f deleted the fix-flaky branch February 12, 2026 04:28