NodeKiller seems not to be working in 100-node 1.17 / master performance tests #1005

Open
@mm4tt

Description

Original debugging done by @jkaniuk:

In the 100-node OSS performance tests of 1.16:
https://k8s-testgrid.appspot.com/sig-scalability-gce#gce-cos-1.16-scalability-100

NodeKiller is consistently failing:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability-stable1/1219228169567997957

W0120 12:25:21.234] I0120 12:25:21.234558   12979 nodes.go:105] NodeKiller: Rebooting "e2e-big-minion-group-tt6r" to repair the node
W0120 12:25:24.556] I0120 12:25:24.555774   12979 ssh.go:38] ssh to "e2e-big-minion-group-tt6r" finished with "External IP address was not found; defaulting to using IAP tunneling.\npacket_write_wait: Connection to UNKNOWN port 65535: Broken pipe\r\nERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].\n": exit status 255
W0120 12:25:24.556] E0120 12:25:24.555839   12979 nodes.go:108] NodeKiller: Error while rebooting node "e2e-big-minion-group-tt6r": exit status 255
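
One possible reading of the log above, not confirmed in this issue, is that the reboot itself tears down the ssh session, so ssh reports its own error code 255 even though the reboot command was delivered. Below is a minimal Go sketch of how such a case could be told apart from a real connection failure. The helper name `isLikelyRebootDisconnect` and the marker list are hypothetical and are not taken from the NodeKiller code.

```go
package main

import (
	"fmt"
	"strings"
)

// connectionDropMarkers lists substrings that OpenSSH commonly prints when an
// established session is cut off. The exact list is an assumption chosen for
// illustration.
var connectionDropMarkers = []string{
	"Broken pipe",
	"closed by remote host",
	"Connection reset by peer",
}

// isLikelyRebootDisconnect reports whether a failed ssh call looks like the
// session was dropped mid-command (as a rebooting node might do) rather than
// failing for some other reason. ssh uses exit status 255 for its own errors,
// so any other code is treated as a genuine remote command failure.
func isLikelyRebootDisconnect(exitCode int, output string) bool {
	if exitCode != 255 {
		return false
	}
	for _, marker := range connectionDropMarkers {
		if strings.Contains(output, marker) {
			return true
		}
	}
	return false
}

func main() {
	// Output copied from the log lines in this issue.
	output := "External IP address was not found; defaulting to using IAP tunneling.\n" +
		"packet_write_wait: Connection to UNKNOWN port 65535: Broken pipe\r\n" +
		"ERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].\n"
	fmt.Println(isLikelyRebootDisconnect(255, output))
}
```

If this hypothesis holds, a caller could log such a result as a warning and then check that the node actually went down and came back, instead of reporting the reboot as failed. If it does not hold, the same check still helps narrow the failure to the ssh transport (for example the IAP tunnel) rather than the reboot command.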

Metadata

Labels

good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.