Original debugging done by @jkaniuk:
In the 100-node OSS performance tests of 1.16:
https://k8s-testgrid.appspot.com/sig-scalability-gce#gce-cos-1.16-scalability-100
NodeKiller is consistently failing:
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gci-gce-scalability-stable1/1219228169567997957
W0120 12:25:21.234] I0120 12:25:21.234558 12979 nodes.go:105] NodeKiller: Rebooting "e2e-big-minion-group-tt6r" to repair the node
W0120 12:25:24.556] I0120 12:25:24.555774 12979 ssh.go:38] ssh to "e2e-big-minion-group-tt6r" finished with "External IP address was not found; defaulting to using IAP tunneling.\npacket_write_wait: Connection to UNKNOWN port 65535: Broken pipe\r\nERROR: (gcloud.compute.ssh) [/usr/bin/ssh] exited with return code [255].\n": exit status 255
W0120 12:25:24.556] E0120 12:25:24.555839 12979 nodes.go:108] NodeKiller: Error while rebooting node "e2e-big-minion-group-tt6r": exit status 255