Commit 0db759d

oilbeater and claude committed
fix: add retry logic to kubectl-ko getOvnCentralPod for leader election
getOvnCentralPod() would crash silently during OVN leader election transitions. Under `set -euo pipefail`, when no pod had the leader label (e.g. ovn-nb-leader=true), `grep ovn-central` in the pipeline returned exit code 1, causing the script to exit immediately without any error message. This made kubectl-ko trace and other subcommands fail intermittently in e2e tests.

Extract a getLeaderPod() helper that retries up to 10 times with 1s intervals, protecting the pipeline with `set +o pipefail` and suppressing kubectl stderr noise. Also fix NORTHD_POD query to use $KUBE_OVN_NS instead of hardcoded kube-system.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
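The silent exit described in the commit message can be reproduced without a cluster. The following is a minimal sketch, not part of the commit: `printf` stands in for `kubectl get pod` output that contains no leader pod, and each case runs in a child `bash` so the failure does not abort the demonstration itself.

```shell
#!/usr/bin/env bash
# Reproduction sketch: printf is a stand-in for kubectl output with no leader pod.

# Case 1: strict mode. grep matches nothing and returns 1; with pipefail the
# whole pipeline fails, the assignment fails, and errexit ends the child shell
# before the echo runs. Nothing is printed.
strict_out=$(bash -c '
  set -euo pipefail
  pod=$(printf "NAME READY STATUS\n" | grep ovn-central | head -n 1)
  echo "reached"
') && strict_rc=0 || strict_rc=$?

# Case 2: pipefail off. The pipeline status is that of head (0), so the
# assignment succeeds with an empty value and the script continues.
relaxed_out=$(bash -c '
  set -eu
  pod=$(printf "NAME READY STATUS\n" | grep ovn-central | head -n 1)
  echo "reached"
') && relaxed_rc=0 || relaxed_rc=$?

echo "strict:  rc=$strict_rc out='$strict_out'"
echo "relaxed: rc=$relaxed_rc out='$relaxed_out'"
```

In the strict case the child exits with status 1 and prints nothing, which matches the "exit immediately without any error message" behavior. In the relaxed case the empty result is left for the caller to check, which is what the retry loop in this commit relies on.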
1 parent 53b0946 commit 0db759d

1 file changed: +22 -19 lines changed

dist/images/kubectl-ko

Lines changed: 22 additions & 19 deletions
@@ -811,26 +811,29 @@ diagnose(){
   esac
 }
 
+getLeaderPod(){
+  local label=$1
+  local role=$2
+  local result=
+  for i in $(seq 1 10); do
+    set +o pipefail
+    result=$(kubectl get pod -n $KUBE_OVN_NS -l "$label"=true 2>/dev/null | grep ovn-central | awk '{if($2=="1/1" && $3=="Running") print $1}' | head -n 1)
+    set -o pipefail
+    if [ -n "$result" ]; then
+      echo "$result"
+      return
+    fi
+    sleep 1
+  done
+  echo "$role leader not exists" >&2
+  exit 1
+}
+
 getOvnCentralPod(){
-  NB_POD=$(kubectl get pod -n $KUBE_OVN_NS -l ovn-nb-leader=true | grep ovn-central | awk '{if($2=="1/1" && $3=="Running") print $1}' | head -n 1 )
-  if [ -z "$NB_POD" ]; then
-    echo "nb leader not exists"
-    exit 1
-  fi
-  OVN_NB_POD=$NB_POD
-  SB_POD=$(kubectl get pod -n $KUBE_OVN_NS -l ovn-sb-leader=true | grep ovn-central | awk '{if($2=="1/1" && $3=="Running") print $1}' | head -n 1 )
-  if [ -z "$SB_POD" ]; then
-    echo "sb leader not exists"
-    exit 1
-  fi
-  OVN_SB_POD=$SB_POD
-  NORTHD_POD=$(kubectl get pod -n kube-system -l ovn-northd-leader=true | grep ovn-central | head -n 1 | awk '{print $1}')
-  if [ -z "$NORTHD_POD" ]; then
-    echo "ovn northd not exists"
-    exit 1
-  fi
-  OVN_NORTHD_POD=$NORTHD_POD
-  image=$(kubectl -n kube-system get pods -l app=kube-ovn-cni -o jsonpath='{.items[0].spec.containers[0].image}')
+  OVN_NB_POD=$(getLeaderPod "ovn-nb-leader" "nb")
+  OVN_SB_POD=$(getLeaderPod "ovn-sb-leader" "sb")
+  OVN_NORTHD_POD=$(getLeaderPod "ovn-northd-leader" "northd")
+  image=$(kubectl -n $KUBE_OVN_NS get pods -l app=kube-ovn-cni -o jsonpath='{.items[0].spec.containers[0].image}')
   if [ -z "$image" ]; then
     echo "cannot get kube-ovn image"
     exit 1
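The retry behavior of the new helper can be exercised without a cluster. The following is a sketch under stated assumptions, not code from the commit: the `kubectl` shell function is a hypothetical stub that reports no leader on its first two calls and one Running pod afterwards, the pod name `ovn-central-abc` and the `attempts_file` counter are invented for illustration, and `KUBE_OVN_NS` is set by hand. The helper body follows the diff, with the namespace variable quoted.

```shell
#!/usr/bin/env bash
set -euo pipefail

KUBE_OVN_NS=kube-system      # assumed value; the real script sets this elsewhere
attempts_file=$(mktemp)      # hypothetical call counter shared across subshells
echo 0 > "$attempts_file"

# Hypothetical stub replacing the real kubectl: the leader label is missing
# on the first two calls, as during an election, and present from the third.
kubectl() {
  local n
  n=$(( $(cat "$attempts_file") + 1 ))
  echo "$n" > "$attempts_file"
  echo "NAME READY STATUS RESTARTS AGE"
  if [ "$n" -ge 3 ]; then
    echo "ovn-central-abc 1/1 Running 0 5m"
  fi
}

# Same shape as the helper in the diff: retry up to 10 times, 1s apart,
# with pipefail off around the pipeline so an empty grep does not abort.
getLeaderPod(){
  local label=$1
  local role=$2
  local result=
  for i in $(seq 1 10); do
    set +o pipefail
    result=$(kubectl get pod -n "$KUBE_OVN_NS" -l "$label"=true 2>/dev/null | grep ovn-central | awk '{if($2=="1/1" && $3=="Running") print $1}' | head -n 1)
    set -o pipefail
    if [ -n "$result" ]; then
      echo "$result"
      return
    fi
    sleep 1
  done
  echo "$role leader not exists" >&2
  exit 1
}

leader=$(getLeaderPod "ovn-nb-leader" "nb")
attempts=$(cat "$attempts_file")
rm -f "$attempts_file"
echo "leader=$leader attempts=$attempts"
```

With this stub the helper succeeds on the third attempt, after roughly two seconds. Note that the helper is called inside `$(...)`, so its `exit 1` ends only the command-substitution subshell. The calling script stops because the failed assignment trips `set -e`, and the error message still reaches the user because it is written to stderr.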
