chore: add startup probe to avoid false unhealthy events #222

Open
fmuyassarov wants to merge 1 commit into kubernetes-sigs:main from Nordix:fix/probes

@fmuyassarov
Member

What type of PR is this?

/kind cleanup

What this PR does / why we need it:

Give the driver enough time to complete plugin registration before the kubelet reports Unhealthy events, which can send the wrong signal. While introducing events in #221, I noticed probe failure events on the driver Pod and was confused for a while, until I realized that the readinessProbe starts firing before DRANET has finished waiting for the kubelet to complete plugin registration (30 sec timeout). Adding a startup probe should avoid these false positives even when registration is slow, as long as it completes within the 30 sec timeout set here.

I've given the startup probe 60s in total: twice the 30s timeout set for kubelet plugin registration, with the extra seconds covering image pulling and other work that is not directly part of the driver itself.
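
For illustration only, a startup probe with a 60s budget could look roughly like the fragment below. This is a sketch, not the actual diff: the `/healthz` path and port `9177` are taken from the probe failure events in this description, while the `periodSeconds` and `failureThreshold` values are assumptions chosen so that their product is 60s.

```yaml
# Hypothetical fragment of the dranet container spec (not the actual change).
startupProbe:
  httpGet:
    path: /healthz          # endpoint seen in the readiness probe failure events
    port: 9177
  periodSeconds: 2          # assumed value
  failureThreshold: 30      # assumed value; 30 attempts x 2s = 60s startup budget
readinessProbe:
  httpGet:
    path: /healthz
    port: 9177
  periodSeconds: 10         # assumed value
```

The kubelet holds off readiness and liveness probes until the startup probe succeeds, so failed checks during registration do not show up as readiness probe failures. The container is restarted only if the startup probe exhausts its `failureThreshold`.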

before

Events:
  Type     Reason     Age   From               Message
  ----     ------     ----  ----               -------
  Normal   Scheduled  14s   default-scheduler  Successfully assigned kube-system/dranet-7sxql to dra-worker2
  Normal   Pulled     13s   kubelet            spec.containers{dranet}: Container image "registry.k8s.io/networking/dranet:v1.3.0" already present on machine and can be accessed by the pod
  Normal   Created    13s   kubelet            spec.containers{dranet}: Container created
  Normal   Started    13s   kubelet            spec.containers{dranet}: Container started
  Warning  Unhealthy  13s   kubelet            spec.containers{dranet}: Readiness probe failed: Get "http://172.18.0.3:9177/healthz": dial tcp 172.18.0.3:9177: connect: connection refused
  Warning  Unhealthy  12s   kubelet            spec.containers{dranet}: Readiness probe failed: HTTP probe failed with statuscode: 503

after

Events:
  Type    Reason     Age   From               Message
  ----    ------     ----  ----               -------
  Normal  Scheduled  29m   default-scheduler  Successfully assigned kube-system/dranet-m2gwf to dra-worker
  Normal  Pulling    29m   kubelet            spec.containers{dranet}: Pulling image "ttl.sh/dranet:3aa6a69"
  Normal  Pulled     29m   kubelet            spec.containers{dranet}: Successfully pulled image "ttl.sh/dranet:3aa6a69" in 4.939s (4.939s including waiting). Image size: 46028325 bytes.
  Normal  Created    29m   kubelet            spec.containers{dranet}: Container created
  Normal  Started    29m   kubelet            spec.containers{dranet}: Container started

Which issue(s) this PR is related to:

"N/A".

Special notes for your reviewer:

Does this PR introduce a user-facing change?

 "NONE"

Signed-off-by: Feruzjon Muyassarov <feruzjon.muyassarov@est.tech>
@k8s-ci-robot k8s-ci-robot added release-note-none Denotes a PR that doesn't merit a release note. kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. labels Jun 5, 2026
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: fmuyassarov
Once this PR has been reviewed and has the lgtm label, please assign michaelasp for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jun 5, 2026
@netlify

netlify Bot commented Jun 5, 2026

Deploy Preview for dranet canceled.

Name Link
🔨 Latest commit 6d6254f
🔍 Latest deploy log https://app.netlify.com/projects/dranet/deploys/6a22d919099c340008cc457d

@k8s-ci-robot k8s-ci-robot added the size/M Denotes a PR that changes 30-99 lines, ignoring generated files. label Jun 5, 2026