
ci: wait for control-plane nodes before labeling in chart install #6370

Merged
oilbeater merged 1 commit into master from
fix/talos-install-label-race
Mar 1, 2026

Conversation

@oilbeater
Collaborator

Summary

  • Talos installation tests intermittently fail because kubectl label node -l node-role.kubernetes.io/control-plane kube-ovn/role=master silently succeeds (exit 0) when no nodes match the selector, leaving no nodes with kube-ovn/role=master
  • The subsequent helm install then fails at template rendering (_helpers.tpl) because lookup finds no nodes with that label
  • Add a polling loop (up to 120s) before kubectl label that waits for at least one node with node-role.kubernetes.io/control-plane to appear
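
The failure mode in the first bullet can also be guarded against directly. The following is a minimal sketch, not part of this PR: `label_nodes_or_fail` is a hypothetical helper name, and `--overwrite` is an assumption added so reruns do not fail on an existing label (the thread does not show whether the Makefile passes it).

```shell
# Hypothetical helper (not part of this PR): label nodes matching a selector,
# but fail loudly when the selector matches nothing instead of exiting 0.
label_nodes_or_fail() {
  selector="$1"
  label="$2"
  # `-o name` prints one "node/<name>" line per match and nothing when no node matches.
  matches=$(kubectl get node -l "$selector" -o name 2>/dev/null)
  if [ -z "$matches" ]; then
    echo "no nodes match selector '$selector'; refusing to continue" >&2
    return 1
  fi
  # Assumption: --overwrite is used so a rerun does not fail on an existing label.
  kubectl label node -l "$selector" "$label" --overwrite
}

# Usage, assuming kubectl points at the test cluster:
#   label_nodes_or_fail node-role.kubernetes.io/control-plane kube-ovn/role=master
```

A guard like this turns the silent no-op into an immediate error, but it does not wait for the node to appear, which is why the PR uses a polling loop instead.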

Test plan

  • Verify Talos installation tests (overlay/underlay × ipv4/ipv6/dual) pass consistently
  • Verify helm-testing workflow passes

🤖 Generated with Claude Code

kubectl label with a label selector silently succeeds (exit 0) when no
nodes match, which causes the subsequent helm install to fail because
no nodes have the kube-ovn/role=master label. This happens intermittently
in Talos installation tests when the control-plane node has not yet
registered its node-role.kubernetes.io/control-plane label at the time
install-chart runs.

Add a polling loop (up to 120s) that waits for at least one node with
the control-plane label to appear before running kubectl label.

Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Mengxin Liu <liumengxinfly@gmail.com>
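
For readers who want the polling logic outside of a Makefile recipe, here is a rough standalone sketch of the same idea. The function name `wait_for_labeled_nodes` and its optional timeout and interval parameters are illustrative assumptions; the PR itself uses `timeout 120 bash -c '...'` inline, as shown in the review snippets further down.

```shell
# Hypothetical helper: wait until at least one node matches the label selector.
wait_for_labeled_nodes() {
  selector="$1"
  timeout_s="${2:-120}"   # overall budget in seconds (120 mirrors the PR)
  interval_s="${3:-2}"    # delay between polls (2 mirrors the PR)
  deadline=$(( $(date +%s) + timeout_s ))
  # `grep -q .` succeeds only if kubectl printed at least one non-empty line,
  # i.e. at least one node matched the selector.
  until kubectl get node -l "$selector" -o name 2>/dev/null | grep -q .; do
    if [ "$(date +%s)" -ge "$deadline" ]; then
      echo "timed out after ${timeout_s}s waiting for nodes with label $selector" >&2
      return 1
    fi
    echo "Waiting for nodes with label $selector to appear..."
    sleep "$interval_s"
  done
}

# Usage, assuming kubectl points at the test cluster:
#   wait_for_labeled_nodes node-role.kubernetes.io/control-plane
```

Unlike the `timeout` wrapper, this variant checks the deadline between polls, so it can print its own message when it gives up; the trade-off is that a single hung `kubectl` call is not interrupted.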
@dosubot dosubot bot added the size:XS (This PR changes 0-9 lines, ignoring generated files), chart (Helm Chart), and ci labels on Mar 1, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request makes the CI pipeline more robust by adding a synchronization step to the chart installation targets. It prevents failures that occurred when the node labeling command ran too early, by waiting until at least one control-plane node has registered its role label before proceeding with the Helm chart deployment.

Highlights

  • CI Stability Improvement: Resolved intermittent Talos installation test failures where kubectl label would silently succeed (exit 0) when no node yet carried the control-plane label, leading to subsequent helm install failures due to missing labels.
  • Node Presence Polling: Introduced a polling mechanism (up to 120 seconds) in the Makefile that waits until at least one node with the node-role.kubernetes.io/control-plane label exists before attempting to apply the kube-ovn/role=master label. The loop checks only that a labeled node exists, not that the node is Ready.

Changelog

  • Makefile
    • Added a polling command before kubectl label in the install-chart target to wait for control-plane nodes.
    • Added a polling command before kubectl label in the install-chart-v2 target to wait for control-plane nodes.

Ignored Files

  • Ignored by pattern: .github/workflows/** (1)
    • .github/workflows/helm-testing.yaml

Activity

  • No review comments or specific activity has been recorded on this pull request yet.


@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request addresses an intermittent failure in CI by waiting for control-plane nodes to become available before attempting to label them. The approach of using a polling loop with a timeout is sound. My review focuses on improving the maintainability of the Makefile by suggesting the removal of duplicated code. By extracting the new waiting logic into a reusable define block, the code becomes cleaner and easier to manage in the future.
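
As a small illustration of why the timeout matters, the sketch below runs a loop whose condition never becomes true under GNU coreutils `timeout` (assumed to be available, as it is on typical Linux CI runners) and captures the exit status. The one-second budget and the always-false condition are stand-ins chosen for the demo, not values from the PR.

```shell
# A loop whose condition never becomes true, capped at 1 second.
# When the time limit is reached, GNU `timeout` kills the command
# and itself exits with status 124.
status=0
timeout 1 sh -c 'until false; do sleep 1; done' || status=$?
echo "exit status: $status"
```

Because make aborts a target when a recipe line exits non-zero, a timed-out wait stops install-chart before kubectl label and helm install run, rather than letting them proceed against a cluster with no labeled nodes. The leading `@` in the recipe only suppresses command echoing and does not change this behavior.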


.PHONY: install-chart
install-chart:
	@timeout 120 bash -c 'until kubectl get node -l node-role.kubernetes.io/control-plane -o name 2>/dev/null | grep -q .; do echo "Waiting for control-plane nodes to be labeled..."; sleep 2; done'
medium

This command is duplicated in the install-chart-v2 target on line 254. To improve maintainability and avoid duplication, consider extracting this logic into a reusable define block, similar to other wait helpers in this Makefile like kubectl_wait_exist.

For example, you could add the following definition alongside the other define blocks:

define wait_for_labeled_nodes
	@timeout 120 bash -c 'until kubectl get node -l "$(1)" -o name 2>/dev/null | grep -q .; do echo "Waiting for nodes with label $(1) to appear..."; sleep 2; done'
endef

And then call it in both install-chart and install-chart-v2 targets. This also makes the waiting message clearer.

	$(call wait_for_labeled_nodes,node-role.kubernetes.io/control-plane)


.PHONY: install-chart-v2
install-chart-v2:
	@timeout 120 bash -c 'until kubectl get node -l node-role.kubernetes.io/control-plane -o name 2>/dev/null | grep -q .; do echo "Waiting for control-plane nodes to be labeled..."; sleep 2; done'
medium

This command is a duplicate of the one in the install-chart target on line 189. To improve maintainability, please see my suggestion on that line to refactor this into a reusable define block.

	$(call wait_for_labeled_nodes,node-role.kubernetes.io/control-plane)

@coveralls

Pull Request Test Coverage Report for Build 22543631985

Warning: This coverage report may be inaccurate.

This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage remained the same at 23.089%

Totals
Change from base Build 22538259073: 0.0%
Covered Lines: 12561
Relevant Lines: 54402

💛 - Coveralls

@oilbeater oilbeater merged commit 605d615 into master Mar 1, 2026
78 checks passed
@oilbeater oilbeater deleted the fix/talos-install-label-race branch March 1, 2026 14:01