Conditions/nodestate by SchSeba · Pull Request #1012 · k8snetworkplumbingwg/sriov-network-operator

SchSeba · 2026-01-11T20:09:24Z

No description provided.

github-actions · 2026-01-11T20:09:31Z

Thanks for your PR,
To run vendor CIs, maintainers can use one of:

/test-all: To run all tests for all vendors.
/test-e2e-all: To run all E2E tests for all vendors.
/test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, maintainers can use one of:

/skip-all: To skip all tests for all vendors.
/skip-e2e-all: To skip all E2E tests for all vendors.
/skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
Best regards.

gemini-code-assist · 2026-01-11T20:09:56Z

Summary of Changes

Hello @SchSeba, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the observability and operational transparency of the SR-IOV Network Operator by integrating standard Kubernetes conditions into the SriovNetworkNodeState Custom Resource Definition. This change provides a more standardized and machine-readable way to convey the health, progress, and any issues related to SR-IOV configurations and node drain operations. By introducing dedicated conditions for configuration and draining, along with a robust status patching mechanism, the operator now offers clearer insights into the state of individual nodes, facilitating better monitoring, troubleshooting, and automated reactions within a Kubernetes environment.

Highlights

Standardized Kubernetes Conditions: Introduces Ready, Progressing, Degraded, and specific DrainProgressing, DrainDegraded, DrainComplete conditions to the SriovNetworkNodeState CRD, aligning with Kubernetes best practices for status reporting.
Enhanced Node State Reporting: Provides granular status updates for node configuration and drain operations, including detailed reasons and messages for each condition, improving observability and troubleshooting.
Atomic Status Updates with Event Emission: Implements a new pkg/status package with a Patcher to handle status updates using strategic merge patches, preventing race conditions and ensuring data integrity. It also emits Kubernetes events for condition transitions.
Improved Drain Process Observability: The drain controller now actively updates drain-specific conditions, including immediate reporting of errors during the drain process via a new callback mechanism, offering real-time feedback on drain health.
Comprehensive Design Document: A detailed design document (doc/design/kubernetes-conditions-integration.md) has been added, outlining the motivation, goals, and implementation of this conditions integration across various SR-IOV CRDs.


gemini-code-assist

Code Review

This pull request introduces a robust implementation of Kubernetes conditions for SriovNetworkNodeState, significantly improving its observability. The changes are well-structured, with new helper functions for managing conditions, comprehensive tests, and a detailed design document. The use of Patch for status updates and careful state management correctly handles potential race conditions between the drain controller and the config daemon. A notable improvement is the immediate reporting of drain errors, like PDB violations, by capturing log output from the drain helper, which enhances the responsiveness of the DrainDegraded condition.

gemini-code-assist · 2026-01-11T20:11:16Z

pkg/drain/drainer.go

+		// ErrOut captures errors from the drain library, including retry errors.
+		// We call the onError callback here to report eviction failures immediately,
+		// before the drain timeout is reached.
+		ErrOut: writer{func(msg string, kv ...interface{}) {
+			cleanMsg := strings.ReplaceAll(msg, "\n", "")
+			logger.Error(nil, cleanMsg, kv...)
+			// Call the error callback for eviction/deletion errors
+			if onError != nil && strings.Contains(cleanMsg, "error when") {
+				onError(fmt.Errorf("%s", cleanMsg))
+			}
+		}},


This implementation for capturing drain errors by parsing log output is clever but brittle, as it depends on the specific log message format of the upstream k8s.io/kubectl/pkg/drain library. A change in the upstream logging could break this error detection.

It would be beneficial to add a comment here explaining this dependency and the trade-off made. This will help future maintainers understand why this approach was chosen (e.g., to get immediate feedback on retryable errors like PDB violations, which OnPodDeletionOrEvictionFinished doesn't provide) and to be aware of the potential for breakage if the upstream library is updated.

coveralls · 2026-01-12T11:19:00Z

Pull Request Test Coverage Report for Build 22143114570

Details

392 of 509 (77.01%) changed or added relevant lines in 9 files are covered.
20 unchanged lines in 4 files lost coverage.
Overall coverage increased (+0.3%) to 63.177%

Changes Missing Coverage	Covered Lines	Changed/Added Lines	%
pkg/drain/drainer.go	25	27	92.59%
controllers/drain_controller.go	3	6	50.0%
pkg/status/transitions.go	55	59	93.22%
api/v1/zz_generated.deepcopy.go	6	21	28.57%
controllers/drain_controller_helper.go	38	62	61.29%
pkg/status/patcher.go	34	103	33.01%

Files with Coverage Reduction	New Missed Lines	%
controllers/helper.go	2	70.08%
pkg/utils/cluster.go	3	82.91%
pkg/daemon/status.go	5	79.83%
controllers/drain_controller_helper.go	10	62.06%

Totals
Change from base Build 22102567607:	0.3%
Covered Lines:	9642
Relevant Lines:	15262

💛 - Coveralls

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Add foundational condition support for SR-IOV Network Operator CRDs:

- Define standard condition types (Ready, Progressing, Degraded)
- Define drain-specific condition types (DrainProgressing, DrainDegraded, DrainComplete)
- Add DrainState enum for tracking drain operation states
- Add common condition reasons for various states
- Add NetworkStatus type with Conditions field for network CRDs
- Add ConditionsEqual helper for comparing conditions ignoring LastTransitionTime
- Add SetConfigurationConditions and SetDrainConditions methods
- Add comprehensive unit tests for condition handling

This follows the Kubernetes API conventions for status conditions and provides the foundation for observability improvements across all SR-IOV CRDs.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Add a new status package with utilities for managing CRD status updates:

Patcher (patcher.go):
- Provides retry logic for status updates with conflict handling
- Supports both Update and Patch operations
- Includes UpdateStatusWithEvents for automatic event emission
- Embeds condition management methods for convenience
- Uses interface for easy mocking in tests

Transitions (transitions.go):
- DetectTransitions compares old and new conditions
- Returns structured Transition objects for each change
- EventType() method returns appropriate event type (Normal/Warning)
- EventReason() generates suitable event reason strings
- Supports Added, Changed, Removed, and Unchanged transitions

Tests:
- Comprehensive unit tests for patcher operations
- Tests for transition detection logic
- Tests for event type and reason generation

This package provides a reusable foundation for consistent status handling across all controllers in the operator.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
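`DetectTransitions` and `EventType()` are named in the commit, but their logic is not shown. The following is a hypothetical sketch of the comparison. It assumes a Warning event when a `*Degraded` condition becomes True and Normal otherwise, and it treats any difference in status, reason or message as a change. `EventReason()` is omitted, and `Condition` is a simplified stand-in.

```go
package main

import (
	"fmt"
	"strings"
)

type Condition struct{ Type, Status, Reason, Message string }

// TransitionKind classifies how one condition changed between two versions.
type TransitionKind string

const (
	Added     TransitionKind = "Added"
	Changed   TransitionKind = "Changed"
	Removed   TransitionKind = "Removed"
	Unchanged TransitionKind = "Unchanged"
)

// Transition describes the change of a single condition type.
type Transition struct {
	Type     string
	Kind     TransitionKind
	Old, New *Condition
}

// EventType returns "Warning" when a *Degraded condition turns True and
// "Normal" otherwise (assumed rule).
func (t Transition) EventType() string {
	if t.New != nil && t.New.Status == "True" && strings.HasSuffix(t.Type, "Degraded") {
		return "Warning"
	}
	return "Normal"
}

// DetectTransitions compares conditions by Type. Results follow the order of
// newConds, followed by removed conditions in the order of oldConds.
func DetectTransitions(oldConds, newConds []Condition) []Transition {
	oldByType := make(map[string]*Condition, len(oldConds))
	for i := range oldConds {
		oldByType[oldConds[i].Type] = &oldConds[i]
	}
	var out []Transition
	for i := range newConds {
		n := &newConds[i]
		o, ok := oldByType[n.Type]
		switch {
		case !ok:
			out = append(out, Transition{Type: n.Type, Kind: Added, New: n})
		case *o != *n:
			out = append(out, Transition{Type: n.Type, Kind: Changed, Old: o, New: n})
		default:
			out = append(out, Transition{Type: n.Type, Kind: Unchanged, Old: o, New: n})
		}
		delete(oldByType, n.Type) // whatever remains afterwards was removed
	}
	for i := range oldConds {
		if _, removed := oldByType[oldConds[i].Type]; removed {
			out = append(out, Transition{Type: oldConds[i].Type, Kind: Removed, Old: &oldConds[i]})
		}
	}
	return out
}

func main() {
	before := []Condition{{Type: "Ready", Status: "True"}}
	after := []Condition{{Type: "Ready", Status: "False"}, {Type: "DrainDegraded", Status: "True"}}
	for _, t := range DetectTransitions(before, after) {
		fmt.Println(t.Type, t.Kind, t.EventType())
	}
	// Ready Changed Normal
	// DrainDegraded Added Warning
}
```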

Update SriovNetworkNodeState status type to include Kubernetes conditions:

- Add Conditions field to status
- Add printcolumns for Ready, Progressing, Degraded
- Add printcolumns for drain conditions (DrainProgress, DrainDegraded, DrainComplete)

Conditions follow Kubernetes conventions with patchMergeKey and listType annotations for proper strategic merge patch support.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Update config daemon status handling to set configuration conditions:

- Call SetConfigurationConditions when updating node state sync status
- Preserve drain conditions during configuration status updates
- Use ConditionsEqual to avoid unnecessary API calls when unchanged
- Set Ready=True, Progressing=False, Degraded=False on success
- Set Progressing=True, Ready=False on in-progress configuration
- Set Degraded=True with error message on configuration failure
- Keep Degraded=True while retrying after previous failure

This ensures the daemon properly reports configuration progress and errors through standard Kubernetes conditions, enabling better observability and automation.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
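The condition rules in this commit can be expressed as a small pure function. This sketch assumes sync status strings `Succeeded`, `InProgress` and `Failed`, plus a `wasDegraded` flag for the retry case. The reason strings are hypothetical. Ready=False and Progressing=False on failure are assumptions, since the commit message does not state them.

```go
package main

import "fmt"

type Condition struct{ Type, Status, Reason, Message string }

// Assumed sync status values.
const (
	SyncSucceeded  = "Succeeded"
	SyncInProgress = "InProgress"
	SyncFailed     = "Failed"
)

// configurationConditions returns Ready/Progressing/Degraded for a sync
// status. wasDegraded marks a retry after a failed attempt, in which case
// Degraded stays True while the daemon is still progressing.
func configurationConditions(syncStatus, lastSyncError string, wasDegraded bool) []Condition {
	switch syncStatus {
	case SyncSucceeded:
		return []Condition{
			{"Ready", "True", "ConfigurationSucceeded", ""},
			{"Progressing", "False", "ConfigurationSucceeded", ""},
			{"Degraded", "False", "ConfigurationSucceeded", ""},
		}
	case SyncFailed:
		return []Condition{
			{"Ready", "False", "ConfigurationFailed", lastSyncError},
			{"Progressing", "False", "ConfigurationFailed", lastSyncError},
			{"Degraded", "True", "ConfigurationFailed", lastSyncError},
		}
	default: // treated as in progress
		degraded := Condition{"Degraded", "False", "ConfigurationInProgress", ""}
		if wasDegraded {
			degraded = Condition{"Degraded", "True", "RetryingAfterFailure", lastSyncError}
		}
		return []Condition{
			{"Ready", "False", "ConfigurationInProgress", ""},
			{"Progressing", "True", "ConfigurationInProgress", ""},
			degraded,
		}
	}
}

func main() {
	for _, c := range configurationConditions(SyncInProgress, "failed to set NumVfs", true) {
		fmt.Printf("%s=%s (%s)\n", c.Type, c.Status, c.Reason)
	}
}
```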

Add comprehensive drain condition management to the drain controller:

Drain Controller Changes:
- Add updateDrainConditions helper to set drain-specific conditions
- Set DrainProgressing=True when drain starts
- Set DrainDegraded=True when drain errors occur (e.g., PDB violations)
- Set DrainComplete=True when drain completes successfully
- Reset all drain conditions to idle state when returning to idle
- Use Patch instead of Update to avoid race conditions with daemon

Drainer Changes:
- Add DrainErrorCallback type for immediate error notification
- Call callback when eviction errors occur (before timeout)
- Capture first real error (e.g., PDB violation) instead of generic timeout
- Pass callback through cordon and drain operations

Controller Helper Fix:
- Fix DrainStateAnnotationPredicate to use GetAnnotations() not GetLabels()

Tests:
- Add tests for DrainComplete=True after successful drain
- Add tests for DrainDegraded=False after successful drain
- Add tests for idle state conditions
- Add tests for observedGeneration consistency
- Add tests for single node reboot drain conditions
- Add tests for error callback invocation

This enables users to observe drain progress and errors through standard Kubernetes conditions, improving troubleshooting and enabling automated responses to drain failures.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

Update the kubernetes-conditions-integration.md design document to reflect the actual implementation details:

- Add SriovNetworkNodePolicy as a supported CRD
- Add drain-specific conditions (DrainProgressing, DrainDegraded, DrainComplete)
- Document all condition reasons with actual constant names
- Add NetworkStatus shared type used by network CRDs
- Document pkg/status package with Patcher interface and transition detection
- Add ConditionsEqual helper function documentation
- Update API extension examples with actual struct definitions
- Add matchedNodeCount/readyNodeCount to Policy and PoolConfig status
- Add kubectl output examples showing new columns
- Update implementation commits list
- Mark document status as 'implemented'

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

- Add SetConfigurationConditions and SetDrainConditions methods to helper.go
- Add findCondition helper function to suite_test.go
- Regenerate CRDs and deepcopy for SriovNetworkNodeState type

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

The optimization to skip unnecessary API calls in updateSyncState was only checking SyncStatus, LastSyncError, and Conditions. This caused updates to Status.Interfaces, Status.Bridges, and Status.System to be skipped when those fields changed but the other fields remained the same.

This broke ExternallyManaged policy validation: when VFs were manually configured on the host, the daemon would discover the new NumVfs value but skip updating the SriovNetworkNodeState status. The webhook would then see stale NumVfs=0 and reject the policy.

Add equality checks for Interfaces, Bridges, and System fields to ensure host status changes are properly persisted to the API.

Signed-off-by: Cursor AI <noreply@cursor.com>
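The fix comes down to comparing every persisted status field, not just the sync fields, before skipping a write. The sketch below is simplified. `Status`, `Interface` and the field types are stand-ins (the real `Bridges` and `System` types are richer), and `statusUnchanged` is a hypothetical name. It uses `reflect.DeepEqual` for brevity, whereas the daemon compares conditions with `ConditionsEqual` so that timestamps are ignored.

```go
package main

import (
	"fmt"
	"reflect"
)

type Condition struct{ Type, Status string }

type Interface struct {
	Name   string
	NumVfs int
}

// Status is a simplified stand-in for the SriovNetworkNodeState status.
type Status struct {
	Interfaces    []Interface
	Bridges       []string          // simplified
	System        map[string]string // simplified
	SyncStatus    string
	LastSyncError string
	Conditions    []Condition
}

// statusUnchanged returns true only when no persisted field differs, so the
// API write can be skipped safely. Omitting the host fields (Interfaces,
// Bridges, System) is what caused a NumVfs change to be silently dropped.
func statusUnchanged(old, cur Status) bool {
	return old.SyncStatus == cur.SyncStatus &&
		old.LastSyncError == cur.LastSyncError &&
		reflect.DeepEqual(old.Conditions, cur.Conditions) &&
		reflect.DeepEqual(old.Interfaces, cur.Interfaces) &&
		reflect.DeepEqual(old.Bridges, cur.Bridges) &&
		reflect.DeepEqual(old.System, cur.System)
}

func main() {
	old := Status{SyncStatus: "Succeeded", Interfaces: []Interface{{Name: "ens1f0", NumVfs: 0}}}
	cur := old
	// VFs were configured manually on the host (the ExternallyManaged case).
	cur.Interfaces = []Interface{{Name: "ens1f0", NumVfs: 8}}
	fmt.Println(statusUnchanged(old, cur)) // false: the new NumVfs must be persisted
}
```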

- sriovoperatorconfig_controller_test: add explicit timeout to the Eventually call that waits for the controller to add finalizers. The default 1s timeout is too tight for CI environments where the controller manager may take longer to start reconciling.
- daemon_test: trigger an annotation change after creating a fresh nodeState in BeforeEach to guard against a race where the informer cache misses the initial create event with controller-runtime's priority queue, causing the daemon to never reconcile the object.

Signed-off-by: Sebastian Sch <sebassch@gmail.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

github-actions bot added the docs and tests labels Jan 11, 2026

gemini-code-assist bot reviewed Jan 11, 2026


SchSeba force-pushed the conditions/nodestate branch from e4e96ca to 230d43c January 12, 2026 11:06

SchSeba mentioned this pull request Jan 14, 2026

Add Kubernetes Conditions to SR-IOV Network Operator CRDs #1007

Closed

SchSeba added 9 commits February 18, 2026 11:50

design proposal for conditions in operator CRDs

dfba803

Signed-off-by: Sebastian Sch <sebassch@gmail.com>

SchSeba force-pushed the conditions/nodestate branch from 230d43c to de1e8d4 February 18, 2026 13:19


SchSeba wants to merge 10 commits into k8snetworkplumbingwg:master from SchSeba:conditions/nodestate


2 participants
