Skip to content

fix: getCurrentBridgeState to return all uplinks for grouped bridges#131

Merged
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
almaslennikov:fix-bridge
Jan 15, 2026
Merged

fix: getCurrentBridgeState to return all uplinks for grouped bridges#131
almaslennikov merged 1 commit intoMellanox:network-operator-26.1.xfrom
almaslennikov:fix-bridge

Conversation

@almaslennikov
Copy link
Collaborator

The getCurrentBridgeState() function was only processing the first uplink (Uplinks[0]) and ignoring the rest. This caused bridges with multiple uplinks (created via groupingPolicy: all) to constantly reconfigure because the current state comparison always showed a mismatch.

Additionally, NeedToUpdateBridges was comparing the entire Bridges struct including the GroupingPolicy field. However, GroupingPolicy is a policy directive used during configuration and is not stored in OVS, so it will never appear in the discovered status. This caused another source of constant mismatch detection.

Changes:

  • Modify getCurrentBridgeState() to iterate through all uplinks in knownConfig.Uplinks instead of only processing the first one
  • Modify NeedToUpdateBridges to only compare OVS configurations, ignoring the GroupingPolicy field
  • Add test cases for bridge with multiple uplinks
  • Add test case for partial uplinks when some interfaces are missing
  • Add test case for groupingPolicy difference being ignored

Fixes infinite reconciliation loop where bridge ports kept being removed and re-added on every sync cycle.

The getCurrentBridgeState() function was only processing the first
uplink (Uplinks[0]) and ignoring the rest. This caused bridges with
multiple uplinks (created via groupingPolicy: all) to constantly
reconfigure because the current state comparison always showed a
mismatch.

Additionally, NeedToUpdateBridges was comparing the entire Bridges
struct including the GroupingPolicy field. However, GroupingPolicy is
a policy directive used during configuration and is not stored in OVS,
so it will never appear in the discovered status. This caused another
source of constant mismatch detection.

Changes:
- Modify getCurrentBridgeState() to iterate through all uplinks in
  knownConfig.Uplinks instead of only processing the first one
- Modify NeedToUpdateBridges to only compare OVS configurations,
  ignoring the GroupingPolicy field
- Add test cases for bridge with multiple uplinks
- Add test case for partial uplinks when some interfaces are missing
- Add test case for groupingPolicy difference being ignored

Fixes infinite reconciliation loop where bridge ports kept being
removed and re-added on every sync cycle.

Signed-off-by: Alexander Maslennikov <amaslennikov@nvidia.com>
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions github-actions bot added the tests label Jan 15, 2026
@greptile-apps
Copy link

greptile-apps bot commented Jan 15, 2026

Greptile Summary

This PR fixes an infinite reconciliation loop that occurred when using bridge grouping policies with multiple uplinks. The fix addresses two critical issues:

Key Changes:

  • Modified getCurrentBridgeState() in pkg/host/internal/bridge/ovs/ovs.go to iterate through all uplinks instead of only processing Uplinks[0], ensuring complete bridge state is captured
  • Updated NeedToUpdateBridges() in api/v1/helper.go to compare only the OVS configuration fields (bridgeSpec.OVS vs bridgeStatus.OVS) and ignore GroupingPolicy, which is a policy directive not stored in OVS
  • Added comprehensive test coverage for multi-uplink bridges and partial uplink scenarios

Impact:
Eliminates the reconciliation loop where bridge ports were continuously removed and re-added on every sync cycle, significantly improving stability for configurations using groupingPolicy: all with multiple network interfaces.

Confidence Score: 5/5

  • This PR is safe to merge with minimal risk
  • The changes are well-targeted bug fixes with clear logic improvements. The iteration through all uplinks is the correct approach, and the comparison logic fix properly separates policy directives from actual OVS state. Comprehensive test coverage validates both the happy path (all uplinks present) and edge cases (partial uplinks). The changes are backward compatible and don't introduce new features or risky refactoring.
  • No files require special attention

Important Files Changed

Filename Overview
pkg/host/internal/bridge/ovs/ovs.go Fixed getCurrentBridgeState to iterate through all uplinks instead of only processing the first one, preventing reconciliation loops for grouped bridges
api/v1/helper.go Modified NeedToUpdateBridges to compare only OVS configurations, ignoring GroupingPolicy field which doesn't exist in actual OVS state
pkg/host/internal/bridge/ovs/ovs_test.go Added comprehensive test coverage for multiple uplinks and partial uplinks scenarios
api/v1/helper_test.go Added test case verifying that GroupingPolicy differences are correctly ignored in bridge comparison

Sequence Diagram

sequenceDiagram
    participant Operator as SR-IOV Operator
    participant Helper as NeedToUpdateBridges
    participant OVS as OVS Manager
    participant Store as Config Store
    participant OVSDB as OVS Database

    Note over Operator,OVSDB: Bridge Reconciliation Flow (Before Fix)
    Operator->>Helper: Compare spec vs status
    Helper->>Helper: DeepEqual(bridgeSpec, bridgeStatus)<br/>including GroupingPolicy
    Note over Helper: ❌ Always mismatches because<br/>GroupingPolicy not in status
    Helper-->>Operator: Update needed
    Operator->>OVS: CreateOVSBridge(conf)
    OVS->>Store: GetManagedOVSBridge(name)
    Store-->>OVS: knownConfig with all uplinks
    OVS->>OVS: getCurrentBridgeState(knownConfig)
    OVS->>OVSDB: Query uplinks[0] only
    OVSDB-->>OVS: First uplink state
    Note over OVS: ❌ Only returns first uplink<br/>Ignores p1, p2, p3
    OVS-->>OVS: currentState with 1 uplink
    OVS->>OVS: Compare conf vs currentState
    Note over OVS: ❌ Mismatch: 4 uplinks vs 1 uplink
    OVS->>OVSDB: Reconfigure (remove/add ports)
    Note over Operator,OVSDB: Infinite reconciliation loop

    Note over Operator,OVSDB: Bridge Reconciliation Flow (After Fix)
    Operator->>Helper: Compare spec vs status
    Helper->>Helper: DeepEqual(bridgeSpec.OVS, bridgeStatus.OVS)<br/>ignoring GroupingPolicy
    Note over Helper: ✅ Correct comparison of<br/>actual OVS config
    Helper-->>Operator: No update if OVS matches
    Operator->>OVS: CreateOVSBridge(conf)
    OVS->>Store: GetManagedOVSBridge(name)
    Store-->>OVS: knownConfig with all uplinks
    OVS->>OVS: getCurrentBridgeState(knownConfig)
    loop For each uplink in knownConfig
        OVS->>OVSDB: Query uplink[i]
        OVSDB-->>OVS: Uplink state
        OVS->>OVS: Append to currentConfig.Uplinks
    end
    Note over OVS: ✅ Returns all uplinks<br/>(p0, p1, p2, p3)
    OVS-->>OVS: currentState with all uplinks
    OVS->>OVS: Compare conf vs currentState
    Note over OVS: ✅ Match: both have same uplinks
    OVS-->>Operator: No reconfiguration needed
    Note over Operator,OVSDB: Stable state achieved
Loading

@almaslennikov almaslennikov merged commit ef37ee1 into Mellanox:network-operator-26.1.x Jan 15, 2026
11 of 14 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants