Skip to content

feat: Devlink parameters configuration#108

Merged
e0ne merged 6 commits intoMellanox:network-operator-26.1.xfrom
e0ne:devlink-params-impl
Jan 9, 2026
Merged

feat: Devlink parameters configuration#108
e0ne merged 6 commits intoMellanox:network-operator-26.1.xfrom
e0ne:devlink-params-impl

Conversation

@e0ne
Copy link
Collaborator

@e0ne e0ne commented Dec 22, 2025

No description provided.

@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@github-actions github-actions bot added the docs label Dec 22, 2025
@e0ne e0ne force-pushed the devlink-params-impl branch 3 times, most recently from bf602d5 to 63c5ab2 Compare December 22, 2025 21:24
@heyvister1 heyvister1 closed this Dec 23, 2025
@heyvister1 heyvister1 reopened this Dec 23, 2025
@github-actions
Copy link

Thanks for your PR,
To run vendors CIs, Maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendors CIs, Maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.
    Best regards.

@e0ne e0ne changed the base branch from master to network-operator-26.1.x December 31, 2025 09:28
@e0ne e0ne force-pushed the devlink-params-impl branch from 71c1b13 to 9260b97 Compare December 31, 2025 09:29
@greptile-apps
Copy link

greptile-apps bot commented Dec 31, 2025

Greptile Summary

This PR implements devlink parameters configuration support for SR-IOV Network Operator, allowing users to configure device-level parameters through the SriovNetworkNodePolicy API.

Key Changes:

  • Adds DevlinkParams struct to policy spec with fields for parameter name, value, cmode (runtime/driverinit/permanent), and applyOn (PF/VF/SF)
  • Implements discovery of current devlink parameters during device scanning via GetDevlinkDeviceParams()
  • Applies devlink parameters through SetDevlinkDeviceParam() for both PF and VF devices
  • Mellanox vendor plugin handles esw_multiport parameter by setting firmware LAG_RESOURCE_ALLOCATION flag
  • Adds bridge grouping policy support with "all" option to group multiple interfaces under single OVS bridge
  • Comprehensive test coverage including unit tests for eswitch params and devlink discovery

Issues Found:

  • Missing error return in applyDevlinkVfParam() when GetVFList() fails (line 471-474)
  • Case-sensitive comparison of ApplyOn field doesn't handle lowercase values allowed by API (line 446-461)
  • Minor: incorrect function name in log message

Confidence Score: 3/5

  • PR has solid implementation and tests but contains two logic bugs that could cause runtime failures
  • Score reflects good architectural design and comprehensive test coverage, but reduced due to critical bugs: (1) missing error return causes silent failures when getting VF list, (2) case-sensitive ApplyOn comparison breaks API contract allowing lowercase values
  • pkg/host/internal/sriov/sriov.go requires fixes for error handling and case-insensitive field comparison

Important Files Changed

Filename Overview
api/v1/sriovnetworknodepolicy_types.go Adds DevlinkParams API to policy spec and GroupingPolicy enum for bridge configuration
api/v1/sriovnetworknodestate_types.go Extends node state with devlink params for interfaces, VFs, and bridge grouping policy
pkg/vendors/mellanox/mellanox.go Implements vendor-specific esw_multiport devlink param handling with firmware configuration logic
pkg/host/internal/network/network.go Adds GetDevlinkDeviceParams discovery and helper functions for devlink type/cmode conversion
pkg/host/internal/sriov/sriov.go Implements devlink param discovery and application logic for PF and VF devices
pkg/plugins/generic/generic_plugin.go Adds bridge grouping policy configuration call in Apply method
pkg/plugins/mellanox/mellanox_plugin.go Integrates HandleESwitchParams call for eswitch devlink parameter handling
pkg/host/internal/bridge/bridge.go Implements ConfigureBridgesGrouping for OVS bridge grouping with "all" policy support

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Controller as Operator Controller
    participant VendorPlugin as Mellanox Plugin
    participant GenericPlugin as Generic Plugin
    participant HostHelper as Host Manager
    participant Devlink as Devlink/Netlink
    participant Firmware as NIC Firmware (mstconfig)

    User->>Policy: Create/Update Policy with DevlinkParams
    Policy->>Controller: Apply Policy to Node State
    Controller->>VendorPlugin: OnNodeStateChange()
    
    VendorPlugin->>VendorPlugin: HandleESwitchParams(esw_multiport)
    alt esw_multiport requires change
        VendorPlugin->>VendorPlugin: Set Multiport=1 in firmware attrs
        VendorPlugin->>Firmware: Configure LAG_RESOURCE_ALLOCATION=1
        VendorPlugin-->>Controller: needReboot=true
    end
    
    Controller->>GenericPlugin: Apply()
    GenericPlugin->>HostHelper: DiscoverSriovDevices()
    HostHelper->>Devlink: GetDevlinkDeviceParams(pciAddr)
    Devlink-->>HostHelper: Current devlink params
    HostHelper-->>GenericPlugin: Interface state with DevlinkParams
    
    GenericPlugin->>HostHelper: ConfigureSriovPFDevice()
    loop For each DevlinkParam
        alt ApplyOn == PF
            HostHelper->>Devlink: SetDevlinkDeviceParam(PF, name, value)
        else ApplyOn == VF
            loop For each VF
                HostHelper->>Devlink: SetDevlinkDeviceParam(VF, name, value)
            end
        end
    end
    
    alt Bridge GroupingPolicy configured
        GenericPlugin->>HostHelper: ConfigureBridgesGrouping()
        HostHelper->>HostHelper: Create OVS bridge with first interface
        loop For remaining interfaces
            HostHelper->>HostHelper: Add interface to bridge
        end
    end
    
    GenericPlugin-->>Controller: Configuration applied
    Controller-->>User: Node state updated
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. pkg/host/internal/sriov/sriov.go, line 471-474 (link)

    logic: error from GetVFList is logged but not returned, yet the function continues to iterate over potentially empty vfAddrs slice

  2. api/v1/sriovnetworknodepolicy_types.go, line 152-154 (link)

    logic: case-sensitivity for ApplyOn validation - enum values include both uppercase and lowercase (PF/pf, VF/vf, SF/sf) but code only checks against uppercase constants. should the code normalize the ApplyOn field to uppercase before comparison, or is case-insensitive matching intended?

  3. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    logic: devlink parameters are applied AFTER VFs are created, but some parameters like esw_multiport may require firmware configuration before VF creation according to the design doc. is the ordering correct, or should vendor-specific firmware config (like Mellanox LAG_RESOURCE_ALLOCATION) happen before this generic devlink param application?

  4. pkg/host/internal/sriov/sriov.go, line 447 (link)

    logic: missing handling for SF (Scalable Function) ApplyOn option - only PF and VF cases are handled

29 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@nvidia-ci-cd nvidia-ci-cd force-pushed the network-operator-26.1.x branch from 8b9a8d8 to 1850188 Compare January 3, 2026 00:10
@e0ne e0ne force-pushed the devlink-params-impl branch from 9260b97 to a0eb40b Compare January 5, 2026 09:06
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. pkg/host/internal/sriov/sriov.go, line 472-474 (link)

    syntax: param argument not included in log message

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

  2. pkg/host/internal/sriov/sriov.go, line 479 (link)

    syntax: param.Name not included in log message

  3. pkg/host/internal/network/network.go, line 427-433 (link)

    logic: Error variable err is checked but never set

    Was type assertion meant to be used here, or was devlinkValueToString supposed to be called?

  4. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    style: switch statement missing default case - params with empty or invalid ApplyOn fields will be silently ignored

29 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

@e0ne e0ne force-pushed the devlink-params-impl branch from a0eb40b to e7879fc Compare January 5, 2026 12:39
@e0ne e0ne removed the docs label Jan 5, 2026
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (6)

  1. pkg/host/internal/bridge/bridge.go, line 96-98 (link)

    logic: Interfaces slice could be empty even after the check on line 96, causing panic on line 100 when accessing interfaces[0]. The check validates len(interfaces) == 0 but this happens inside the groupingPolicy != "" block. If called with empty interfaces and empty groupingPolicy, the function returns nil without error, but with non-empty groupingPolicy it would fail.

  2. pkg/host/internal/sriov/sriov.go, line 332-337 (link)

    logic: Error getting devlink parameters causes the entire PF to be skipped with continue. This means if devlink fails for any reason (device doesn't support it, permission issues, etc.), the PF won't be included in the discovered devices list at all. Consider logging a warning but continuing without devlink params instead.

  3. pkg/host/internal/sriov/sriov.go, line 471-474 (link)

    logic: Error from GetVFList() is assigned but the function continues to iterate over potentially nil/empty vfAddrs slice. This will silently fail to apply devlink params to VFs without indication of the failure.

  4. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    logic: Missing default case handling for ApplyOn field. If param.ApplyOn is empty string, SF, or lowercase variants (pf, vf), the devlink parameter will be silently skipped. According to the API definition in sriovnetworknodepolicy_types.go:153, lowercase variants are allowed by the enum validator but not handled here.

  5. pkg/vendors/mellanox/mellanox.go, line 446-469 (link)

    style: Logic only checks the first port (pciAddress + "0") for esw_multiport parameter. For multi-port NICs, this may not correctly handle configurations where different ports have different devlink parameter requirements.

  6. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    style: Use constants from pkg/consts package for comparing ApplyOn values instead of hardcoded strings, and handle case-insensitive comparison since API allows both uppercase and lowercase.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

33 files reviewed, 6 comments

Edit Code Review Agent Settings | Greptile

EswitchMode: p.Spec.EswitchMode,
NumVfs: p.Spec.NumVfs,
ExternallyManaged: p.Spec.ExternallyManaged,
DevlinkParams: p.Spec.DevlinkParams,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regarding the bridge multiport, don't we need to update ApplyBridgeConfig to build the spec?

Copy link
Member

@rollandf rollandf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevlinkParams feature looks good.
Regarding the multiport, does the bridge spec need to include all the interfaces?
Will the sync comparing the spec to actual work?

Can you provide an SRIOVNodeState example with multiport?

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (4)

  1. pkg/host/internal/sriov/sriov.go, line 471-474 (link)

    logic: error not returned after logging

  2. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    logic: case-insensitive matching not handled - API allows both uppercase and lowercase (PF, pf, VF, vf, SF, sf) but switch only checks uppercase constants

    Consider normalizing param.ApplyOn with strings.ToUpper() before the switch, or add lowercase cases

  3. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    logic: missing default case and empty ApplyOn handling

    The API comment says "PF (default)" but there's no default case when ApplyOn is empty or contains SF. Add a default case to handle unrecognized values

  4. api/v1/sriovnetworknodepolicy_types.go, line 149-151 (link)

    style: comment says "runtime is only supported now" but code doesn't validate or enforce this - all three cmodes (runtime, driverinit, permanent) are allowed by the enum

35 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (3)

  1. pkg/host/internal/sriov/sriov.go, line 471-474 (link)

    logic: Missing error return after GetVFList fails. If getting VF list fails, should return error immediately rather than continuing with empty list.

  2. pkg/host/internal/sriov/sriov.go, line 446-461 (link)

    logic: Case-sensitive comparison of ApplyOn field. API allows PF, pf, VF, vf, SF, sf but switch only checks uppercase constants.

  3. pkg/host/internal/bridge/bridge.go, line 93 (link)

    style: Log message says DiscoverBridges() but function name is ConfigureBridgesGrouping()

35 files reviewed, 3 comments

Edit Code Review Agent Settings | Greptile

@nvidia-ci-cd nvidia-ci-cd force-pushed the network-operator-26.1.x branch from 1850188 to 832b8bd Compare January 7, 2026 00:09
@e0ne e0ne force-pushed the devlink-params-impl branch from 64d5f6e to 020f09b Compare January 7, 2026 09:20
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds devlink parameter configuration support to the SR-IOV Network Operator, allowing users to configure low-level device parameters for NIC devices.

Key Changes

  • Added DevlinkParams field to SriovNetworkNodePolicy API with validation for parameter names, values, cmodes (runtime/driverinit/permanent), and apply targets (PF/VF/SF)
  • Implemented devlink parameter discovery via GetDevlinkDeviceParams() to read current device parameters and populate node state status
  • Added SetDevlinkDeviceParam() with automatic type conversion (uint8/uint16/uint32/string/bool) and CMODE detection
  • Integrated vendor-specific handling in Mellanox plugin for esw_multiport parameter requiring firmware LAG_RESOURCE_ALLOCATION configuration
  • Parameters applied during PF configuration phase before VF creation, with separate handling for PF-level and VF-level parameters

Critical Issues Found

  • Case-sensitivity bug: The ApplyOn field allows both uppercase and lowercase values in the API enum, but the implementation only matches uppercase constants, causing lowercase values to be silently ignored
  • Missing SF support: SubFunction (SF) is allowed by the API but not implemented in the switch statement
  • No default case: Invalid ApplyOn values are silently ignored without error reporting

Architecture

The implementation follows the operator's plugin architecture: GenericPlugin applies OS-level devlink parameters via netlink, while vendor plugins (e.g., MellanoxPlugin) handle firmware-level prerequisites before devlink configuration.

Confidence Score: 3/5

  • This PR has critical bugs that will cause devlink parameters with lowercase ApplyOn values to be silently ignored
  • The case-sensitivity bug in ApplyOn field comparison is a serious functional issue that will prevent the feature from working correctly when users provide lowercase values (which are explicitly allowed by the API validation). Additionally, SF (SubFunction) support is missing despite being in the API enum, and there's no error handling for invalid ApplyOn values.
  • Pay close attention to pkg/host/internal/sriov/sriov.go - the ApplyOn field comparison logic must be fixed to handle case-insensitive matching and add SF support

Important Files Changed

File Analysis

Filename Score Overview
api/v1/sriovnetworknodepolicy_types.go 5/5 Added DevlinkParams field to policy spec with proper validation enums
api/v1/sriovnetworknodestate_types.go 5/5 Added DevlinkParams fields to Interface, InterfaceExt, and VirtualFunction types for tracking devlink parameters
pkg/host/internal/sriov/sriov.go 3/5 Implemented devlink parameter discovery and application; case-sensitivity issue with ApplyOn field comparison
pkg/host/internal/network/network.go 5/5 Implemented GetDevlinkDeviceParams and SetDevlinkDeviceParam with proper type conversion and CMODE handling
pkg/vendors/mellanox/mellanox.go 5/5 Added vendor-specific handling for esw_multiport devlink parameter requiring firmware configuration

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Daemon as Config Daemon
    participant MP as MellanoxPlugin
    participant GP as GenericPlugin
    participant Sriov as SriovInterface
    participant Network as NetworkHelper
    participant Netlink as Netlink Library
    participant FW as Firmware (mstconfig)

    User->>Policy: Create/Update with DevlinkParams
    Policy->>Daemon: Policy Applied to Node
    Daemon->>MP: OnNodeStateChange()
    MP->>MP: Check esw_multiport param
    alt esw_multiport detected
        MP->>FW: Set LAG_RESOURCE_ALLOCATION=1
        MP->>FW: Reset firmware (if needed)
    end
    MP-->>Daemon: needReboot, needDrain
    
    Daemon->>GP: Apply()
    GP->>Sriov: ConfigSriovInterfaces()
    Sriov->>Sriov: configSriovPFDevice()
    
    loop for each DevlinkParam
        alt ApplyOn == PF
            Sriov->>Network: SetDevlinkDeviceParam(pciAddr, name, value)
            Network->>Netlink: DevlinkGetDeviceParamByName()
            Netlink-->>Network: param metadata (type, cmode)
            Network->>Network: Convert value to correct type
            Network->>Netlink: DevlinkSetDeviceParam(bus, device, param, cmode, value)
            Netlink-->>Network: success
        else ApplyOn == VF
            Sriov->>Sriov: Get VF list
            loop for each VF
                Sriov->>Network: SetDevlinkDeviceParam(vfAddr, name, value)
                Network->>Netlink: DevlinkSetDeviceParam()
            end
        end
    end
    
    Sriov->>Sriov: Create VFs
    
    Daemon->>Sriov: DiscoverSriovDevices()
    Sriov->>Network: GetDevlinkDeviceParams(pciAddr)
    Network->>Netlink: DevlinkGetDeviceParams()
    Netlink-->>Network: All configured params
    Network->>Network: Merge params by name and cmode
    Network-->>Sriov: DevlinkParam list
    Sriov-->>Daemon: Updated InterfaceExt with DevlinkParams
    Daemon->>Policy: Update Status
Loading

@greptile-apps
Copy link

greptile-apps bot commented Jan 7, 2026

Additional Comments (3)

pkg/host/internal/sriov/sriov.go
case-insensitive comparison needed for ApplyOn field

The API validation allows both uppercase and lowercase values (PF;pf;VF;vf;SF;sf), but the switch statement only matches uppercase constants. Users providing lowercase values will have their devlink params silently ignored.

	for _, param := range iface.DevlinkParams.Params {
		applyOn := strings.ToUpper(param.ApplyOn)
		switch applyOn {
		case consts.DevlinkParamApplyOnPf:
			err = s.applyDevlinkPfParam(iface, param)
			if err != nil {
				log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
				return err
			}
		case consts.DevlinkParamApplyOnVf:
			err = s.applyDevlinkVfParam(iface, param)
			if err != nil {
				log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
				return err
			}
		}
	}

pkg/host/internal/sriov/sriov.go
missing default case and SF handling

The switch statement lacks a default case for invalid ApplyOn values and doesn't handle SF (SubFunction) which is allowed by the API validation enum.


pkg/host/internal/sriov/sriov.go
error handling causes device to be skipped

When GetDevlinkDeviceParams fails, the entire device is skipped with continue. This means a single devlink error prevents the device from being discovered at all, which may be too strict.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

@nvidia-ci-cd nvidia-ci-cd force-pushed the network-operator-26.1.x branch from 832b8bd to 01f2b6c Compare January 8, 2026 00:11
@e0ne e0ne force-pushed the devlink-params-impl branch from 020f09b to f509543 Compare January 8, 2026 06:36
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds devlink parameter configuration support to SR-IOV Network Operator, enabling runtime configuration of device-level parameters like esw_multiport for Mellanox NICs. The implementation extends the API with DevlinkParams fields in policies and state objects, discovers existing parameters via netlink, and applies them through vendor-specific (Mellanox firmware) and generic (OS-level devlink) plugins. Bridge grouping policy configuration is also added to support multi-port NIC management.

Confidence Score: 2/5

  • Moderate risk due to error handling gaps that could cause silent failures during devlink parameter configuration
  • The PR contains a critical logic error in applyDevlinkVfParam where GetVFList failure is logged but not returned, allowing execution to continue with no VFs configured. Additionally, missing default handling for the ApplyOn field could result in devlink parameters being silently ignored. While the overall architecture is sound and the Mellanox-specific firmware integration appears correct, these error handling issues in core SR-IOV configuration paths pose risk of configuration drift where parameters appear applied but aren't actually set on devices.
  • pkg/host/internal/sriov/sriov.go requires immediate attention for error handling in VF parameter application; pkg/vendors/mellanox/mellanox.go needs input validation for devlink parameter values

Important Files Changed

File Analysis

Filename Score Overview
pkg/host/internal/sriov/sriov.go 1/5 Adds devlink parameter configuration for PF/VF; missing error return after logging error in applyDevlinkVfParam could cause silent failures
pkg/vendors/mellanox/mellanox.go 2/5 Adds esw_multiport devlink parameter support via firmware LAG_RESOURCE_ALLOCATION configuration; logic seems sound but lacks validation for devlink parameter values
api/v1/sriovnetworknodepolicy_types.go 3/5 Adds DevlinkParams struct with validation enums for Cmode and ApplyOn fields; also adds GroupingPolicy field to Bridge struct
pkg/host/internal/network/network.go 3/5 Refactors devlink value parsing into reusable helpers and adds GetDevlinkDeviceParams to retrieve all configured parameters

Sequence Diagram

sequenceDiagram
    participant Policy as SriovNetworkNodePolicy
    participant Generic as GenericPlugin
    participant Mellanox as MellanoxPlugin
    participant SRIOV as SR-IOV Manager
    participant Network as Network Helper
    participant FW as Firmware (mstconfig)

    Policy->>Generic: Apply Policy with DevlinkParams
    Generic->>SRIOV: ConfigureSriovDevice()
    
    alt Mellanox NIC with esw_multiport
        Mellanox->>Mellanox: HandleESwitchParams()
        Mellanox->>FW: Set LAG_RESOURCE_ALLOCATION
        Mellanox-->>Generic: Requires Reboot
    end
    
    SRIOV->>Network: GetDevlinkDeviceParams() (discover)
    Network-->>SRIOV: Current params
    
    loop For each DevlinkParam
        alt ApplyOn = PF
            SRIOV->>Network: SetDevlinkDeviceParam(PF)
        else ApplyOn = VF
            SRIOV->>Network: SetDevlinkDeviceParam(each VF)
        end
    end
    
    Generic->>Generic: ConfigureBridges()
    Generic->>Generic: ConfigureBridgesGrouping()
Loading

Comment on lines 470 to 474
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing error return after logging error in applyDevlinkVfParam could cause silent failures. If GetVFList fails, the error is logged but execution continues with a potentially empty vfAddrs slice. This means devlink parameters won't be applied to any VFs, but the function returns success.

Suggested change
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
}
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
return err
}

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing default value or validation for param.ApplyOn could lead to devlink parameters being silently ignored. When ApplyOn is empty or has an unexpected value (e.g., "SF"), the parameter won't be applied but no error is returned.

Consider adding a default case or validation to handle empty/invalid ApplyOn values. According to the design doc, PF should be the default.

Comment on lines 450 to 474
func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isLagResourceAllocationRequireChange()", "device", iface.PciAddress)

requsted, found := false, false
for _, devlinkParam := range iface.DevlinkParams.Params {
if devlinkParam.Name != devlinkMultiportParamName {
continue
}
requsted = true

for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params {
if devlinkParamStatus.Name == devlinkMultiportParamName {
found = true
if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value {
return true, &devlinkParam
}
}
}
}
if requsted && !found {
// esw_multiport devlink parameter not found and not configured
return true, nil
}
return false, nil
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function isESwitchParamsRequireChange only checks the esw_multiport parameter but doesn't validate the parameter value. Invalid values (e.g., arbitrary strings) could be passed through to firmware configuration without validation.

Consider adding validation to ensure devlinkParam.Value is a valid boolean string ("true"/"false") before using it to set firmware configuration.

Comment on lines +394 to +404
// GetDevlinkDeviceParams returns all configured devlink parameters
func (n *network) GetDevlinkDeviceParams(pciAddr string) ([]sriovnetworkv1.DevlinkParam, error) {
params, err := n.netlinkLib.DevlinkGetDeviceParams(consts.BusPci, pciAddr)
if err != nil {
log.Log.Error(err, "Failed to get devlink params", "device", pciAddr)
return nil, err
}

merged := make(map[string]map[uint8]interface{})
devlinkParams := make([]sriovnetworkv1.DevlinkParam, 0)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In GetDevlinkDeviceParams, the error from devlinkValueToString is captured in the loop but then immediately returned, causing the function to fail on the first unparseable parameter. This could prevent returning valid parameters if one parameter has parsing issues.

However, since the error variable is captured but not actually used before the return statement (line 391-392 show error is assigned but always nil before the check), this appears to be dead code. Consider removing the unused error handling or implementing proper error accumulation.

@e0ne e0ne force-pushed the devlink-params-impl branch from f509543 to 1b542bf Compare January 8, 2026 06:44
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

Adds devlink parameter configuration support to SR-IOV Network Operator, enabling users to configure device-level parameters like esw_multiport through the SriovNetworkNodePolicy CRD. Implements vendor-specific firmware configuration for Mellanox NICs to set LAG_RESOURCE_ALLOCATION when multiport is requested, and generic OS-level devlink parameter application for PF and VF devices.

Confidence Score: 1/5

  • Critical error handling bugs could cause silent failures or exclude devices from SR-IOV configuration
  • Two critical logic errors in error handling: (1) applyDevlinkVfParam logs but doesn't return error when GetVFList fails, causing silent success on failure; (2) DiscoverSriovDevices skips entire PF interface when devlink param discovery fails, potentially breaking SR-IOV on affected devices. Additionally, missing SF (Sub-Function) case means parameters for SF devices are silently ignored despite API validation allowing them.
  • pkg/host/internal/sriov/sriov.go requires immediate fixes to error handling in applyDevlinkVfParam (line 470-474) and DiscoverSriovDevices (line 332-338)

Important Files Changed

File Analysis

Filename Score Overview
pkg/host/internal/sriov/sriov.go 1/5 Adds devlink parameter discovery and application; critical error handling bug in applyDevlinkVfParam doesn't return after GetVFList failure
pkg/host/internal/network/network.go 3/5 Adds GetDevlinkDeviceParams to retrieve all devlink parameters and refactors value conversion into helper methods
pkg/vendors/mellanox/mellanox.go 3/5 Adds HandleESwitchParams to configure LAG_RESOURCE_ALLOCATION firmware parameter based on esw_multiport devlink parameter
api/v1/sriovnetworknodepolicy_types.go 4/5 Adds DevlinkParam and DevlinkParams types to API with validation enums for cmode and applyOn fields

Sequence Diagram

sequenceDiagram
    participant Policy as SriovNetworkNodePolicy
    participant Controller as Policy Controller
    participant State as SriovNetworkNodeState
    participant GenericPlugin as Generic Plugin
    participant MellanoxPlugin as Mellanox Plugin
    participant MlxVendor as Mellanox Vendor Utils
    participant SriovLib as SR-IOV Library
    participant NetworkLib as Network Library
    
    Policy->>Controller: DevlinkParams specified in policy
    Controller->>State: Apply policy to node state
    State->>State: Interface.DevlinkParams = Policy.DevlinkParams
    
    Note over MellanoxPlugin,MlxVendor: Firmware Configuration (Vendor-Specific)
    MellanoxPlugin->>MlxVendor: HandleESwitchParams()
    MlxVendor->>MlxVendor: Check for esw_multiport param
    alt esw_multiport=true and requires change
        MlxVendor->>MlxVendor: Set Multiport=1 (LAG_RESOURCE_ALLOCATION=1)
        MlxVendor-->>MellanoxPlugin: needReboot=true
        MellanoxPlugin->>MlxVendor: MlxConfigFW()
        MlxVendor->>MlxVendor: Run mstconfig to set LAG_RESOURCE_ALLOCATION
    end
    
    Note over GenericPlugin,NetworkLib: OS-Level Devlink Configuration
    GenericPlugin->>SriovLib: ConfigureSriovDevices()
    SriovLib->>SriovLib: configSriovPFDevice()
    
    loop For each DevlinkParam
        alt ApplyOn == "PF"
            SriovLib->>NetworkLib: SetDevlinkDeviceParam(PF address)
            NetworkLib->>NetworkLib: Convert value to correct type
            NetworkLib->>NetworkLib: DevlinkSetDeviceParam()
        else ApplyOn == "VF"
            SriovLib->>SriovLib: GetVFList(PF address)
            loop For each VF
                SriovLib->>NetworkLib: SetDevlinkDeviceParam(VF address)
                NetworkLib->>NetworkLib: DevlinkSetDeviceParam()
            end
        end
    end
    
    Note over SriovLib,NetworkLib: Discovery Phase
    SriovLib->>NetworkLib: GetDevlinkDeviceParams(PCI address)
    NetworkLib->>NetworkLib: DevlinkGetDeviceParams()
    NetworkLib->>NetworkLib: Convert all params to DevlinkParam structs
    NetworkLib-->>SriovLib: []DevlinkParam
    SriovLib->>State: Update Interface.DevlinkParams in status
Loading

Comment on lines 470 to 474
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Error handling bug: when GetVFList fails, the error is logged but not returned, allowing execution to continue with an empty vfAddrs slice. This means the function silently succeeds when it should fail.

Suggested change
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
}
func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
return err
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Comment on lines 332 to 338
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
continue
}
iface.DevlinkParams.Params = devlinkParams
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The continue statement skips adding the entire interface to the results when devlink parameter discovery fails. This means the PF and all its VFs are excluded from the node state, which could break SR-IOV functionality. Consider logging the error but continuing to add the interface with empty devlink params.

Suggested change
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
continue
}
iface.DevlinkParams.Params = devlinkParams
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable to read devlink parameters, continuing without them",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

}

func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isLagResourceAllocationRequireChange()", "device", iface.PciAddress)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function name in the log message is incorrect—it says isLagResourceAllocationRequireChange() but the actual function name is isESwitchParamsRequireChange(). This will make debugging harder when reading logs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing case for consts.DevlinkParamApplyOnSf (SF devices). The API defines three ApplyOn values (PF, VF, SF) per the validation enum in sriovnetworknodepolicy_types.go, but only PF and VF cases are handled. Parameters intended for SF devices will be silently ignored.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need at the moment

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Empty ApplyOn field handling: when param.ApplyOn is empty or contains an invalid value, the parameter is silently skipped without any log message. Consider adding a default case that logs unhandled values or treats empty ApplyOn as PF (since the API comment says "PF (default)").

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An empty value can't be passed via API

@e0ne
Copy link
Collaborator Author

e0ne commented Jan 8, 2026

here's it:

apiVersion: sriovnetwork.openshift.io/v1
kind: SriovNetworkNodeState
metadata:
  annotations:
    sriovnetwork.openshift.io/current-state: DrainComplete
    sriovnetwork.openshift.io/desired-state: Reboot_Required
  creationTimestamp: "2025-12-31T15:34:23Z"
  generation: 4
  name: cloud-dev-13
  namespace: network-operator
  ownerReferences:
  - apiVersion: sriovnetwork.openshift.io/v1
    blockOwnerDeletion: true
    controller: true
    kind: SriovOperatorConfig
    name: default
    uid: 54a1fb27-5a8f-4ffd-b347-625d74395e81
  resourceVersion: "5136730"
  uid: f4ee416f-c62a-41ce-badf-a943e0f7f3de
spec:
  bridges: {}
  interfaces:
  - devlinkParams:
      params:
      - applyOn: PF
        cmode: runtime
        name: esw_multiport
        value: "true"
    eSwitchMode: switchdev
    linkType: ETH
    mtu: 9216
    name: enp216s0f0np0
    numVfs: 1
    pciAddress: 0000:d8:00.0
    vfGroups:
    - deviceType: netdevice
      isRdma: true
      mtu: 9216
      policyName: rail-1
      resourceName: rail-1
      vfRange: 0-0
  - devlinkParams:
      params:
      - applyOn: PF
        cmode: runtime
        name: esw_multiport
        value: "true"
    eSwitchMode: switchdev
    linkType: ETH
    mtu: 9216
    name: enp216s0f1np1
    numVfs: 1
    pciAddress: 0000:d8:00.1
    vfGroups:
    - deviceType: netdevice
      isRdma: true
      mtu: 9216
      policyName: rail-1
      resourceName: rail-1
      vfRange: 0-0
  system:
    ovsConfig:
      doca-init: "true"
      hw-offload: "true"
      hw-offload-ct-size: "0"
status:
  interfaces:
  - deviceID: a2d6
    devlinkParams: {}
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: down
    linkType: ETH
    mac: b8:ce:f6:8d:c8:8c
    mtu: 1500
    name: enp59s0f0np0
    pciAddress: 0000:3b:00.0
    totalvfs: 8
    vendor: 15b3
  - deviceID: a2d6
    devlinkParams: {}
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: down
    linkType: ETH
    mac: b8:ce:f6:8d:c8:8d
    mtu: 1500
    name: enp59s0f1np1
    pciAddress: 0000:3b:00.1
    totalvfs: 8
    vendor: 15b3
  - deviceID: 101d
    devlinkParams: {}
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: down
    linkType: ETH
    mac: 0c:42:a1:80:e8:e2
    mtu: 1500
    name: enp216s0f0np0
    pciAddress: 0000:d8:00.0
    totalvfs: 1
    vendor: 15b3
  - deviceID: 101d
    devlinkParams: {}
    driver: mlx5_core
    eSwitchMode: legacy
    linkAdminState: down
    linkType: ETH
    mac: 0c:42:a1:80:e8:e3
    mtu: 1500
    name: enp216s0f1np1
    pciAddress: 0000:d8:00.1
    totalvfs: 1
    vendor: 15b3
  syncStatus: succeeded
  system:
    ovsConfig:
      hw-offload: "true"
    rdmaMode: shared
    ```

@e0ne e0ne force-pushed the devlink-params-impl branch from 1b542bf to c46becb Compare January 8, 2026 14:06
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds devlink parameter configuration support to the SR-IOV Network Operator, enabling low-level device parameter management for NICs. The implementation follows a two-phase approach: vendor-specific firmware configuration (Mellanox plugin) followed by OS-level devlink parameter application (Generic plugin).

Key Changes:

  • Added DevlinkParams API to SriovNetworkNodePolicy and SriovNetworkNodeState with proper kubebuilder validation for Cmode (runtime/driverinit/permanent) and ApplyOn (PF/VF/SF) fields
  • Implemented devlink parameter discovery via GetDevlinkDeviceParams() to read current device state
  • Added SetDevlinkDeviceParam() application logic for PF and VF devices in the SR-IOV library
  • Mellanox plugin handles esw_multiport parameter by setting LAG_RESOURCE_ALLOCATION firmware flag and triggering reboot
  • Bridge grouping feature added with GroupingPolicy field supporting "all" policy to group multiple interfaces on a single OVS bridge
  • Comprehensive test coverage for Mellanox eswitch parameter handling (10 test cases)

Issues Found:

  • Critical logic error in network.go:429-432 where err is checked after type assertion but never assigned from conversion
  • Error silently discarded at network.go:412 in devlinkValueToString() call could mask conversion failures
  • ApplyOn field handling in sriov.go:446-461 only matches exact constant values (PF/VF), won't handle empty strings or lowercase variants
  • Variable name typo requsted instead of requested in mellanox.go (3 occurrences)
  • Incorrect function name in log message at bridge.go:93

Confidence Score: 3/5

  • This PR has critical logic errors that need fixing before merge, particularly error handling in devlink parameter conversion
  • Score reflects solid architecture and test coverage but critical bugs in error handling (unused error check, silently discarded errors) and potential runtime issues with ApplyOn field matching that could cause devlink parameters to not be applied correctly
  • Pay close attention to pkg/host/internal/network/network.go (error handling bugs), pkg/host/internal/sriov/sriov.go (ApplyOn field logic), and pkg/vendors/mellanox/mellanox.go (typo needs fixing)

Important Files Changed

File Analysis

Filename Score Overview
api/v1/sriovnetworknodepolicy_types.go 5/5 Added DevlinkParam and DevlinkParams types to API with proper kubebuilder validation for Cmode and ApplyOn fields
api/v1/sriovnetworknodestate_types.go 5/5 Extended Interface, InterfaceExt, VirtualFunction, and Bridges types with DevlinkParams and GroupingPolicy fields
pkg/host/internal/network/network.go 2/5 Added GetDevlinkDeviceParams with helper functions; has unused error check at line 430 and silently discarded error at line 412
pkg/host/internal/sriov/sriov.go 3/5 Implemented devlink parameter discovery and application for PF/VF; ApplyOn field matching may not handle default/lowercase values
pkg/vendors/mellanox/mellanox.go 3/5 Added esw_multiport handling with LAG_RESOURCE_ALLOCATION firmware configuration; contains typo 'requsted' instead of 'requested'
pkg/host/internal/bridge/bridge.go 4/5 Added ConfigureBridgesGrouping for 'all' policy to group interfaces on single OVS bridge; has incorrect function name in log

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Helper as API Helper
    participant State as SriovNetworkNodeState
    participant GenericPlugin as Generic Plugin
    participant MellanoxPlugin as Mellanox Plugin
    participant MellanoxVendor as Mellanox Vendor
    participant SriovLib as SR-IOV Library
    participant NetworkLib as Network Library
    participant DevlinkAPI as Devlink API

    User->>Policy: Create/Update Policy with DevlinkParams
    Policy->>Helper: Apply() policy to state
    Helper->>State: Add DevlinkParams to Interface spec
    
    Note over MellanoxPlugin: Vendor-specific preprocessing
    State->>MellanoxPlugin: OnNodeStateChange()
    MellanoxPlugin->>MellanoxVendor: HandleESwitchParams()
    MellanoxVendor->>MellanoxVendor: Check if esw_multiport requested
    alt esw_multiport parameter changed
        MellanoxVendor->>MellanoxVendor: Set Multiport=1/0 for LAG_RESOURCE_ALLOCATION
        MellanoxVendor->>MellanoxVendor: MlxConfigFW() - Configure firmware
        MellanoxVendor-->>MellanoxPlugin: needReboot=true
    end
    
    Note over GenericPlugin: OS-level configuration
    State->>GenericPlugin: Apply()
    GenericPlugin->>SriovLib: ConfigureSriovDevices()
    SriovLib->>SriovLib: configSriovPFDevice()
    
    loop For each DevlinkParam
        alt ApplyOn == "PF"
            SriovLib->>NetworkLib: SetDevlinkDeviceParam(PF)
            NetworkLib->>DevlinkAPI: Set parameter on PF
        else ApplyOn == "VF"
            SriovLib->>SriovLib: Get VF list
            loop For each VF
                SriovLib->>NetworkLib: SetDevlinkDeviceParam(VF)
                NetworkLib->>DevlinkAPI: Set parameter on VF
            end
        end
    end
    
    Note over SriovLib: Device discovery
    SriovLib->>NetworkLib: GetDevlinkDeviceParams()
    NetworkLib->>DevlinkAPI: DevlinkGetDeviceParams()
    DevlinkAPI-->>NetworkLib: Return params with cmodes
    NetworkLib->>NetworkLib: Convert to DevlinkParam structs
    NetworkLib-->>SriovLib: Return devlink params
    SriovLib->>State: Update status with DevlinkParams
    
    Note over GenericPlugin: Bridge grouping (optional)
    alt GroupingPolicy == "all"
        GenericPlugin->>GenericPlugin: ConfigureBridgesGrouping()
        GenericPlugin->>GenericPlugin: Create OVS bridge with first interface
        loop For remaining interfaces
            GenericPlugin->>GenericPlugin: AddInterfaceToOVSBridge()
        end
    end
Loading

Comment on lines +429 to +432
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err is checked after type assertion at line 429, but it was never assigned an error value from the type assertion

Suggested change
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
strVal := val.(string)


for _, v := range p.Values {
// only CLI-visible cmodes
strVal, _ := n.devlinkValueToString(p.Type, v.Data)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error from devlinkValueToString is silently discarded, could mask conversion failures

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ApplyOn field defaults to empty string but only handles exact matches to PF/VF constants; should handle empty/lowercase values or set default

func (b *bridge) ConfigureBridgesGrouping(interfaces sriovnetworkv1.Interfaces, name, groupingPolicy string) error {
if groupingPolicy != "" {
if groupingPolicy != constants.OvsGroupingPolicyAll {
log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log message says "DiscoverBridges()" but function is ConfigureBridgesGrouping()

Suggested change
log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)
log.Log.V(2).Info("ConfigureBridgesGrouping(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)

func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress)

requsted, found := false, false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requsted is mis-spelled

Suggested change
requsted, found := false, false
requested, found := false, false

if devlinkParam.Name != devlinkMultiportParamName {
continue
}
requsted = true
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update variable reference after fixing typo

Suggested change
requsted = true
requested = true

}
}
}
if requsted && !found {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

update variable reference after fixing typo

Suggested change
if requsted && !found {
if requested && !found {

@e0ne e0ne force-pushed the devlink-params-impl branch 2 times, most recently from cf09545 to 217db27 Compare January 8, 2026 14:25
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds devlink parameter configuration support to the SR-IOV Network Operator, enabling users to configure generic and vendor-specific devlink parameters for network devices. The implementation includes API extensions, discovery mechanisms, and vendor-specific firmware handling for Mellanox NICs.

Key Changes:

  • Added DevlinkParams API to SriovNetworkNodePolicy and SriovNetworkNodeState with support for PF, VF, and SF devices
  • Implemented devlink parameter discovery and application in the SRIOV library
  • Added Mellanox-specific esw_multiport handling with LAG_RESOURCE_ALLOCATION firmware configuration
  • Extended bridge configuration with grouping policy support (perPF, perNIC, all)
  • Comprehensive test coverage for new functionality

Critical Issues Found:

  • Bug in pkg/host/internal/network/network.go:429-433: Unreachable error check that will never execute (checked err variable that's always nil)
  • Test failures in pkg/vendors/mellanox/mellanox_test.go:310,321: Missing Multiport: 0 field causing tests to fail
  • Incomplete implementation in pkg/host/internal/sriov/sriov.go:447-460: Missing SF (Scalable Function) case handling and case-insensitive ApplyOn comparison

Minor Issues:

  • Typo in constant name: OvsGroupingPolicyPerVF should be OvsGroupingPolicyPerNIC
  • Several grammar typos in error messages
  • Bridge grouping only supports "all" policy, silently ignores others

The design is sound and follows the operator's architecture well, but the critical bugs need to be fixed before merging.

Confidence Score: 3/5

  • This PR has critical bugs that will cause test failures and runtime issues with devlink parameter handling.
  • Score reflects three critical bugs: unreachable error check in network.go, test failures in mellanox_test.go, and incomplete SF case handling in sriov.go. While the design is solid and test coverage is good, these issues will prevent proper functionality and cause test failures.
  • Pay close attention to pkg/host/internal/network/network.go (unreachable error), pkg/vendors/mellanox/mellanox_test.go (failing tests), and pkg/host/internal/sriov/sriov.go (missing SF handling)

Important Files Changed

File Analysis

Filename Score Overview
pkg/host/internal/network/network.go 3/5 Added devlink parameter retrieval with refactoring. Contains critical bug with unreachable error check on line 429-433.
pkg/vendors/mellanox/mellanox_test.go 3/5 Added comprehensive tests for eswitch params. Tests on lines 310 and 321 will fail due to missing Multiport field.
pkg/host/internal/sriov/sriov.go 3/5 Added devlink param discovery and application logic. Missing SF case handling and case-insensitive ApplyOn comparison.
pkg/vendors/mellanox/mellanox.go 4/5 Added esw_multiport handling with LAG_RESOURCE_ALLOCATION firmware config. Logic looks correct for Mellanox-specific requirements.
api/v1/sriovnetworknodepolicy_types.go 5/5 Added DevlinkParam and DevlinkParams structs with proper kubebuilder validation tags. Clean API design.
api/v1/sriovnetworknodestate_types.go 5/5 Added DevlinkParams to Interface, InterfaceExt, VirtualFunction, and Bridges. Consistent with policy types.
pkg/host/internal/bridge/bridge.go 4/5 Added ConfigureBridgesGrouping for "all" policy. Only supports "all" grouping, logs and skips other values.
pkg/consts/constants.go 4/5 Added devlink ApplyOn and OVS grouping constants. Contains typo: OvsGroupingPolicyPerVF should be PerNIC.

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Helper as Policy Helper
    participant State as SriovNetworkNodeState
    participant MellanoxPlugin as Mellanox Plugin
    participant GenericPlugin as Generic Plugin
    participant SriovLib as SRIOV Library
    participant NetworkHelper as Network Helper
    participant Netlink as Netlink Library

    User->>Policy: Create/Update with DevlinkParams
    Policy->>Helper: Apply()
    Helper->>State: Update spec.interfaces[].devlinkParams
    
    Note over State,MellanoxPlugin: Vendor-specific firmware config (if needed)
    State->>MellanoxPlugin: OnNodeStateChange()
    MellanoxPlugin->>MellanoxPlugin: HandleESwitchParams()
    MellanoxPlugin->>MellanoxPlugin: Check esw_multiport param
    alt esw_multiport needs change
        MellanoxPlugin->>MellanoxPlugin: Set LAG_RESOURCE_ALLOCATION firmware flag
        MellanoxPlugin-->>State: needReboot = true
    end
    
    Note over State,GenericPlugin: OS-level devlink config
    State->>GenericPlugin: Apply()
    GenericPlugin->>SriovLib: ConfigureSriovDevices()
    SriovLib->>SriovLib: configSriovPFDevice()
    
    loop For each DevlinkParam
        alt ApplyOn == PF
            SriovLib->>NetworkHelper: SetDevlinkDeviceParam(pciAddr, name, value)
            NetworkHelper->>Netlink: DevlinkSetDeviceParam()
        else ApplyOn == VF
            SriovLib->>SriovLib: Get VF list
            loop For each VF
                SriovLib->>NetworkHelper: SetDevlinkDeviceParam(vfAddr, name, value)
                NetworkHelper->>Netlink: DevlinkSetDeviceParam()
            end
        end
    end
    
    Note over SriovLib,Netlink: Discovery phase
    SriovLib->>NetworkHelper: GetDevlinkDeviceParams(pciAddr)
    NetworkHelper->>Netlink: DevlinkGetDeviceParams()
    Netlink-->>NetworkHelper: Return all params
    NetworkHelper->>NetworkHelper: Convert to DevlinkParam structs
    NetworkHelper-->>SriovLib: Return DevlinkParam[]
    SriovLib-->>State: Update status.interfaces[].devlinkParams
Loading

Comment on lines +429 to +433
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unreachable error check - err is always nil at this point

The error check is checking err after type assertion, but err was never assigned from the type assertion. The error from line 412 is intentionally ignored with _. This check will never be true.

Suggested change
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}
strVal := val.(string)
devlinkParam := sriovnetworkv1.DevlinkParam{

u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return(
"",
"", nil)
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing Multiport: 0 field in test struct

Test expects LAG_RESOURCE_ALLOCATION=0 in line 307, but doesn't set Multiport: 0 in the struct. This will cause the test to fail because line 228 in mellanox.go only appends the parameter when Multiport != -1.

Suggested change
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}})
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}})

u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return(
"",
"", testError)
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing Multiport: 0 field in test struct

Test expects LAG_RESOURCE_ALLOCATION=0 in line 318, but doesn't set Multiport: 0 in the struct. This will cause the test to fail.

Suggested change
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}})
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}})

}
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in error message: "unable devlink parameters" should be "unable to get devlink parameters"

Suggested change
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping",

Comment on lines +447 to +460
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing handling for SF (Scalable Function) ApplyOn case

The API defines SF as a valid value for ApplyOn (line 69 in sriovnetworknodepolicy_types.go), but the switch statement only handles PF and VF cases. SF parameters will be silently ignored.

Comment on lines +447 to +460
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing default handling and case-insensitive comparison for ApplyOn

The constants are uppercase (PF, VF, SF) but the API allows lowercase variants (e.g., pf, vf, sf per kubebuilder validation). Should normalize to uppercase or use case-insensitive comparison.

DevlinkParamApplyOnPf = "PF"
DevlinkParamApplyOnVf = "VF"
DevlinkParamApplyOnSf = "SF"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in constant name: should be OvsGroupingPolicyPerNIC not OvsGroupingPolicyPerVF

The design doc and API use perNIC (line 87 in sriovnetworknodepolicy_types.go), not perVF.

}
bridge, err := o.getBridgeByName(ctx, dbClient, bridgeName)
if err != nil {
funcLog.Error(err, "AddInterfaceToOVSBridge(): failed to get bridge", "bridge", bridgeName)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo in error message: "connect to get bridge" should be "get bridge"

Suggested change
funcLog.Error(err, "AddInterfaceToOVSBridge(): failed to get bridge", "bridge", bridgeName)
return fmt.Errorf("failed to get bridge: %v", err)

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR implements devlink parameter configuration support for SR-IOV Network Operator, allowing users to configure low-level device parameters via the API. The implementation follows a two-phase approach: vendor-specific firmware configuration (e.g., Mellanox esw_multiport via LAG_RESOURCE_ALLOCATION firmware flag) followed by OS-level devlink parameter application via the generic plugin.

Major changes:

  • Extended SriovNetworkNodePolicy and SriovNetworkNodeState APIs with DevlinkParams field containing parameter name, value, cmode, and applyOn (PF/VF/SF)
  • Added devlink parameter discovery in DiscoverSriovDevices to populate status
  • Implemented applyDevlinkPfParam and applyDevlinkVfParam to configure parameters at OS level
  • Added Mellanox-specific HandleESwitchParams function to configure firmware before OS-level changes
  • Refactored devlink value parsing into reusable helper functions
  • Added bridge grouping policy support (currently only all policy implemented)

Issues found:

  • Critical: Error handling mismatch in sriov.go - logs "skipping device" on devlink error but continues processing, while test expects device to be skipped entirely
  • Bug: Constant name typo OvsGroupingPolicyPerVF should be OvsGroupingPolicyPerNIC per API enum validation
  • Maintenance: Incorrect function name in log message (DiscoverBridges instead of ConfigureBridgesGrouping)
  • Missing validation for empty/invalid ApplyOn values - will silently skip param application
  • Unclear logic when esw_multiport param not found - defaults to enabling multiport without explanation

Confidence Score: 3/5

  • PR has solid design but contains critical error handling bug that will cause test failure
  • Score reflects critical logic error where sriov discovery logs "skipping device" on error but continues processing (line 334), contradicting test expectations that device should be completely skipped. Also includes constant naming bug that doesn't match CRD enum validation. Good test coverage and clear architecture, but these bugs need fixing before merge.
  • Pay close attention to pkg/host/internal/sriov/sriov.go (error handling logic), pkg/consts/constants.go (naming mismatch), and pkg/vendors/mellanox/mellanox.go (unclear default behavior)

Important Files Changed

File Analysis

Filename Score Overview
pkg/host/internal/sriov/sriov.go 2/5 added devlink discovery and configuration, but error handling inconsistent with test expectations (logs "skipping" but continues)
pkg/host/internal/sriov/sriov_test.go 3/5 added tests for devlink params, but test expects device skip on error while implementation just logs and continues
pkg/vendors/mellanox/mellanox.go 3/5 added esw_multiport handling via firmware LAG_RESOURCE_ALLOCATION, unclear default behavior when param not found
pkg/host/internal/network/network.go 4/5 refactored devlink value parsing into helper functions, added GetDevlinkDeviceParams for batch retrieval
pkg/host/internal/bridge/bridge.go 3/5 added bridge grouping configuration, but only supports all policy (others silently skipped), incorrect function name in log
pkg/consts/constants.go 2/5 added devlink and OVS constants, but OvsGroupingPolicyPerVF doesn't match enum (perNIC expected)

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Generic as GenericPlugin
    participant Mellanox as MellanoxPlugin
    participant SRIOV as SRIOV Discovery
    participant Network as NetworkHelper
    participant Vendor as MellanoxVendor

    User->>Policy: Create/Update with DevlinkParams
    Policy->>Policy: Apply policy to NodeState spec
    
    Note over Mellanox,Vendor: Vendor-specific firmware configuration
    Mellanox->>Vendor: OnNodeStateChange
    Vendor->>Vendor: HandleESwitchParams(esw_multiport)
    alt esw_multiport change required
        Vendor->>Vendor: Set LAG_RESOURCE_ALLOCATION firmware flag
        Vendor-->>Mellanox: needReboot=true
    end
    Vendor->>Vendor: HandleLinkType
    Vendor->>Vendor: MlxConfigFW (mstconfig command)
    
    Note over Generic,Network: OS-level devlink configuration
    Generic->>SRIOV: ConfigureSriovDevice
    SRIOV->>SRIOV: configSriovPFDevice
    loop for each DevlinkParam
        alt ApplyOn == PF
            SRIOV->>Network: SetDevlinkDeviceParam(PF)
        else ApplyOn == VF
            SRIOV->>Network: SetDevlinkDeviceParam(each VF)
        end
    end
    
    Note over SRIOV,Network: Device discovery and status update
    SRIOV->>Network: DiscoverSriovDevices
    Network->>Network: GetDevlinkDeviceParams
    Network-->>SRIOV: DevlinkParams list
    SRIOV-->>Policy: Update NodeState status
    
    Note over Generic: Bridge grouping (if configured)
    Generic->>Generic: ConfigureBridgesGrouping
    alt GroupingPolicy == "all"
        Generic->>Generic: CreateOVSBridge with all interfaces
    end
Loading

Comment on lines +332 to +338
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log message says "skipping" but device continues to be added to pfList

Suggested change
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping device",
"device", device)
continue
}
iface.DevlinkParams.Params = devlinkParams

func (b *bridge) ConfigureBridgesGrouping(interfaces sriovnetworkv1.Interfaces, name, groupingPolicy string) error {
if groupingPolicy != "" {
if groupingPolicy != constants.OvsGroupingPolicyAll {
log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function name in log message is DiscoverBridges() but this function is ConfigureBridgesGrouping()

Suggested change
log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)
log.Log.V(2).Info("ConfigureBridgesGrouping(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy)

Comment on lines +419 to +421
} else {
attr.Multiport = 1
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

when devlinkParam is nil, Multiport is set to 1 (enable), but the comment on line 462 says "esw_multiport devlink parameter not found and not configured" - unclear why enabling when not found

Consider explicitly handling this case or adding explanation

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Comment on lines +445 to +460

for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no validation for empty ApplyOn field - will silently skip devlink param application if not set to PF or VF

Consider adding a default value or explicit warning when ApplyOn is unrecognized

Comment on lines +170 to +172

OvsGroupingPolicyPerPF = "perPF"
OvsGroupingPolicyPerVF = "perVF"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typo: should be perNIC not perVF based on API enum validation in sriovnetworknodepolicy_types.go (line 90: perPF;perNIC;all)

Suggested change
OvsGroupingPolicyPerPF = "perPF"
OvsGroupingPolicyPerVF = "perVF"
OvsGroupingPolicyPerPF = "perPF"
OvsGroupingPolicyPerNIC = "perNIC"
OvsGroupingPolicyAll = "all"

Comment on lines +450 to +473
func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress)

requsted, found := false, false
for _, devlinkParam := range iface.DevlinkParams.Params {
if devlinkParam.Name != devlinkMultiportParamName {
continue
}
requsted = true

for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params {
if devlinkParamStatus.Name == devlinkMultiportParamName {
found = true
if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value {
return true, &devlinkParam
}
}
}
}
if requsted && !found {
// esw_multiport devlink parameter not found and not configured
return true, nil
}
return false, nil
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function returns early when multiport param already matches desired state, but doesn't validate that both iface.Name and ifaceStatus.Name match before comparing values - name mismatch could cause false positives

@e0ne e0ne force-pushed the devlink-params-impl branch from 217db27 to 0ade45a Compare January 8, 2026 14:43
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR adds devlink parameters configuration support to the SR-IOV Network Operator, enabling runtime configuration of device-level parameters for network interfaces. It introduces new API types (DevlinkParam, DevlinkParams) with validation for parameter names, values, cmodes (runtime/driverinit/permanent), and application targets (PF/VF/SF). The implementation includes discovery of existing devlink parameters, application during device configuration, and vendor-specific handling for Mellanox NICs (specifically esw_multiport via LAG_RESOURCE_ALLOCATION firmware setting).

Key Changes:

  • Extended SriovNetworkNodePolicy and SriovNetworkNodeState APIs with DevlinkParams fields
  • Added GroupingPolicy field to Bridge configuration for OVS bridge grouping
  • Implemented devlink parameter discovery in DiscoverSriovDevices()
  • Added configuration logic in configSriovPFDevice() to apply params to PF/VF based on ApplyOn field
  • Mellanox plugin handles esw_multiport parameter requiring firmware configuration and reboot
  • Comprehensive test coverage for new functionality

Issues Found:

  • Missing default case for empty/invalid ApplyOn values - will silently skip configuration
  • Discovery error handling skips entire device instead of continuing with empty params
  • Test cases missing Multiport field initialization causing mismatch with expected mstconfig commands
  • Error handling in bridge grouping loop masks earlier failures
  • Case-insensitive API validation (PF/pf) not handled in implementation (only uppercase checked)

Confidence Score: 3/5

  • PR has functional issues that need fixing before merge - missing error handling and default cases
  • Score reflects several logic bugs that will cause runtime issues: missing default case in ApplyOn switch causes silent failures, discovery error removes entire device from state, test mismatches indicate potential runtime failures, and bridge loop error handling masks failures. Good test coverage and design, but implementation gaps need resolution.
  • Pay close attention to pkg/host/internal/sriov/sriov.go (ApplyOn handling and discovery error), pkg/vendors/mellanox/mellanox_test.go (test initialization), and pkg/host/internal/bridge/bridge.go (error handling)

Important Files Changed

File Analysis

Filename Score Overview
api/v1/sriovnetworknodepolicy_types.go 3/5 Added DevlinkParam and DevlinkParams types with validation enums; ApplyOn case handling issue
pkg/host/internal/sriov/sriov.go 2/5 Added devlink param discovery and configuration; missing default case, discovery error handling issue
pkg/host/internal/network/network.go 4/5 Implemented GetDevlinkDeviceParams to retrieve all devlink params; refactored value conversion
pkg/vendors/mellanox/mellanox.go 3/5 Added HandleESwitchParams for esw_multiport configuration via LAG_RESOURCE_ALLOCATION; interface name check concern
pkg/vendors/mellanox/mellanox_test.go 2/5 Added comprehensive tests for HandleESwitchParams; two existing tests missing Multiport initialization
pkg/host/internal/bridge/bridge.go 3/5 Implemented ConfigureBridgesGrouping for OVS bridge grouping; error handling issue in loop

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Helper as API Helper
    participant GenericPlugin as Generic Plugin
    participant MellanoxPlugin as Mellanox Plugin
    participant SRIOV as SRIOV Host
    participant Network as Network Helper
    participant Mellanox as Mellanox Vendor
    participant Bridge as Bridge Manager

    User->>Policy: Create/Update with DevlinkParams
    Policy->>Helper: Apply policy to node state
    Helper->>Helper: Copy DevlinkParams to Interface

    Note over GenericPlugin,MellanoxPlugin: Plugin Processing

    MellanoxPlugin->>Mellanox: HandleESwitchParams
    Mellanox->>Mellanox: Check esw_multiport param
    alt esw_multiport requires change
        Mellanox->>Mellanox: Set Multiport=1/0 in MlxNic
        Mellanox-->>MellanoxPlugin: needReboot=true
    else no change needed
        Mellanox-->>MellanoxPlugin: needReboot=false
    end

    MellanoxPlugin->>Mellanox: MlxConfigFW
    Mellanox->>Mellanox: Set LAG_RESOURCE_ALLOCATION

    GenericPlugin->>SRIOV: configSriovPFDevice
    loop for each DevlinkParam
        alt ApplyOn == PF
            SRIOV->>Network: SetDevlinkDeviceParam(PCI, name, value)
        else ApplyOn == VF
            SRIOV->>SRIOV: GetVFList
            loop for each VF
                SRIOV->>Network: SetDevlinkDeviceParam(VF_PCI, name, value)
            end
        end
    end

    GenericPlugin->>Bridge: ConfigureBridgesGrouping
    alt groupingPolicy == "all"
        Bridge->>Bridge: CreateOVSBridge with first interface
        loop for remaining interfaces
            Bridge->>Bridge: AddInterfaceToOVSBridge
        end
    end

    Note over SRIOV,Network: Discovery Phase
    SRIOV->>Network: GetDevlinkDeviceParams(PCI)
    Network->>Network: DevlinkGetDeviceParams
    Network->>Network: Convert to DevlinkParam structs
    Network-->>SRIOV: []DevlinkParam
    SRIOV->>SRIOV: Add to InterfaceExt.DevlinkParams
Loading

Comment on lines +446 to +460
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no default case to handle empty ApplyOn or unexpected values

Suggested change
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
for _, param := range iface.DevlinkParams.Params {
applyOn := param.ApplyOn
if applyOn == "" {
applyOn = consts.DevlinkParamApplyOnPf
}
switch applyOn {
case consts.DevlinkParamApplyOnPf, "pf":
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf, "vf":
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
default:
log.Log.V(2).Info("configSriovPFDevice(): skipping unsupported ApplyOn value", "device", iface.PciAddress, "applyOn", param.ApplyOn)
}
}

Comment on lines +332 to +338
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device skipped silently on devlink error - discovery continues but device excluded from pfList, causing inconsistent state reporting

u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return(
"",
"", nil)
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test expects LAG_RESOURCE_ALLOCATION=0 but Multiport field not set in struct initialization

Suggested change
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}})
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}})

u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return(
"",
"", testError)
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}})
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test expects LAG_RESOURCE_ALLOCATION=0 but Multiport field not set in struct initialization

Suggested change
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}})
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}})

Comment on lines +115 to +121
for _, iface := range interfaces[1:] {
err := b.ovs.AddInterfaceToOVSBridge(context.Background(), brName, iface.Name)
if err != nil {
log.Log.Error(err, "ConfigureBridgesGrouping(): failed to add interface to OVS bridge", "name", bridgeConf.Name, "iface", iface.Name)
return err
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

loop reuses err variable - will only report error from last interface addition, masking earlier failures

Comment on lines +151 to +154
Cmode string `json:"cmode,omitempty"`
// Device to apply devlink parameter: PF (default)|VF|SF
// +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf
ApplyOn string `json:"applyOn,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

API allows case-insensitive values (PF/pf, VF/vf, SF/sf) but implementation only checks uppercase constants

for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params {
if devlinkParamStatus.Name == devlinkMultiportParamName {
found = true
if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interface name check (iface.Name == ifaceStatus.Name) may prevent detection if interface renamed - PCI address comparison more reliable

@e0ne e0ne force-pushed the devlink-params-impl branch 2 times, most recently from 10827fa to e7ff81a Compare January 8, 2026 15:00
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR implements devlink parameter configuration support for SR-IOV devices, allowing users to configure low-level device parameters at both the PF and VF level. The implementation includes API extensions, discovery logic, application logic, and vendor-specific support for Mellanox NICs.

Major Changes:

  • Added DevlinkParams API to SriovNetworkNodePolicy with support for parameter name, value, cmode (runtime/driverinit/permanent), and ApplyOn target (PF/VF/SF)
  • Extended SriovNetworkNodeState to track devlink parameters in both spec and status for Interface, InterfaceExt, and VirtualFunction types
  • Implemented devlink parameter discovery in DiscoverSriovDevices() to query and report current device configuration
  • Added parameter application logic with separate handlers for PF and VF devices
  • Implemented Mellanox-specific support for esw_multiport parameter requiring firmware configuration via LAG_RESOURCE_ALLOCATION
  • Added OVS bridge grouping policy support with initial implementation for "all" policy

Issues Found:

  • Device discovery continues when GetDevlinkDeviceParams() fails, leading to incomplete device state rather than skipping the device
  • Missing implementation for SF (Scalable Function) target despite API validation allowing it
  • Cmode validation permits all modes (runtime/driverinit/permanent) but design doc states "runtime is only supported now"
  • Case-sensitivity mismatch: API allows both uppercase/lowercase ApplyOn values but code only handles uppercase
  • Grouping policy constants inconsistency between API validation enum and consts package
  • Several typos in variable names and incomplete log messages
  • Grouping policy validation only supports "all" but API allows "perPF" and "perNIC"

Confidence Score: 3/5

  • This PR requires fixes before merging due to logical issues in error handling and incomplete implementation
  • Score reflects several critical issues: device discovery error handling that leads to incomplete state, missing SF implementation despite API support, case-sensitivity bugs, and validation/implementation mismatches for cmode and grouping policy. The feature adds significant complexity with vendor-specific firmware configuration. However, comprehensive test coverage and good architectural separation partially offset these concerns.
  • Focus on pkg/host/internal/sriov/sriov.go for error handling and SF support, api/v1/sriovnetworknodepolicy_types.go for validation alignment, and pkg/host/internal/bridge/bridge.go for grouping policy implementation

Important Files Changed

File Analysis

Filename Score Overview
api/v1/sriovnetworknodepolicy_types.go 5/5 added DevlinkParams API with proper kubebuilder validation tags for devlink parameter configuration
pkg/host/internal/sriov/sriov.go 3/5 added devlink parameter discovery and application logic for PF and VF devices with error handling issue
pkg/host/internal/network/network.go 4/5 implemented GetDevlinkDeviceParams() to retrieve all devlink parameters and refactored value parsing into helper functions
pkg/vendors/mellanox/mellanox.go 4/5 added support for esw_multiport devlink parameter requiring firmware configuration via LAG_RESOURCE_ALLOCATION
pkg/host/internal/bridge/bridge.go 4/5 implemented ConfigureBridgesGrouping() to create OVS bridges with "all" grouping policy
pkg/consts/constants.go 5/5 added constants for devlink parameter ApplyOn values and OVS grouping policies

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Daemon as Config Daemon
    participant MellanoxPlugin as Mellanox Plugin
    participant GenericPlugin as Generic Plugin
    participant SRIOV as SRIOV Manager
    participant Network as Network Helper
    participant Devlink as Devlink (Kernel)
    participant FW as Firmware (mstconfig)

    User->>Policy: Create policy with DevlinkParams
    Policy->>Daemon: Apply policy to node
    Daemon->>MellanoxPlugin: OnNodeStateChange()
    
    Note over MellanoxPlugin: Check firmware prerequisites
    MellanoxPlugin->>MellanoxPlugin: HandleESwitchParams()
    alt esw_multiport parameter requested
        MellanoxPlugin->>MellanoxPlugin: Check if LAG_RESOURCE_ALLOCATION needed
        MellanoxPlugin->>FW: Configure LAG_RESOURCE_ALLOCATION via mstconfig
        MellanoxPlugin-->>Daemon: needReboot = true
    end
    
    Daemon->>GenericPlugin: Apply()
    GenericPlugin->>SRIOV: configSriovPFDevice()
    
    Note over SRIOV: Apply devlink parameters to devices
    loop For each DevlinkParam
        alt ApplyOn = PF
            SRIOV->>Network: SetDevlinkDeviceParam(PF)
            Network->>Devlink: Set parameter on PF device
        else ApplyOn = VF
            SRIOV->>SRIOV: Get VF list
            loop For each VF
                SRIOV->>Network: SetDevlinkDeviceParam(VF)
                Network->>Devlink: Set parameter on VF device
            end
        end
    end
    
    GenericPlugin->>GenericPlugin: ConfigureBridgesGrouping()
    alt GroupingPolicy = "all"
        GenericPlugin->>GenericPlugin: Create OVS bridge with all interfaces
    end
    
    Note over Daemon: Discovery phase
    Daemon->>SRIOV: DiscoverSriovDevices()
    SRIOV->>Network: GetDevlinkDeviceParams()
    Network->>Devlink: Query all devlink parameters
    Devlink-->>Network: Return parameters (runtime/driverinit/permanent)
    Network-->>SRIOV: Return DevlinkParams
    SRIOV-->>Daemon: Update node state with discovered params
    
    Daemon-->>User: Report status in SriovNetworkNodeState
Loading

Comment on lines 329 to +338
instance := s.getVfInfo(vf, pfNetName, iface.EswitchMode, devices)
iface.VFs = append(iface.VFs, instance)
}
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

device discovery silently skipped when GetDevlinkDeviceParams fails - logs error but continues and appends device without devlink params to pfList

Suggested change
instance := s.getVfInfo(vf, pfNetName, iface.EswitchMode, devices)
iface.VFs = append(iface.VFs, instance)
}
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping",
"device", device)
} else {
iface.DevlinkParams.Params = devlinkParams
}
devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address)
if err != nil {
log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping device",
"device", device.Address)
continue
}
iface.DevlinkParams.Params = devlinkParams

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing case for SF (Scalable Function) - ApplyOn field supports "SF" but only "PF" and "VF" are handled here

Comment on lines +473 to +474
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
return err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log message missing param.Name value in structured logging field

Suggested change
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
return err
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param", param.Name)

Comment on lines +480 to +481
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param")
return err
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log message missing param.Name value in structured logging field

Suggested change
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param")
return err
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param", param.Name)

func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress)

requsted, found := false, false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

requsted is misspelled

Suggested change
requsted, found := false, false
requested, found := false, false

Comment on lines +100 to +102
uplink := &sriovnetworkv1.OVSUplinkConfigExt{
PciAddress: interfaces[0].PciAddress,
Name: interfaces[0].Name,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

only validates OvsGroupingPolicyAll constant but API validation enum includes perPF and perNIC - creates inconsistency where those values pass CRD validation but are rejected here

Comment on lines +103 to +105
}
brName := name + "-all"
bridgeConf := &sriovnetworkv1.OVSConfigExt{
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

error case is not clearly handled - returns error if interfaces list is empty but silently returns nil for unsupported grouping policies (line 94)

Comment on lines +147 to +154
// Param value
Value string `json:"value,omitempty"`
// cmode option: runtime (default) | driverinit | permanent (runtime is only supported now)
// +kubebuilder:validation:Enum=runtime;driverinit;permanent
Cmode string `json:"cmode,omitempty"`
// Device to apply devlink parameter: PF (default)|VF|SF
// +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf
ApplyOn string `json:"applyOn,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

design doc states "runtime is only supported now" but validation allows driverinit and permanent - should either implement support or restrict validation to runtime only

Comment on lines +151 to +154
Cmode string `json:"cmode,omitempty"`
// Device to apply devlink parameter: PF (default)|VF|SF
// +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf
ApplyOn string `json:"applyOn,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case-sensitive validation allows both uppercase and lowercase values ("PF" vs "pf") but implementation in sriov.go uses only uppercase constants from consts package

Comment on lines +172 to +173
OvsGroupingPolicyPerVF = "perVF"
OvsGroupingPolicyAll = "all"
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OvsGroupingPolicyPerVF constant defined but not present in API validation enum - API uses perPF and perNIC instead

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR implements devlink parameters configuration for SR-IOV network devices, extending the API to support device-level configuration beyond firmware and OS settings.

Key Changes

  • API Extensions: Added DevlinkParams and DevlinkParam types to SriovNetworkNodePolicy and SriovNetworkNodeState with support for PF, VF, and SF devices across runtime, driverinit, and permanent cmodes
  • Discovery: Implemented GetDevlinkDeviceParams() to read all configured devlink parameters from devices during discovery phase
  • Configuration: Added applyDevlinkPfParam() and applyDevlinkVfParam() to apply devlink parameters to PF and VF devices respectively
  • Mellanox Integration: Implemented HandleESwitchParams() to configure esw_multiport parameter with firmware flag LAG_RESOURCE_ALLOCATION for NVIDIA NICs
  • Bridge Grouping: Added GroupingPolicy field to Bridge struct with ConfigureBridgesGrouping() supporting perPF, perNIC, and all policies
  • Test Coverage: Comprehensive unit tests for HandleESwitchParams() with multiple scenarios

Issues Found

  • Critical: Error silently ignored in GetDevlinkDeviceParams() line 412 causing incorrect error handling
  • Logic: Missing SF (Subfunction) and default case handling in devlink parameter application switch statement
  • Syntax: Variable typo requsted instead of requested in isESwitchParamsRequireChange()
  • Test Failures: Missing Multiport field initialization in existing Mellanox tests will cause failures

Confidence Score: 3/5

  • This PR has several bugs that must be fixed before merge, including a critical error handling bug and missing test initializations.
  • Score reflects critical error handling bug in network.go that silently ignores errors during devlink parameter discovery, missing SF device support in the switch statement despite API allowing it, and test failures due to missing Multiport field initialization. The typo in variable naming is minor but should be fixed. The overall feature implementation is well-designed with good test coverage for new functionality, but the bugs identified need resolution.
  • pkg/host/internal/network/network.go (critical error handling bug), pkg/host/internal/sriov/sriov.go (missing SF support), and pkg/vendors/mellanox/mellanox_test.go (test failures)

Important Files Changed

File Analysis

Filename Score Overview
api/v1/sriovnetworknodepolicy_types.go 5/5 Added DevlinkParams API types with proper validation. Also added GroupingPolicy field to Bridge struct.
api/v1/sriovnetworknodestate_types.go 5/5 Extended Interface, InterfaceExt, VirtualFunction, and Bridges with DevlinkParams fields.
pkg/host/internal/network/network.go 3/5 Added GetDevlinkDeviceParams and refactored devlink value conversion. Critical bug: error silently ignored on line 412.
pkg/host/internal/sriov/sriov.go 3/5 Added devlink discovery and configuration with applyDevlinkPfParam/VfParam. Missing SF support and default case handling.
pkg/vendors/mellanox/mellanox.go 4/5 Implemented esw_multiport firmware configuration with HandleESwitchParams. Has typo in variable name 'requsted'.
pkg/vendors/mellanox/mellanox_test.go 3/5 Added comprehensive tests for HandleESwitchParams. Existing tests missing Multiport field initialization causing failures.

Sequence Diagram

sequenceDiagram
    participant User
    participant Policy as SriovNetworkNodePolicy
    participant Plugin as MellanoxPlugin
    participant Vendor as Mellanox Vendor
    participant Host as Host/SRIOV
    participant Network as Network Helper
    participant Kernel as Kernel/Devlink

    User->>Policy: Create/Update Policy with DevlinkParams
    Policy->>Plugin: OnNodeStateChange()
    
    Note over Plugin,Vendor: Firmware Configuration Phase
    Plugin->>Vendor: HandleESwitchParams()
    Vendor->>Vendor: Check if esw_multiport change needed
    alt esw_multiport requires change
        Vendor->>Vendor: Set LAG_RESOURCE_ALLOCATION firmware flag
        Vendor-->>Plugin: needReboot = true
    else no change needed
        Vendor-->>Plugin: needReboot = false
    end
    
    Note over Host,Kernel: Discovery Phase
    Host->>Network: DiscoverSriovDevices()
    Network->>Kernel: GetDevlinkDeviceParams(pciAddr)
    Kernel-->>Network: Current devlink params
    Network->>Network: Convert to DevlinkParam structs
    Network-->>Host: DevlinkParams list
    Host->>Host: Store in InterfaceExt
    
    Note over Host,Kernel: Configuration Phase
    Host->>Host: configSriovPFDevice(iface)
    loop For each DevlinkParam
        alt ApplyOn == PF
            Host->>Network: SetDevlinkDeviceParam(pciAddr, name, value)
            Network->>Kernel: DevlinkSetDeviceParam()
        else ApplyOn == VF
            Host->>Host: GetVFList(pciAddr)
            loop For each VF
                Host->>Network: SetDevlinkDeviceParam(vfAddr, name, value)
                Network->>Kernel: DevlinkSetDeviceParam()
            end
        end
    end
    
    Host-->>User: Configuration Applied
Loading

Comment on lines +429 to +433
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

err is checked after the type assertion on line 429, but err is never assigned from the devlinkValueToString call on line 412. The error is silently ignored with _.

Suggested change
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}
strVal, err := n.devlinkValueToString(p.Type, v.Data)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}

func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) {
log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress)

requsted, found := false, false
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: requsted should be requested

Suggested change
requsted, found := false, false
requested, found := false, false

Comment on lines +446 to +461
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing handling for SF (Subfunction) devices. The switch statement only handles PF and VF but the API supports SF as per the ApplyOn enum validation in sriovnetworknodepolicy_types.go

Suggested change
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
}
}
for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
case consts.DevlinkParamApplyOnPf:
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnVf:
err = s.applyDevlinkVfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name)
return err
}
case consts.DevlinkParamApplyOnSf:
// TODO: Implement SF support
log.Log.Info("configSriovPFDevice(): SF devlink params not yet supported, skipping", "device", iface.PciAddress, "param", param.Name)
case "":
// Empty ApplyOn defaults to PF
err = s.applyDevlinkPfParam(iface, param)
if err != nil {
log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name)
return err
}
default:
log.Log.Error(nil, "configSriovPFDevice(): unsupported ApplyOn value", "device", iface.PciAddress, "param", param.Name, "applyOn", param.ApplyOn)
return fmt.Errorf("unsupported ApplyOn value: %s", param.ApplyOn)
}
}

Comment on lines +94 to +97
return nil
}
if len(interfaces) == 0 {
return fmt.Errorf("unsupported configuration, interfaces list must contain at least one element")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This error can be returned even though the policy was validated to be constants.OvsGroupingPolicyAll on line 91. Consider removing the length check or handling empty interfaces before the policy check.

getBFMstconfigOutput(false, false),
"", nil)
u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH").Return(
u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiport field missing in test data initialization. It should be initialized to -1 to indicate no change needed, or to 0 to match the expected command output.

Suggested change
u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return(
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}})

u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test").Return(
u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return(
"",
"", testError)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiport field missing in test data initialization.

Suggested change
"", testError)
err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}})

@nvidia-ci-cd nvidia-ci-cd force-pushed the network-operator-26.1.x branch from 01f2b6c to b33f550 Compare January 9, 2026 00:10
e0ne added 6 commits January 9, 2026 10:04
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>

fd
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
@e0ne e0ne force-pushed the devlink-params-impl branch from e7ff81a to 5ba2593 Compare January 9, 2026 08:04
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Greptile Overview

Greptile Summary

This PR implements devlink parameters configuration support for SR-IOV Network Operator, allowing users to configure device-level parameters through the SriovNetworkNodePolicy API.

Key Changes:

  • Adds DevlinkParams struct to policy spec with fields for parameter name, value, cmode (runtime/driverinit/permanent), and applyOn (PF/VF/SF)
  • Implements discovery of current devlink parameters during device scanning via GetDevlinkDeviceParams()
  • Applies devlink parameters through SetDevlinkDeviceParam() for both PF and VF devices
  • Mellanox vendor plugin handles esw_multiport parameter by setting firmware LAG_RESOURCE_ALLOCATION flag
  • Adds bridge grouping policy support with "all" option to group multiple interfaces under single OVS bridge
  • Comprehensive test coverage including unit tests for eswitch params and devlink discovery

Critical Issues Found:

  • Case-insensitive comparison missing for ApplyOn field - lowercase values (pf, vf, sf) will be silently ignored
  • Unreachable error check in GetDevlinkDeviceParams() due to discarded error from type conversion
  • DevlinkParams not merged in mergeConfigs() - when multiple policies apply to same interface, devlink params from lower-priority policies are lost
  • Missing parameter names in error log messages make debugging difficult

Minor Issues:

  • Empty string defaults not documented for Cmode and ApplyOn fields
  • Unclear default behavior when esw_multiport parameter is requested but not currently configured

Confidence Score: 3/5

  • This PR has several logic bugs that could cause silent failures and data loss, but no critical security vulnerabilities
  • Score reflects multiple logic bugs: case-insensitive comparison missing (causing silent failures for lowercase values), unreachable error check, and policy merge issues that lose devlink params. These are functional bugs that will impact users but are fixable. The core implementation is sound with good test coverage.
  • Pay close attention to pkg/host/internal/sriov/sriov.go (case handling), pkg/host/internal/network/network.go (error handling), and api/v1/helper.go (merge logic)

Important Files Changed

File Analysis

Filename Score Overview
pkg/host/internal/sriov/sriov.go 3/5 Adds devlink param application logic for PF/VF. Issues: case-insensitive comparison missing for ApplyOn, missing param names in error logs.
pkg/host/internal/network/network.go 3/5 Implements GetDevlinkDeviceParams() for discovery and refactors value conversion. Issue: unreachable error check on line 429-431.
pkg/vendors/mellanox/mellanox.go 3/5 Adds esw_multiport handling via LAG_RESOURCE_ALLOCATION firmware flag. Minor: unclear default behavior when param not found.
api/v1/sriovnetworknodepolicy_types.go 4/5 Adds DevlinkParam and DevlinkParams types with validation. Minor: empty string defaults not documented for Cmode/ApplyOn.
api/v1/helper.go 3/5 Copies DevlinkParams in Apply(). Issue: DevlinkParams not merged in mergeConfigs(), losing params from lower-priority policies.

Sequence Diagram

sequenceDiagram
    participant User
    participant API as SriovNetworkNodePolicy
    participant Controller as Controller/Daemon
    participant GenericPlugin
    participant MellanoxPlugin
    participant SRIOV as SRIOV Module
    participant Network as Network Helper
    participant Firmware as mstconfig/mstfwreset

    User->>API: Create/Update Policy with DevlinkParams
    API->>Controller: Apply policy to NodeState
    Note over API,Controller: DevlinkParams copied to Interface spec

    Controller->>MellanoxPlugin: OnNodeStateChange()
    MellanoxPlugin->>MellanoxPlugin: Check for esw_multiport param
    alt esw_multiport requested
        MellanoxPlugin->>MellanoxPlugin: Set LAG_RESOURCE_ALLOCATION flag
        MellanoxPlugin->>Firmware: mstconfig set LAG_RESOURCE_ALLOCATION
        Note over MellanoxPlugin,Firmware: Requires reboot
    end
    MellanoxPlugin-->>Controller: needReboot=true

    Controller->>GenericPlugin: OnNodeStateChange()
    GenericPlugin-->>Controller: needDrain/needReboot

    Controller->>Controller: Drain node if needed
    Controller->>GenericPlugin: Apply()
    GenericPlugin->>SRIOV: ConfigSriovInterfaces()
    
    SRIOV->>SRIOV: configSriovPFDevice()
    loop For each DevlinkParam
        alt ApplyOn == PF
            SRIOV->>Network: SetDevlinkDeviceParam(PF)
            Network->>Network: devlink set parameter
        else ApplyOn == VF
            SRIOV->>SRIOV: GetVFList()
            loop For each VF
                SRIOV->>Network: SetDevlinkDeviceParam(VF)
                Network->>Network: devlink set parameter
            end
        end
    end

    SRIOV->>SRIOV: configSriovVFDevices()
    SRIOV-->>GenericPlugin: Configuration complete

    Controller->>SRIOV: DiscoverSriovDevices()
    SRIOV->>Network: GetDevlinkDeviceParams(PF)
    Network->>Network: Query all devlink params
    Network-->>SRIOV: DevlinkParam list
    SRIOV-->>Controller: Updated status with devlink params

    Controller->>API: Update NodeState status
    Note over User,API: Status reflects configured devlink params
Loading

}

for _, param := range iface.DevlinkParams.Params {
switch param.ApplyOn {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

case-insensitive comparison needed for ApplyOn field

The API allows both uppercase and lowercase (PF;pf;VF;vf;SF;sf), but this switch only matches uppercase constants. Lowercase values will be silently ignored.

Suggested change
switch param.ApplyOn {
applyOn := strings.ToUpper(param.ApplyOn)
switch applyOn {

func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error {
vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing parameter name in log message

The "param" field is present in the error log but has no value, making debugging difficult.

Suggested change
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param")
log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param", param.Name)

for _, addr := range vfAddrs {
err = s.networkHelper.SetDevlinkDeviceParam(addr, param.Name, param.Value)
if err != nil {
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing parameter name in log message

Same issue as line 473 - the "param" field needs a value.

Suggested change
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param")
log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param", param.Name)

Comment on lines +429 to +431
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unreachable error check after type assertion

The variable err is always nil here (line 412: strVal, _ := n.devlinkValueToString(...)). This check will never trigger.

Suggested change
strVal := val.(string)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
strVal, err := n.devlinkValueToString(p.Type, v.Data)
if err != nil {
log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr)
return nil, err
}

Comment on lines +419 to +421
} else {
attr.Multiport = 1
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unclear logic when devlink param is not found

When esw_multiport is requested but not currently configured, the code sets attr.Multiport = 1. This assumes enabling multiport is the default, but it's unclear if this is intentional. Should consider explicitly documenting or validating this behavior.

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

EswitchMode: p.Spec.EswitchMode,
NumVfs: p.Spec.NumVfs,
ExternallyManaged: p.Spec.ExternallyManaged,
DevlinkParams: p.Spec.DevlinkParams,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DevlinkParams not merged in mergeConfigs

When merging interface configs from multiple policies, DevlinkParams field is copied wholesale from the input without merging. If two policies apply to the same interface with different devlink params, one set will be completely lost.

Comment on lines +150 to +154
// +kubebuilder:validation:Enum=runtime;driverinit;permanent
Cmode string `json:"cmode,omitempty"`
// Device to apply devlink parameter: PF (default)|VF|SF
// +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf
ApplyOn string `json:"applyOn,omitempty"`
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

empty string default not documented for Cmode and ApplyOn

The API validation allows these fields to be omitted (due to omitempty), but there's no documentation or defaulting logic for empty values. Code in sriov.go will fail to match empty strings against constants.

@e0ne e0ne merged commit dba9bb7 into Mellanox:network-operator-26.1.x Jan 9, 2026
9 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants