feat: Devlink parameters configuration#108
feat: Devlink parameters configuration#108e0ne merged 6 commits intoMellanox:network-operator-26.1.xfrom
Conversation
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
bf602d5 to
63c5ab2
Compare
|
Thanks for your PR,
To skip the vendors CIs, Maintainers can use one of:
|
config/crd/bases/sriovnetwork.openshift.io_sriovibnetworks.yaml
Outdated
Show resolved
Hide resolved
71c1b13 to
9260b97
Compare
Greptile SummaryThis PR implements devlink parameters configuration support for SR-IOV Network Operator, allowing users to configure device-level parameters through the Key Changes:
Issues Found:
Confidence Score: 3/5
Important Files Changed
Sequence DiagramsequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Controller as Operator Controller
participant VendorPlugin as Mellanox Plugin
participant GenericPlugin as Generic Plugin
participant HostHelper as Host Manager
participant Devlink as Devlink/Netlink
participant Firmware as NIC Firmware (mstconfig)
User->>Policy: Create/Update Policy with DevlinkParams
Policy->>Controller: Apply Policy to Node State
Controller->>VendorPlugin: OnNodeStateChange()
VendorPlugin->>VendorPlugin: HandleESwitchParams(esw_multiport)
alt esw_multiport requires change
VendorPlugin->>VendorPlugin: Set Multiport=1 in firmware attrs
VendorPlugin->>Firmware: Configure LAG_RESOURCE_ALLOCATION=1
VendorPlugin-->>Controller: needReboot=true
end
Controller->>GenericPlugin: Apply()
GenericPlugin->>HostHelper: DiscoverSriovDevices()
HostHelper->>Devlink: GetDevlinkDeviceParams(pciAddr)
Devlink-->>HostHelper: Current devlink params
HostHelper-->>GenericPlugin: Interface state with DevlinkParams
GenericPlugin->>HostHelper: ConfigureSriovPFDevice()
loop For each DevlinkParam
alt ApplyOn == PF
HostHelper->>Devlink: SetDevlinkDeviceParam(PF, name, value)
else ApplyOn == VF
loop For each VF
HostHelper->>Devlink: SetDevlinkDeviceParam(VF, name, value)
end
end
end
alt Bridge GroupingPolicy configured
GenericPlugin->>HostHelper: ConfigureBridgesGrouping()
HostHelper->>HostHelper: Create OVS bridge with first interface
loop For remaining interfaces
HostHelper->>HostHelper: Add interface to bridge
end
end
GenericPlugin-->>Controller: Configuration applied
Controller-->>User: Node state updated
|
There was a problem hiding this comment.
Additional Comments (4)
-
pkg/host/internal/sriov/sriov.go, line 471-474 (link)logic: error from
GetVFListis logged but not returned, yet the function continues to iterate over potentially emptyvfAddrsslice -
api/v1/sriovnetworknodepolicy_types.go, line 152-154 (link)logic: case-sensitivity for
ApplyOnvalidation - enum values include both uppercase and lowercase (PF/pf, VF/vf, SF/sf) but code only checks against uppercase constants. should the code normalize the ApplyOn field to uppercase before comparison, or is case-insensitive matching intended? -
pkg/host/internal/sriov/sriov.go, line 446-461 (link)logic: devlink parameters are applied AFTER VFs are created, but some parameters like
esw_multiportmay require firmware configuration before VF creation according to the design doc. is the ordering correct, or should vendor-specific firmware config (like Mellanox LAG_RESOURCE_ALLOCATION) happen before this generic devlink param application? -
pkg/host/internal/sriov/sriov.go, line 447 (link)logic: missing handling for SF (Scalable Function)
ApplyOnoption - only PF and VF cases are handled
29 files reviewed, 4 comments
8b9a8d8 to
1850188
Compare
9260b97 to
a0eb40b
Compare
There was a problem hiding this comment.
Additional Comments (4)
-
pkg/host/internal/sriov/sriov.go, line 472-474 (link)syntax:
paramargument not included in log messageNote: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
-
pkg/host/internal/sriov/sriov.go, line 479 (link)syntax:
param.Namenot included in log message -
pkg/host/internal/network/network.go, line 427-433 (link)logic: Error variable
erris checked but never setWas type assertion meant to be used here, or was
devlinkValueToStringsupposed to be called? -
pkg/host/internal/sriov/sriov.go, line 446-461 (link)style: switch statement missing default case - params with empty or invalid
ApplyOnfields will be silently ignored
29 files reviewed, 4 comments
a0eb40b to
e7879fc
Compare
There was a problem hiding this comment.
Additional Comments (6)
-
pkg/host/internal/bridge/bridge.go, line 96-98 (link)logic: Interfaces slice could be empty even after the check on line 96, causing panic on line 100 when accessing
interfaces[0]. The check validateslen(interfaces) == 0but this happens inside thegroupingPolicy != ""block. If called with empty interfaces and empty groupingPolicy, the function returns nil without error, but with non-empty groupingPolicy it would fail. -
pkg/host/internal/sriov/sriov.go, line 332-337 (link)logic: Error getting devlink parameters causes the entire PF to be skipped with
continue. This means if devlink fails for any reason (device doesn't support it, permission issues, etc.), the PF won't be included in the discovered devices list at all. Consider logging a warning but continuing without devlink params instead. -
pkg/host/internal/sriov/sriov.go, line 471-474 (link)logic: Error from
GetVFList()is assigned but the function continues to iterate over potentially nil/emptyvfAddrsslice. This will silently fail to apply devlink params to VFs without indication of the failure. -
pkg/host/internal/sriov/sriov.go, line 446-461 (link)logic: Missing default case handling for
ApplyOnfield. Ifparam.ApplyOnis empty string,SF, or lowercase variants (pf,vf), the devlink parameter will be silently skipped. According to the API definition insriovnetworknodepolicy_types.go:153, lowercase variants are allowed by the enum validator but not handled here. -
pkg/vendors/mellanox/mellanox.go, line 446-469 (link)style: Logic only checks the first port (pciAddress + "0") for esw_multiport parameter. For multi-port NICs, this may not correctly handle configurations where different ports have different devlink parameter requirements.
-
pkg/host/internal/sriov/sriov.go, line 446-461 (link)style: Use constants from
pkg/constspackage for comparingApplyOnvalues instead of hardcoded strings, and handle case-insensitive comparison since API allows both uppercase and lowercase.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
33 files reviewed, 6 comments
| EswitchMode: p.Spec.EswitchMode, | ||
| NumVfs: p.Spec.NumVfs, | ||
| ExternallyManaged: p.Spec.ExternallyManaged, | ||
| DevlinkParams: p.Spec.DevlinkParams, |
There was a problem hiding this comment.
Regarding the bridge multiport, don't we need to update ApplyBridgeConfig to build the spec?
e7879fc to
34fbdf8
Compare
There was a problem hiding this comment.
Additional Comments (4)
-
pkg/host/internal/sriov/sriov.go, line 471-474 (link)logic: error not returned after logging
-
pkg/host/internal/sriov/sriov.go, line 446-461 (link)logic: case-insensitive matching not handled - API allows both uppercase and lowercase (
PF,pf,VF,vf,SF,sf) but switch only checks uppercase constantsConsider normalizing
param.ApplyOnwithstrings.ToUpper()before the switch, or add lowercase cases -
pkg/host/internal/sriov/sriov.go, line 446-461 (link)logic: missing default case and empty
ApplyOnhandlingThe API comment says "PF (default)" but there's no default case when
ApplyOnis empty or containsSF. Add a default case to handle unrecognized values -
api/v1/sriovnetworknodepolicy_types.go, line 149-151 (link)style: comment says "runtime is only supported now" but code doesn't validate or enforce this - all three cmodes (
runtime,driverinit,permanent) are allowed by the enum
35 files reviewed, 4 comments
There was a problem hiding this comment.
Additional Comments (3)
-
pkg/host/internal/sriov/sriov.go, line 471-474 (link)logic: Missing error return after
GetVFListfails. If getting VF list fails, should return error immediately rather than continuing with empty list. -
pkg/host/internal/sriov/sriov.go, line 446-461 (link)logic: Case-sensitive comparison of
ApplyOnfield. API allowsPF,pf,VF,vf,SF,sfbut switch only checks uppercase constants. -
pkg/host/internal/bridge/bridge.go, line 93 (link)style: Log message says
DiscoverBridges()but function name isConfigureBridgesGrouping()
35 files reviewed, 3 comments
1850188 to
832b8bd
Compare
64d5f6e to
020f09b
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR adds devlink parameter configuration support to the SR-IOV Network Operator, allowing users to configure low-level device parameters for NIC devices.
Key Changes
- Added
DevlinkParamsfield toSriovNetworkNodePolicyAPI with validation for parameter names, values, cmodes (runtime/driverinit/permanent), and apply targets (PF/VF/SF) - Implemented devlink parameter discovery via
GetDevlinkDeviceParams()to read current device parameters and populate node state status - Added
SetDevlinkDeviceParam()with automatic type conversion (uint8/uint16/uint32/string/bool) and CMODE detection - Integrated vendor-specific handling in Mellanox plugin for
esw_multiportparameter requiring firmwareLAG_RESOURCE_ALLOCATIONconfiguration - Parameters applied during PF configuration phase before VF creation, with separate handling for PF-level and VF-level parameters
Critical Issues Found
- Case-sensitivity bug: The
ApplyOnfield allows both uppercase and lowercase values in the API enum, but the implementation only matches uppercase constants, causing lowercase values to be silently ignored - Missing SF support: SubFunction (
SF) is allowed by the API but not implemented in the switch statement - No default case: Invalid
ApplyOnvalues are silently ignored without error reporting
Architecture
The implementation follows the operator's plugin architecture: GenericPlugin applies OS-level devlink parameters via netlink, while vendor plugins (e.g., MellanoxPlugin) handle firmware-level prerequisites before devlink configuration.
Confidence Score: 3/5
- This PR has critical bugs that will cause devlink parameters with lowercase ApplyOn values to be silently ignored
- The case-sensitivity bug in
ApplyOnfield comparison is a serious functional issue that will prevent the feature from working correctly when users provide lowercase values (which are explicitly allowed by the API validation). Additionally, SF (SubFunction) support is missing despite being in the API enum, and there's no error handling for invalid ApplyOn values. - Pay close attention to
pkg/host/internal/sriov/sriov.go- the ApplyOn field comparison logic must be fixed to handle case-insensitive matching and add SF support
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| api/v1/sriovnetworknodepolicy_types.go | 5/5 | Added DevlinkParams field to policy spec with proper validation enums |
| api/v1/sriovnetworknodestate_types.go | 5/5 | Added DevlinkParams fields to Interface, InterfaceExt, and VirtualFunction types for tracking devlink parameters |
| pkg/host/internal/sriov/sriov.go | 3/5 | Implemented devlink parameter discovery and application; case-sensitivity issue with ApplyOn field comparison |
| pkg/host/internal/network/network.go | 5/5 | Implemented GetDevlinkDeviceParams and SetDevlinkDeviceParam with proper type conversion and CMODE handling |
| pkg/vendors/mellanox/mellanox.go | 5/5 | Added vendor-specific handling for esw_multiport devlink parameter requiring firmware configuration |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Daemon as Config Daemon
participant MP as MellanoxPlugin
participant GP as GenericPlugin
participant Sriov as SriovInterface
participant Network as NetworkHelper
participant Netlink as Netlink Library
participant FW as Firmware (mstconfig)
User->>Policy: Create/Update with DevlinkParams
Policy->>Daemon: Policy Applied to Node
Daemon->>MP: OnNodeStateChange()
MP->>MP: Check esw_multiport param
alt esw_multiport detected
MP->>FW: Set LAG_RESOURCE_ALLOCATION=1
MP->>FW: Reset firmware (if needed)
end
MP-->>Daemon: needReboot, needDrain
Daemon->>GP: Apply()
GP->>Sriov: ConfigSriovInterfaces()
Sriov->>Sriov: configSriovPFDevice()
loop for each DevlinkParam
alt ApplyOn == PF
Sriov->>Network: SetDevlinkDeviceParam(pciAddr, name, value)
Network->>Netlink: DevlinkGetDeviceParamByName()
Netlink-->>Network: param metadata (type, cmode)
Network->>Network: Convert value to correct type
Network->>Netlink: DevlinkSetDeviceParam(bus, device, param, cmode, value)
Netlink-->>Network: success
else ApplyOn == VF
Sriov->>Sriov: Get VF list
loop for each VF
Sriov->>Network: SetDevlinkDeviceParam(vfAddr, name, value)
Network->>Netlink: DevlinkSetDeviceParam()
end
end
end
Sriov->>Sriov: Create VFs
Daemon->>Sriov: DiscoverSriovDevices()
Sriov->>Network: GetDevlinkDeviceParams(pciAddr)
Network->>Netlink: DevlinkGetDeviceParams()
Netlink-->>Network: All configured params
Network->>Network: Merge params by name and cmode
Network-->>Sriov: DevlinkParam list
Sriov-->>Daemon: Updated InterfaceExt with DevlinkParams
Daemon->>Policy: Update Status
Additional Comments (3)
The API validation allows both uppercase and lowercase values (
The switch statement lacks a default case for invalid
When Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time! |
832b8bd to
01f2b6c
Compare
020f09b to
f509543
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
Adds devlink parameter configuration support to SR-IOV Network Operator, enabling runtime configuration of device-level parameters like esw_multiport for Mellanox NICs. The implementation extends the API with DevlinkParams fields in policies and state objects, discovers existing parameters via netlink, and applies them through vendor-specific (Mellanox firmware) and generic (OS-level devlink) plugins. Bridge grouping policy configuration is also added to support multi-port NIC management.
Confidence Score: 2/5
- Moderate risk due to error handling gaps that could cause silent failures during devlink parameter configuration
- The PR contains a critical logic error in
applyDevlinkVfParamwhere GetVFList failure is logged but not returned, allowing execution to continue with no VFs configured. Additionally, missing default handling for theApplyOnfield could result in devlink parameters being silently ignored. While the overall architecture is sound and the Mellanox-specific firmware integration appears correct, these error handling issues in core SR-IOV configuration paths pose risk of configuration drift where parameters appear applied but aren't actually set on devices. pkg/host/internal/sriov/sriov.gorequires immediate attention for error handling in VF parameter application;pkg/vendors/mellanox/mellanox.goneeds input validation for devlink parameter values
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/host/internal/sriov/sriov.go | 1/5 | Adds devlink parameter configuration for PF/VF; missing error return after logging error in applyDevlinkVfParam could cause silent failures |
| pkg/vendors/mellanox/mellanox.go | 2/5 | Adds esw_multiport devlink parameter support via firmware LAG_RESOURCE_ALLOCATION configuration; logic seems sound but lacks validation for devlink parameter values |
| api/v1/sriovnetworknodepolicy_types.go | 3/5 | Adds DevlinkParams struct with validation enums for Cmode and ApplyOn fields; also adds GroupingPolicy field to Bridge struct |
| pkg/host/internal/network/network.go | 3/5 | Refactors devlink value parsing into reusable helpers and adds GetDevlinkDeviceParams to retrieve all configured parameters |
Sequence Diagram
sequenceDiagram
participant Policy as SriovNetworkNodePolicy
participant Generic as GenericPlugin
participant Mellanox as MellanoxPlugin
participant SRIOV as SR-IOV Manager
participant Network as Network Helper
participant FW as Firmware (mstconfig)
Policy->>Generic: Apply Policy with DevlinkParams
Generic->>SRIOV: ConfigureSriovDevice()
alt Mellanox NIC with esw_multiport
Mellanox->>Mellanox: HandleESwitchParams()
Mellanox->>FW: Set LAG_RESOURCE_ALLOCATION
Mellanox-->>Generic: Requires Reboot
end
SRIOV->>Network: GetDevlinkDeviceParams() (discover)
Network-->>SRIOV: Current params
loop For each DevlinkParam
alt ApplyOn = PF
SRIOV->>Network: SetDevlinkDeviceParam(PF)
else ApplyOn = VF
SRIOV->>Network: SetDevlinkDeviceParam(each VF)
end
end
Generic->>Generic: ConfigureBridges()
Generic->>Generic: ConfigureBridgesGrouping()
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | ||
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | ||
| if err != nil { | ||
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | ||
| } |
There was a problem hiding this comment.
Missing error return after logging error in applyDevlinkVfParam could cause silent failures. If GetVFList fails, the error is logged but execution continues with a potentially empty vfAddrs slice. This means devlink parameters won't be applied to any VFs, but the function returns success.
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | |
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | |
| if err != nil { | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| } | |
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | |
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | |
| if err != nil { | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| return err | |
| } |
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing default value or validation for param.ApplyOn could lead to devlink parameters being silently ignored. When ApplyOn is empty or has an unexpected value (e.g., "SF"), the parameter won't be applied but no error is returned.
Consider adding a default case or validation to handle empty/invalid ApplyOn values. According to the design doc, PF should be the default.
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isLagResourceAllocationRequireChange()", "device", iface.PciAddress) | ||
|
|
||
| requsted, found := false, false | ||
| for _, devlinkParam := range iface.DevlinkParams.Params { | ||
| if devlinkParam.Name != devlinkMultiportParamName { | ||
| continue | ||
| } | ||
| requsted = true | ||
|
|
||
| for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params { | ||
| if devlinkParamStatus.Name == devlinkMultiportParamName { | ||
| found = true | ||
| if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value { | ||
| return true, &devlinkParam | ||
| } | ||
| } | ||
| } | ||
| } | ||
| if requsted && !found { | ||
| // esw_multiport devlink parameter not found and not configured | ||
| return true, nil | ||
| } | ||
| return false, nil | ||
| } |
There was a problem hiding this comment.
The function isESwitchParamsRequireChange only checks the esw_multiport parameter but doesn't validate the parameter value. Invalid values (e.g., arbitrary strings) could be passed through to firmware configuration without validation.
Consider adding validation to ensure devlinkParam.Value is a valid boolean string ("true"/"false") before using it to set firmware configuration.
| // GetDevlinkDeviceParams returns all configured devlink parameters | ||
| func (n *network) GetDevlinkDeviceParams(pciAddr string) ([]sriovnetworkv1.DevlinkParam, error) { | ||
| params, err := n.netlinkLib.DevlinkGetDeviceParams(consts.BusPci, pciAddr) | ||
| if err != nil { | ||
| log.Log.Error(err, "Failed to get devlink params", "device", pciAddr) | ||
| return nil, err | ||
| } | ||
|
|
||
| merged := make(map[string]map[uint8]interface{}) | ||
| devlinkParams := make([]sriovnetworkv1.DevlinkParam, 0) | ||
|
|
There was a problem hiding this comment.
In GetDevlinkDeviceParams, the error from devlinkValueToString is captured in the loop but then immediately returned, causing the function to fail on the first unparseable parameter. This could prevent returning valid parameters if one parameter has parsing issues.
However, since the error variable is captured but not actually used before the return statement (line 391-392 show error is assigned but always nil before the check), this appears to be dead code. Consider removing the unused error handling or implementing proper error accumulation.
f509543 to
1b542bf
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
Adds devlink parameter configuration support to SR-IOV Network Operator, enabling users to configure device-level parameters like esw_multiport through the SriovNetworkNodePolicy CRD. Implements vendor-specific firmware configuration for Mellanox NICs to set LAG_RESOURCE_ALLOCATION when multiport is requested, and generic OS-level devlink parameter application for PF and VF devices.
Confidence Score: 1/5
- Critical error handling bugs could cause silent failures or exclude devices from SR-IOV configuration
- Two critical logic errors in error handling: (1) applyDevlinkVfParam logs but doesn't return error when GetVFList fails, causing silent success on failure; (2) DiscoverSriovDevices skips entire PF interface when devlink param discovery fails, potentially breaking SR-IOV on affected devices. Additionally, missing SF (Sub-Function) case means parameters for SF devices are silently ignored despite API validation allowing them.
- pkg/host/internal/sriov/sriov.go requires immediate fixes to error handling in applyDevlinkVfParam (line 470-474) and DiscoverSriovDevices (line 332-338)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/host/internal/sriov/sriov.go | 1/5 | Adds devlink parameter discovery and application; critical error handling bug in applyDevlinkVfParam doesn't return after GetVFList failure |
| pkg/host/internal/network/network.go | 3/5 | Adds GetDevlinkDeviceParams to retrieve all devlink parameters and refactors value conversion into helper methods |
| pkg/vendors/mellanox/mellanox.go | 3/5 | Adds HandleESwitchParams to configure LAG_RESOURCE_ALLOCATION firmware parameter based on esw_multiport devlink parameter |
| api/v1/sriovnetworknodepolicy_types.go | 4/5 | Adds DevlinkParam and DevlinkParams types to API with validation enums for cmode and applyOn fields |
Sequence Diagram
sequenceDiagram
participant Policy as SriovNetworkNodePolicy
participant Controller as Policy Controller
participant State as SriovNetworkNodeState
participant GenericPlugin as Generic Plugin
participant MellanoxPlugin as Mellanox Plugin
participant MlxVendor as Mellanox Vendor Utils
participant SriovLib as SR-IOV Library
participant NetworkLib as Network Library
Policy->>Controller: DevlinkParams specified in policy
Controller->>State: Apply policy to node state
State->>State: Interface.DevlinkParams = Policy.DevlinkParams
Note over MellanoxPlugin,MlxVendor: Firmware Configuration (Vendor-Specific)
MellanoxPlugin->>MlxVendor: HandleESwitchParams()
MlxVendor->>MlxVendor: Check for esw_multiport param
alt esw_multiport=true and requires change
MlxVendor->>MlxVendor: Set Multiport=1 (LAG_RESOURCE_ALLOCATION=1)
MlxVendor-->>MellanoxPlugin: needReboot=true
MellanoxPlugin->>MlxVendor: MlxConfigFW()
MlxVendor->>MlxVendor: Run mstconfig to set LAG_RESOURCE_ALLOCATION
end
Note over GenericPlugin,NetworkLib: OS-Level Devlink Configuration
GenericPlugin->>SriovLib: ConfigureSriovDevices()
SriovLib->>SriovLib: configSriovPFDevice()
loop For each DevlinkParam
alt ApplyOn == "PF"
SriovLib->>NetworkLib: SetDevlinkDeviceParam(PF address)
NetworkLib->>NetworkLib: Convert value to correct type
NetworkLib->>NetworkLib: DevlinkSetDeviceParam()
else ApplyOn == "VF"
SriovLib->>SriovLib: GetVFList(PF address)
loop For each VF
SriovLib->>NetworkLib: SetDevlinkDeviceParam(VF address)
NetworkLib->>NetworkLib: DevlinkSetDeviceParam()
end
end
end
Note over SriovLib,NetworkLib: Discovery Phase
SriovLib->>NetworkLib: GetDevlinkDeviceParams(PCI address)
NetworkLib->>NetworkLib: DevlinkGetDeviceParams()
NetworkLib->>NetworkLib: Convert all params to DevlinkParam structs
NetworkLib-->>SriovLib: []DevlinkParam
SriovLib->>State: Update Interface.DevlinkParams in status
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | ||
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | ||
| if err != nil { | ||
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | ||
| } |
There was a problem hiding this comment.
Error handling bug: when GetVFList fails, the error is logged but not returned, allowing execution to continue with an empty vfAddrs slice. This means the function silently succeeds when it should fail.
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | |
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | |
| if err != nil { | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| } | |
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | |
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | |
| if err != nil { | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| return err | |
| } |
pkg/host/internal/sriov/sriov.go
Outdated
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | ||
| if err != nil { | ||
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | ||
| "device", device) | ||
| continue | ||
| } | ||
| iface.DevlinkParams.Params = devlinkParams |
There was a problem hiding this comment.
The continue statement skips adding the entire interface to the results when devlink parameter discovery fails. This means the PF and all its VFs are excluded from the node state, which could break SR-IOV functionality. Consider logging the error but continuing to add the interface with empty devlink params.
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | |
| "device", device) | |
| continue | |
| } | |
| iface.DevlinkParams.Params = devlinkParams | |
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable to read devlink parameters, continuing without them", | |
| "device", device) | |
| } else { | |
| iface.DevlinkParams.Params = devlinkParams | |
| } |
pkg/vendors/mellanox/mellanox.go
Outdated
| } | ||
|
|
||
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isLagResourceAllocationRequireChange()", "device", iface.PciAddress) |
There was a problem hiding this comment.
The function name in the log message is incorrect—it says isLagResourceAllocationRequireChange() but the actual function name is isESwitchParamsRequireChange(). This will make debugging harder when reading logs.
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing case for consts.DevlinkParamApplyOnSf (SF devices). The API defines three ApplyOn values (PF, VF, SF) per the validation enum in sriovnetworknodepolicy_types.go, but only PF and VF cases are handled. Parameters intended for SF devices will be silently ignored.
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Empty ApplyOn field handling: when param.ApplyOn is empty or contains an invalid value, the parameter is silently skipped without any log message. Consider adding a default case that logs unhandled values or treats empty ApplyOn as PF (since the API comment says "PF (default)").
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
There was a problem hiding this comment.
An empty value can't be passed via API
|
here's it: |
1b542bf to
c46becb
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR adds devlink parameter configuration support to the SR-IOV Network Operator, enabling low-level device parameter management for NICs. The implementation follows a two-phase approach: vendor-specific firmware configuration (Mellanox plugin) followed by OS-level devlink parameter application (Generic plugin).
Key Changes:
- Added
DevlinkParamsAPI toSriovNetworkNodePolicyandSriovNetworkNodeStatewith proper kubebuilder validation forCmode(runtime/driverinit/permanent) andApplyOn(PF/VF/SF) fields - Implemented devlink parameter discovery via
GetDevlinkDeviceParams()to read current device state - Added
SetDevlinkDeviceParam()application logic for PF and VF devices in the SR-IOV library - Mellanox plugin handles
esw_multiportparameter by settingLAG_RESOURCE_ALLOCATIONfirmware flag and triggering reboot - Bridge grouping feature added with
GroupingPolicyfield supporting "all" policy to group multiple interfaces on a single OVS bridge - Comprehensive test coverage for Mellanox eswitch parameter handling (10 test cases)
Issues Found:
- Critical logic error in
network.go:429-432whereerris checked after type assertion but never assigned from conversion - Error silently discarded at
network.go:412indevlinkValueToString()call could mask conversion failures - ApplyOn field handling in
sriov.go:446-461only matches exact constant values (PF/VF), won't handle empty strings or lowercase variants - Variable name typo
requstedinstead ofrequestedinmellanox.go(3 occurrences) - Incorrect function name in log message at
bridge.go:93
Confidence Score: 3/5
- This PR has critical logic errors that need fixing before merge, particularly error handling in devlink parameter conversion
- Score reflects solid architecture and test coverage but critical bugs in error handling (unused error check, silently discarded errors) and potential runtime issues with ApplyOn field matching that could cause devlink parameters to not be applied correctly
- Pay close attention to
pkg/host/internal/network/network.go(error handling bugs),pkg/host/internal/sriov/sriov.go(ApplyOn field logic), andpkg/vendors/mellanox/mellanox.go(typo needs fixing)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| api/v1/sriovnetworknodepolicy_types.go | 5/5 | Added DevlinkParam and DevlinkParams types to API with proper kubebuilder validation for Cmode and ApplyOn fields |
| api/v1/sriovnetworknodestate_types.go | 5/5 | Extended Interface, InterfaceExt, VirtualFunction, and Bridges types with DevlinkParams and GroupingPolicy fields |
| pkg/host/internal/network/network.go | 2/5 | Added GetDevlinkDeviceParams with helper functions; has unused error check at line 430 and silently discarded error at line 412 |
| pkg/host/internal/sriov/sriov.go | 3/5 | Implemented devlink parameter discovery and application for PF/VF; ApplyOn field matching may not handle default/lowercase values |
| pkg/vendors/mellanox/mellanox.go | 3/5 | Added esw_multiport handling with LAG_RESOURCE_ALLOCATION firmware configuration; contains typo 'requsted' instead of 'requested' |
| pkg/host/internal/bridge/bridge.go | 4/5 | Added ConfigureBridgesGrouping for 'all' policy to group interfaces on single OVS bridge; has incorrect function name in log |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Helper as API Helper
participant State as SriovNetworkNodeState
participant GenericPlugin as Generic Plugin
participant MellanoxPlugin as Mellanox Plugin
participant MellanoxVendor as Mellanox Vendor
participant SriovLib as SR-IOV Library
participant NetworkLib as Network Library
participant DevlinkAPI as Devlink API
User->>Policy: Create/Update Policy with DevlinkParams
Policy->>Helper: Apply() policy to state
Helper->>State: Add DevlinkParams to Interface spec
Note over MellanoxPlugin: Vendor-specific preprocessing
State->>MellanoxPlugin: OnNodeStateChange()
MellanoxPlugin->>MellanoxVendor: HandleESwitchParams()
MellanoxVendor->>MellanoxVendor: Check if esw_multiport requested
alt esw_multiport parameter changed
MellanoxVendor->>MellanoxVendor: Set Multiport=1/0 for LAG_RESOURCE_ALLOCATION
MellanoxVendor->>MellanoxVendor: MlxConfigFW() - Configure firmware
MellanoxVendor-->>MellanoxPlugin: needReboot=true
end
Note over GenericPlugin: OS-level configuration
State->>GenericPlugin: Apply()
GenericPlugin->>SriovLib: ConfigureSriovDevices()
SriovLib->>SriovLib: configSriovPFDevice()
loop For each DevlinkParam
alt ApplyOn == "PF"
SriovLib->>NetworkLib: SetDevlinkDeviceParam(PF)
NetworkLib->>DevlinkAPI: Set parameter on PF
else ApplyOn == "VF"
SriovLib->>SriovLib: Get VF list
loop For each VF
SriovLib->>NetworkLib: SetDevlinkDeviceParam(VF)
NetworkLib->>DevlinkAPI: Set parameter on VF
end
end
end
Note over SriovLib: Device discovery
SriovLib->>NetworkLib: GetDevlinkDeviceParams()
NetworkLib->>DevlinkAPI: DevlinkGetDeviceParams()
DevlinkAPI-->>NetworkLib: Return params with cmodes
NetworkLib->>NetworkLib: Convert to DevlinkParam structs
NetworkLib-->>SriovLib: Return devlink params
SriovLib->>State: Update status with DevlinkParams
Note over GenericPlugin: Bridge grouping (optional)
alt GroupingPolicy == "all"
GenericPlugin->>GenericPlugin: ConfigureBridgesGrouping()
GenericPlugin->>GenericPlugin: Create OVS bridge with first interface
loop For remaining interfaces
GenericPlugin->>GenericPlugin: AddInterfaceToOVSBridge()
end
end
| strVal := val.(string) | ||
| if err != nil { | ||
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | ||
| return nil, err |
There was a problem hiding this comment.
err is checked after type assertion at line 429, but it was never assigned an error value from the type assertion
| strVal := val.(string) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| return nil, err | |
| strVal := val.(string) |
|
|
||
| for _, v := range p.Values { | ||
| // only CLI-visible cmodes | ||
| strVal, _ := n.devlinkValueToString(p.Type, v.Data) |
There was a problem hiding this comment.
error from devlinkValueToString is silently discarded, could mask conversion failures
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
ApplyOn field defaults to empty string but only handles exact matches to PF/VF constants; should handle empty/lowercase values or set default
| func (b *bridge) ConfigureBridgesGrouping(interfaces sriovnetworkv1.Interfaces, name, groupingPolicy string) error { | ||
| if groupingPolicy != "" { | ||
| if groupingPolicy != constants.OvsGroupingPolicyAll { | ||
| log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) |
There was a problem hiding this comment.
log message says "DiscoverBridges()" but function is ConfigureBridgesGrouping()
| log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) | |
| log.Log.V(2).Info("ConfigureBridgesGrouping(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) |
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress) | ||
|
|
||
| requsted, found := false, false |
There was a problem hiding this comment.
requsted is mis-spelled
| requsted, found := false, false | |
| requested, found := false, false |
| if devlinkParam.Name != devlinkMultiportParamName { | ||
| continue | ||
| } | ||
| requsted = true |
There was a problem hiding this comment.
update variable reference after fixing typo
| requsted = true | |
| requested = true |
| } | ||
| } | ||
| } | ||
| if requsted && !found { |
There was a problem hiding this comment.
update variable reference after fixing typo
| if requsted && !found { | |
| if requested && !found { |
cf09545 to
217db27
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR adds devlink parameter configuration support to the SR-IOV Network Operator, enabling users to configure generic and vendor-specific devlink parameters for network devices. The implementation includes API extensions, discovery mechanisms, and vendor-specific firmware handling for Mellanox NICs.
Key Changes:
- Added
DevlinkParamsAPI toSriovNetworkNodePolicyandSriovNetworkNodeStatewith support for PF, VF, and SF devices - Implemented devlink parameter discovery and application in the SRIOV library
- Added Mellanox-specific
esw_multiporthandling withLAG_RESOURCE_ALLOCATIONfirmware configuration - Extended bridge configuration with grouping policy support (
perPF,perNIC,all) - Comprehensive test coverage for new functionality
Critical Issues Found:
- Bug in
pkg/host/internal/network/network.go:429-433: Unreachable error check that will never execute (checkederrvariable that's always nil) - Test failures in
pkg/vendors/mellanox/mellanox_test.go:310,321: MissingMultiport: 0field causing tests to fail - Incomplete implementation in
pkg/host/internal/sriov/sriov.go:447-460: Missing SF (Scalable Function) case handling and case-insensitive ApplyOn comparison
Minor Issues:
- Typo in constant name:
OvsGroupingPolicyPerVFshould beOvsGroupingPolicyPerNIC - Several grammar typos in error messages
- Bridge grouping only supports "all" policy, silently ignores others
The design is sound and follows the operator's architecture well, but the critical bugs need to be fixed before merging.
Confidence Score: 3/5
- This PR has critical bugs that will cause test failures and runtime issues with devlink parameter handling.
- Score reflects three critical bugs: unreachable error check in network.go, test failures in mellanox_test.go, and incomplete SF case handling in sriov.go. While the design is solid and test coverage is good, these issues will prevent proper functionality and cause test failures.
- Pay close attention to
pkg/host/internal/network/network.go(unreachable error),pkg/vendors/mellanox/mellanox_test.go(failing tests), andpkg/host/internal/sriov/sriov.go(missing SF handling)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/host/internal/network/network.go | 3/5 | Added devlink parameter retrieval with refactoring. Contains critical bug with unreachable error check on line 429-433. |
| pkg/vendors/mellanox/mellanox_test.go | 3/5 | Added comprehensive tests for eswitch params. Tests on lines 310 and 321 will fail due to missing Multiport field. |
| pkg/host/internal/sriov/sriov.go | 3/5 | Added devlink param discovery and application logic. Missing SF case handling and case-insensitive ApplyOn comparison. |
| pkg/vendors/mellanox/mellanox.go | 4/5 | Added esw_multiport handling with LAG_RESOURCE_ALLOCATION firmware config. Logic looks correct for Mellanox-specific requirements. |
| api/v1/sriovnetworknodepolicy_types.go | 5/5 | Added DevlinkParam and DevlinkParams structs with proper kubebuilder validation tags. Clean API design. |
| api/v1/sriovnetworknodestate_types.go | 5/5 | Added DevlinkParams to Interface, InterfaceExt, VirtualFunction, and Bridges. Consistent with policy types. |
| pkg/host/internal/bridge/bridge.go | 4/5 | Added ConfigureBridgesGrouping for "all" policy. Only supports "all" grouping, logs and skips other values. |
| pkg/consts/constants.go | 4/5 | Added devlink ApplyOn and OVS grouping constants. Contains typo: OvsGroupingPolicyPerVF should be PerNIC. |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Helper as Policy Helper
participant State as SriovNetworkNodeState
participant MellanoxPlugin as Mellanox Plugin
participant GenericPlugin as Generic Plugin
participant SriovLib as SRIOV Library
participant NetworkHelper as Network Helper
participant Netlink as Netlink Library
User->>Policy: Create/Update with DevlinkParams
Policy->>Helper: Apply()
Helper->>State: Update spec.interfaces[].devlinkParams
Note over State,MellanoxPlugin: Vendor-specific firmware config (if needed)
State->>MellanoxPlugin: OnNodeStateChange()
MellanoxPlugin->>MellanoxPlugin: HandleESwitchParams()
MellanoxPlugin->>MellanoxPlugin: Check esw_multiport param
alt esw_multiport needs change
MellanoxPlugin->>MellanoxPlugin: Set LAG_RESOURCE_ALLOCATION firmware flag
MellanoxPlugin-->>State: needReboot = true
end
Note over State,GenericPlugin: OS-level devlink config
State->>GenericPlugin: Apply()
GenericPlugin->>SriovLib: ConfigureSriovDevices()
SriovLib->>SriovLib: configSriovPFDevice()
loop For each DevlinkParam
alt ApplyOn == PF
SriovLib->>NetworkHelper: SetDevlinkDeviceParam(pciAddr, name, value)
NetworkHelper->>Netlink: DevlinkSetDeviceParam()
else ApplyOn == VF
SriovLib->>SriovLib: Get VF list
loop For each VF
SriovLib->>NetworkHelper: SetDevlinkDeviceParam(vfAddr, name, value)
NetworkHelper->>Netlink: DevlinkSetDeviceParam()
end
end
end
Note over SriovLib,Netlink: Discovery phase
SriovLib->>NetworkHelper: GetDevlinkDeviceParams(pciAddr)
NetworkHelper->>Netlink: DevlinkGetDeviceParams()
Netlink-->>NetworkHelper: Return all params
NetworkHelper->>NetworkHelper: Convert to DevlinkParam structs
NetworkHelper-->>SriovLib: Return DevlinkParam[]
SriovLib-->>State: Update status.interfaces[].devlinkParams
| strVal := val.(string) | ||
| if err != nil { | ||
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
unreachable error check - err is always nil at this point
The error check is checking err after type assertion, but err was never assigned from the type assertion. The error from line 412 is intentionally ignored with _. This check will never be true.
| strVal := val.(string) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| return nil, err | |
| } | |
| strVal := val.(string) | |
| devlinkParam := sriovnetworkv1.DevlinkParam{ |
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return( | ||
| "", | ||
| "", nil) | ||
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}}) |
There was a problem hiding this comment.
missing Multiport: 0 field in test struct
Test expects LAG_RESOURCE_ALLOCATION=0 in line 307, but doesn't set Multiport: 0 in the struct. This will cause the test to fail because line 228 in mellanox.go only appends the parameter when Multiport != -1.
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}}) | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}}) |
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return( | ||
| "", | ||
| "", testError) | ||
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}}) |
There was a problem hiding this comment.
missing Multiport: 0 field in test struct
Test expects LAG_RESOURCE_ALLOCATION=0 in line 318, but doesn't set Multiport: 0 in the struct. This will cause the test to fail.
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}}) | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}}) |
| } | ||
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | ||
| if err != nil { | ||
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", |
There was a problem hiding this comment.
typo in error message: "unable devlink parameters" should be "unable to get devlink parameters"
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping", |
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
missing handling for SF (Scalable Function) ApplyOn case
The API defines SF as a valid value for ApplyOn (line 69 in sriovnetworknodepolicy_types.go), but the switch statement only handles PF and VF cases. SF parameters will be silently ignored.
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
missing default handling and case-insensitive comparison for ApplyOn
The constants are uppercase (PF, VF, SF) but the API allows lowercase variants (e.g., pf, vf, sf per kubebuilder validation). Should normalize to uppercase or use case-insensitive comparison.
| DevlinkParamApplyOnPf = "PF" | ||
| DevlinkParamApplyOnVf = "VF" | ||
| DevlinkParamApplyOnSf = "SF" | ||
|
|
There was a problem hiding this comment.
typo in constant name: should be OvsGroupingPolicyPerNIC not OvsGroupingPolicyPerVF
The design doc and API use perNIC (line 87 in sriovnetworknodepolicy_types.go), not perVF.
| } | ||
| bridge, err := o.getBridgeByName(ctx, dbClient, bridgeName) | ||
| if err != nil { | ||
| funcLog.Error(err, "AddInterfaceToOVSBridge(): failed to get bridge", "bridge", bridgeName) |
There was a problem hiding this comment.
typo in error message: "connect to get bridge" should be "get bridge"
| funcLog.Error(err, "AddInterfaceToOVSBridge(): failed to get bridge", "bridge", bridgeName) | |
| return fmt.Errorf("failed to get bridge: %v", err) |
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR implements devlink parameter configuration support for SR-IOV Network Operator, allowing users to configure low-level device parameters via the API. The implementation follows a two-phase approach: vendor-specific firmware configuration (e.g., Mellanox esw_multiport via LAG_RESOURCE_ALLOCATION firmware flag) followed by OS-level devlink parameter application via the generic plugin.
Major changes:
- Extended
SriovNetworkNodePolicyandSriovNetworkNodeStateAPIs withDevlinkParamsfield containing parameter name, value, cmode, and applyOn (PF/VF/SF) - Added devlink parameter discovery in
DiscoverSriovDevicesto populate status - Implemented
applyDevlinkPfParamandapplyDevlinkVfParamto configure parameters at OS level - Added Mellanox-specific
HandleESwitchParamsfunction to configure firmware before OS-level changes - Refactored devlink value parsing into reusable helper functions
- Added bridge grouping policy support (currently only
allpolicy implemented)
Issues found:
- Critical: Error handling mismatch in
sriov.go- logs "skipping device" on devlink error but continues processing, while test expects device to be skipped entirely - Bug: Constant name typo
OvsGroupingPolicyPerVFshould beOvsGroupingPolicyPerNICper API enum validation - Maintenance: Incorrect function name in log message (
DiscoverBridgesinstead ofConfigureBridgesGrouping) - Missing validation for empty/invalid
ApplyOnvalues - will silently skip param application - Unclear logic when
esw_multiportparam not found - defaults to enabling multiport without explanation
Confidence Score: 3/5
- PR has solid design but contains critical error handling bug that will cause test failure
- Score reflects critical logic error where sriov discovery logs "skipping device" on error but continues processing (line 334), contradicting test expectations that device should be completely skipped. Also includes constant naming bug that doesn't match CRD enum validation. Good test coverage and clear architecture, but these bugs need fixing before merge.
- Pay close attention to
pkg/host/internal/sriov/sriov.go(error handling logic),pkg/consts/constants.go(naming mismatch), andpkg/vendors/mellanox/mellanox.go(unclear default behavior)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/host/internal/sriov/sriov.go | 2/5 | added devlink discovery and configuration, but error handling inconsistent with test expectations (logs "skipping" but continues) |
| pkg/host/internal/sriov/sriov_test.go | 3/5 | added tests for devlink params, but test expects device skip on error while implementation just logs and continues |
| pkg/vendors/mellanox/mellanox.go | 3/5 | added esw_multiport handling via firmware LAG_RESOURCE_ALLOCATION, unclear default behavior when param not found |
| pkg/host/internal/network/network.go | 4/5 | refactored devlink value parsing into helper functions, added GetDevlinkDeviceParams for batch retrieval |
| pkg/host/internal/bridge/bridge.go | 3/5 | added bridge grouping configuration, but only supports all policy (others silently skipped), incorrect function name in log |
| pkg/consts/constants.go | 2/5 | added devlink and OVS constants, but OvsGroupingPolicyPerVF doesn't match enum (perNIC expected) |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Generic as GenericPlugin
participant Mellanox as MellanoxPlugin
participant SRIOV as SRIOV Discovery
participant Network as NetworkHelper
participant Vendor as MellanoxVendor
User->>Policy: Create/Update with DevlinkParams
Policy->>Policy: Apply policy to NodeState spec
Note over Mellanox,Vendor: Vendor-specific firmware configuration
Mellanox->>Vendor: OnNodeStateChange
Vendor->>Vendor: HandleESwitchParams(esw_multiport)
alt esw_multiport change required
Vendor->>Vendor: Set LAG_RESOURCE_ALLOCATION firmware flag
Vendor-->>Mellanox: needReboot=true
end
Vendor->>Vendor: HandleLinkType
Vendor->>Vendor: MlxConfigFW (mstconfig command)
Note over Generic,Network: OS-level devlink configuration
Generic->>SRIOV: ConfigureSriovDevice
SRIOV->>SRIOV: configSriovPFDevice
loop for each DevlinkParam
alt ApplyOn == PF
SRIOV->>Network: SetDevlinkDeviceParam(PF)
else ApplyOn == VF
SRIOV->>Network: SetDevlinkDeviceParam(each VF)
end
end
Note over SRIOV,Network: Device discovery and status update
SRIOV->>Network: DiscoverSriovDevices
Network->>Network: GetDevlinkDeviceParams
Network-->>SRIOV: DevlinkParams list
SRIOV-->>Policy: Update NodeState status
Note over Generic: Bridge grouping (if configured)
Generic->>Generic: ConfigureBridgesGrouping
alt GroupingPolicy == "all"
Generic->>Generic: CreateOVSBridge with all interfaces
end
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | ||
| if err != nil { | ||
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | ||
| "device", device) | ||
| } else { | ||
| iface.DevlinkParams.Params = devlinkParams | ||
| } |
There was a problem hiding this comment.
log message says "skipping" but device continues to be added to pfList
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | |
| "device", device) | |
| } else { | |
| iface.DevlinkParams.Params = devlinkParams | |
| } | |
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping device", | |
| "device", device) | |
| continue | |
| } | |
| iface.DevlinkParams.Params = devlinkParams |
| func (b *bridge) ConfigureBridgesGrouping(interfaces sriovnetworkv1.Interfaces, name, groupingPolicy string) error { | ||
| if groupingPolicy != "" { | ||
| if groupingPolicy != constants.OvsGroupingPolicyAll { | ||
| log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) |
There was a problem hiding this comment.
function name in log message is DiscoverBridges() but this function is ConfigureBridgesGrouping()
| log.Log.V(2).Info("DiscoverBridges(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) | |
| log.Log.V(2).Info("ConfigureBridgesGrouping(): Not supported bridge group policy, skipping", "GroupingPolicy", groupingPolicy) |
| } else { | ||
| attr.Multiport = 1 | ||
| } |
There was a problem hiding this comment.
when devlinkParam is nil, Multiport is set to 1 (enable), but the comment on line 462 says "esw_multiport devlink parameter not found and not configured" - unclear why enabling when not found
Consider explicitly handling this case or adding explanation
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
|
|
||
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
no validation for empty ApplyOn field - will silently skip devlink param application if not set to PF or VF
Consider adding a default value or explicit warning when ApplyOn is unrecognized
|
|
||
| OvsGroupingPolicyPerPF = "perPF" | ||
| OvsGroupingPolicyPerVF = "perVF" |
There was a problem hiding this comment.
typo: should be perNIC not perVF based on API enum validation in sriovnetworknodepolicy_types.go (line 90: perPF;perNIC;all)
| OvsGroupingPolicyPerPF = "perPF" | |
| OvsGroupingPolicyPerVF = "perVF" | |
| OvsGroupingPolicyPerPF = "perPF" | |
| OvsGroupingPolicyPerNIC = "perNIC" | |
| OvsGroupingPolicyAll = "all" |
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress) | ||
|
|
||
| requsted, found := false, false | ||
| for _, devlinkParam := range iface.DevlinkParams.Params { | ||
| if devlinkParam.Name != devlinkMultiportParamName { | ||
| continue | ||
| } | ||
| requsted = true | ||
|
|
||
| for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params { | ||
| if devlinkParamStatus.Name == devlinkMultiportParamName { | ||
| found = true | ||
| if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value { | ||
| return true, &devlinkParam | ||
| } | ||
| } | ||
| } | ||
| } | ||
| if requsted && !found { | ||
| // esw_multiport devlink parameter not found and not configured | ||
| return true, nil | ||
| } | ||
| return false, nil |
There was a problem hiding this comment.
function returns early when multiport param already matches desired state, but doesn't validate that both iface.Name and ifaceStatus.Name match before comparing values - name mismatch could cause false positives
217db27 to
0ade45a
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR adds devlink parameters configuration support to the SR-IOV Network Operator, enabling runtime configuration of device-level parameters for network interfaces. It introduces new API types (DevlinkParam, DevlinkParams) with validation for parameter names, values, cmodes (runtime/driverinit/permanent), and application targets (PF/VF/SF). The implementation includes discovery of existing devlink parameters, application during device configuration, and vendor-specific handling for Mellanox NICs (specifically esw_multiport via LAG_RESOURCE_ALLOCATION firmware setting).
Key Changes:
- Extended
SriovNetworkNodePolicyandSriovNetworkNodeStateAPIs withDevlinkParamsfields - Added
GroupingPolicyfield toBridgeconfiguration for OVS bridge grouping - Implemented devlink parameter discovery in
DiscoverSriovDevices() - Added configuration logic in
configSriovPFDevice()to apply params to PF/VF based onApplyOnfield - Mellanox plugin handles
esw_multiportparameter requiring firmware configuration and reboot - Comprehensive test coverage for new functionality
Issues Found:
- Missing default case for empty/invalid
ApplyOnvalues - will silently skip configuration - Discovery error handling skips entire device instead of continuing with empty params
- Test cases missing
Multiportfield initialization causing mismatch with expected mstconfig commands - Error handling in bridge grouping loop masks earlier failures
- Case-insensitive API validation (
PF/pf) not handled in implementation (only uppercase checked)
Confidence Score: 3/5
- PR has functional issues that need fixing before merge - missing error handling and default cases
- Score reflects several logic bugs that will cause runtime issues: missing default case in ApplyOn switch causes silent failures, discovery error removes entire device from state, test mismatches indicate potential runtime failures, and bridge loop error handling masks failures. Good test coverage and design, but implementation gaps need resolution.
- Pay close attention to
pkg/host/internal/sriov/sriov.go(ApplyOn handling and discovery error),pkg/vendors/mellanox/mellanox_test.go(test initialization), andpkg/host/internal/bridge/bridge.go(error handling)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| api/v1/sriovnetworknodepolicy_types.go | 3/5 | Added DevlinkParam and DevlinkParams types with validation enums; ApplyOn case handling issue |
| pkg/host/internal/sriov/sriov.go | 2/5 | Added devlink param discovery and configuration; missing default case, discovery error handling issue |
| pkg/host/internal/network/network.go | 4/5 | Implemented GetDevlinkDeviceParams to retrieve all devlink params; refactored value conversion |
| pkg/vendors/mellanox/mellanox.go | 3/5 | Added HandleESwitchParams for esw_multiport configuration via LAG_RESOURCE_ALLOCATION; interface name check concern |
| pkg/vendors/mellanox/mellanox_test.go | 2/5 | Added comprehensive tests for HandleESwitchParams; two existing tests missing Multiport initialization |
| pkg/host/internal/bridge/bridge.go | 3/5 | Implemented ConfigureBridgesGrouping for OVS bridge grouping; error handling issue in loop |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Helper as API Helper
participant GenericPlugin as Generic Plugin
participant MellanoxPlugin as Mellanox Plugin
participant SRIOV as SRIOV Host
participant Network as Network Helper
participant Mellanox as Mellanox Vendor
participant Bridge as Bridge Manager
User->>Policy: Create/Update with DevlinkParams
Policy->>Helper: Apply policy to node state
Helper->>Helper: Copy DevlinkParams to Interface
Note over GenericPlugin,MellanoxPlugin: Plugin Processing
MellanoxPlugin->>Mellanox: HandleESwitchParams
Mellanox->>Mellanox: Check esw_multiport param
alt esw_multiport requires change
Mellanox->>Mellanox: Set Multiport=1/0 in MlxNic
Mellanox-->>MellanoxPlugin: needReboot=true
else no change needed
Mellanox-->>MellanoxPlugin: needReboot=false
end
MellanoxPlugin->>Mellanox: MlxConfigFW
Mellanox->>Mellanox: Set LAG_RESOURCE_ALLOCATION
GenericPlugin->>SRIOV: configSriovPFDevice
loop for each DevlinkParam
alt ApplyOn == PF
SRIOV->>Network: SetDevlinkDeviceParam(PCI, name, value)
else ApplyOn == VF
SRIOV->>SRIOV: GetVFList
loop for each VF
SRIOV->>Network: SetDevlinkDeviceParam(VF_PCI, name, value)
end
end
end
GenericPlugin->>Bridge: ConfigureBridgesGrouping
alt groupingPolicy == "all"
Bridge->>Bridge: CreateOVSBridge with first interface
loop for remaining interfaces
Bridge->>Bridge: AddInterfaceToOVSBridge
end
end
Note over SRIOV,Network: Discovery Phase
SRIOV->>Network: GetDevlinkDeviceParams(PCI)
Network->>Network: DevlinkGetDeviceParams
Network->>Network: Convert to DevlinkParam structs
Network-->>SRIOV: []DevlinkParam
SRIOV->>SRIOV: Add to InterfaceExt.DevlinkParams
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
no default case to handle empty ApplyOn or unexpected values
| for _, param := range iface.DevlinkParams.Params { | |
| switch param.ApplyOn { | |
| case consts.DevlinkParamApplyOnPf: | |
| err = s.applyDevlinkPfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| case consts.DevlinkParamApplyOnVf: | |
| err = s.applyDevlinkVfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| } | |
| for _, param := range iface.DevlinkParams.Params { | |
| applyOn := param.ApplyOn | |
| if applyOn == "" { | |
| applyOn = consts.DevlinkParamApplyOnPf | |
| } | |
| switch applyOn { | |
| case consts.DevlinkParamApplyOnPf, "pf": | |
| err = s.applyDevlinkPfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| case consts.DevlinkParamApplyOnVf, "vf": | |
| err = s.applyDevlinkVfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| default: | |
| log.Log.V(2).Info("configSriovPFDevice(): skipping unsupported ApplyOn value", "device", iface.PciAddress, "applyOn", param.ApplyOn) | |
| } | |
| } |
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | ||
| if err != nil { | ||
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | ||
| "device", device) | ||
| } else { | ||
| iface.DevlinkParams.Params = devlinkParams | ||
| } |
There was a problem hiding this comment.
device skipped silently on devlink error - discovery continues but device excluded from pfList, causing inconsistent state reporting
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return( | ||
| "", | ||
| "", nil) | ||
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}}) |
There was a problem hiding this comment.
test expects LAG_RESOURCE_ALLOCATION=0 but Multiport field not set in struct initialization
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH"}}) | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}}) |
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return( | ||
| "", | ||
| "", testError) | ||
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}}) |
There was a problem hiding this comment.
test expects LAG_RESOURCE_ALLOCATION=0 but Multiport field not set in struct initialization
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test"}}) | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}}) |
| for _, iface := range interfaces[1:] { | ||
| err := b.ovs.AddInterfaceToOVSBridge(context.Background(), brName, iface.Name) | ||
| if err != nil { | ||
| log.Log.Error(err, "ConfigureBridgesGrouping(): failed to add interface to OVS bridge", "name", bridgeConf.Name, "iface", iface.Name) | ||
| return err | ||
| } | ||
| } |
There was a problem hiding this comment.
loop reuses err variable - will only report error from last interface addition, masking earlier failures
| Cmode string `json:"cmode,omitempty"` | ||
| // Device to apply devlink parameter: PF (default)|VF|SF | ||
| // +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf | ||
| ApplyOn string `json:"applyOn,omitempty"` |
There was a problem hiding this comment.
API allows case-insensitive values (PF/pf, VF/vf, SF/sf) but implementation only checks uppercase constants
| for _, devlinkParamStatus := range ifaceStatus.DevlinkParams.Params { | ||
| if devlinkParamStatus.Name == devlinkMultiportParamName { | ||
| found = true | ||
| if iface.Name == ifaceStatus.Name && devlinkParam.Value != devlinkParamStatus.Value { |
There was a problem hiding this comment.
interface name check (iface.Name == ifaceStatus.Name) may prevent detection if interface renamed - PCI address comparison more reliable
10827fa to
e7ff81a
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR implements devlink parameter configuration support for SR-IOV devices, allowing users to configure low-level device parameters at both the PF and VF level. The implementation includes API extensions, discovery logic, application logic, and vendor-specific support for Mellanox NICs.
Major Changes:
- Added
DevlinkParamsAPI toSriovNetworkNodePolicywith support for parameter name, value, cmode (runtime/driverinit/permanent), andApplyOntarget (PF/VF/SF) - Extended
SriovNetworkNodeStateto track devlink parameters in both spec and status forInterface,InterfaceExt, andVirtualFunctiontypes - Implemented devlink parameter discovery in
DiscoverSriovDevices()to query and report current device configuration - Added parameter application logic with separate handlers for PF and VF devices
- Implemented Mellanox-specific support for
esw_multiportparameter requiring firmware configuration viaLAG_RESOURCE_ALLOCATION - Added OVS bridge grouping policy support with initial implementation for "all" policy
Issues Found:
- Device discovery continues when
GetDevlinkDeviceParams()fails, leading to incomplete device state rather than skipping the device - Missing implementation for SF (Scalable Function) target despite API validation allowing it
- Cmode validation permits all modes (runtime/driverinit/permanent) but design doc states "runtime is only supported now"
- Case-sensitivity mismatch: API allows both uppercase/lowercase
ApplyOnvalues but code only handles uppercase - Grouping policy constants inconsistency between API validation enum and
constspackage - Several typos in variable names and incomplete log messages
- Grouping policy validation only supports "all" but API allows "perPF" and "perNIC"
Confidence Score: 3/5
- This PR requires fixes before merging due to logical issues in error handling and incomplete implementation
- Score reflects several critical issues: device discovery error handling that leads to incomplete state, missing SF implementation despite API support, case-sensitivity bugs, and validation/implementation mismatches for cmode and grouping policy. The feature adds significant complexity with vendor-specific firmware configuration. However, comprehensive test coverage and good architectural separation partially offset these concerns.
- Focus on
pkg/host/internal/sriov/sriov.gofor error handling and SF support,api/v1/sriovnetworknodepolicy_types.gofor validation alignment, andpkg/host/internal/bridge/bridge.gofor grouping policy implementation
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| api/v1/sriovnetworknodepolicy_types.go | 5/5 | added DevlinkParams API with proper kubebuilder validation tags for devlink parameter configuration |
| pkg/host/internal/sriov/sriov.go | 3/5 | added devlink parameter discovery and application logic for PF and VF devices with error handling issue |
| pkg/host/internal/network/network.go | 4/5 | implemented GetDevlinkDeviceParams() to retrieve all devlink parameters and refactored value parsing into helper functions |
| pkg/vendors/mellanox/mellanox.go | 4/5 | added support for esw_multiport devlink parameter requiring firmware configuration via LAG_RESOURCE_ALLOCATION |
| pkg/host/internal/bridge/bridge.go | 4/5 | implemented ConfigureBridgesGrouping() to create OVS bridges with "all" grouping policy |
| pkg/consts/constants.go | 5/5 | added constants for devlink parameter ApplyOn values and OVS grouping policies |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Daemon as Config Daemon
participant MellanoxPlugin as Mellanox Plugin
participant GenericPlugin as Generic Plugin
participant SRIOV as SRIOV Manager
participant Network as Network Helper
participant Devlink as Devlink (Kernel)
participant FW as Firmware (mstconfig)
User->>Policy: Create policy with DevlinkParams
Policy->>Daemon: Apply policy to node
Daemon->>MellanoxPlugin: OnNodeStateChange()
Note over MellanoxPlugin: Check firmware prerequisites
MellanoxPlugin->>MellanoxPlugin: HandleESwitchParams()
alt esw_multiport parameter requested
MellanoxPlugin->>MellanoxPlugin: Check if LAG_RESOURCE_ALLOCATION needed
MellanoxPlugin->>FW: Configure LAG_RESOURCE_ALLOCATION via mstconfig
MellanoxPlugin-->>Daemon: needReboot = true
end
Daemon->>GenericPlugin: Apply()
GenericPlugin->>SRIOV: configSriovPFDevice()
Note over SRIOV: Apply devlink parameters to devices
loop For each DevlinkParam
alt ApplyOn = PF
SRIOV->>Network: SetDevlinkDeviceParam(PF)
Network->>Devlink: Set parameter on PF device
else ApplyOn = VF
SRIOV->>SRIOV: Get VF list
loop For each VF
SRIOV->>Network: SetDevlinkDeviceParam(VF)
Network->>Devlink: Set parameter on VF device
end
end
end
GenericPlugin->>GenericPlugin: ConfigureBridgesGrouping()
alt GroupingPolicy = "all"
GenericPlugin->>GenericPlugin: Create OVS bridge with all interfaces
end
Note over Daemon: Discovery phase
Daemon->>SRIOV: DiscoverSriovDevices()
SRIOV->>Network: GetDevlinkDeviceParams()
Network->>Devlink: Query all devlink parameters
Devlink-->>Network: Return parameters (runtime/driverinit/permanent)
Network-->>SRIOV: Return DevlinkParams
SRIOV-->>Daemon: Update node state with discovered params
Daemon-->>User: Report status in SriovNetworkNodeState
| instance := s.getVfInfo(vf, pfNetName, iface.EswitchMode, devices) | ||
| iface.VFs = append(iface.VFs, instance) | ||
| } | ||
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | ||
| if err != nil { | ||
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | ||
| "device", device) | ||
| } else { | ||
| iface.DevlinkParams.Params = devlinkParams | ||
| } |
There was a problem hiding this comment.
device discovery silently skipped when GetDevlinkDeviceParams fails - logs error but continues and appends device without devlink params to pfList
| instance := s.getVfInfo(vf, pfNetName, iface.EswitchMode, devices) | |
| iface.VFs = append(iface.VFs, instance) | |
| } | |
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable devlink parameters, skipping", | |
| "device", device) | |
| } else { | |
| iface.DevlinkParams.Params = devlinkParams | |
| } | |
| devlinkParams, err := s.networkHelper.GetDevlinkDeviceParams(device.Address) | |
| if err != nil { | |
| log.Log.Error(err, "DiscoverSriovDevices(): unable to get devlink parameters, skipping device", | |
| "device", device.Address) | |
| continue | |
| } | |
| iface.DevlinkParams.Params = devlinkParams |
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
missing case for SF (Scalable Function) - ApplyOn field supports "SF" but only "PF" and "VF" are handled here
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | ||
| return err |
There was a problem hiding this comment.
log message missing param.Name value in structured logging field
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| return err | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param", param.Name) |
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param") | ||
| return err |
There was a problem hiding this comment.
log message missing param.Name value in structured logging field
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param") | |
| return err | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param", param.Name) |
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress) | ||
|
|
||
| requsted, found := false, false |
There was a problem hiding this comment.
requsted is misspelled
| requsted, found := false, false | |
| requested, found := false, false |
| uplink := &sriovnetworkv1.OVSUplinkConfigExt{ | ||
| PciAddress: interfaces[0].PciAddress, | ||
| Name: interfaces[0].Name, |
There was a problem hiding this comment.
only validates OvsGroupingPolicyAll constant but API validation enum includes perPF and perNIC - creates inconsistency where those values pass CRD validation but are rejected here
| } | ||
| brName := name + "-all" | ||
| bridgeConf := &sriovnetworkv1.OVSConfigExt{ |
There was a problem hiding this comment.
error case is not clearly handled - returns error if interfaces list is empty but silently returns nil for unsupported grouping policies (line 94)
| // Param value | ||
| Value string `json:"value,omitempty"` | ||
| // cmode option: runtime (default) | driverinit | permanent (runtime is only supported now) | ||
| // +kubebuilder:validation:Enum=runtime;driverinit;permanent | ||
| Cmode string `json:"cmode,omitempty"` | ||
| // Device to apply devlink parameter: PF (default)|VF|SF | ||
| // +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf | ||
| ApplyOn string `json:"applyOn,omitempty"` |
There was a problem hiding this comment.
design doc states "runtime is only supported now" but validation allows driverinit and permanent - should either implement support or restrict validation to runtime only
| Cmode string `json:"cmode,omitempty"` | ||
| // Device to apply devlink parameter: PF (default)|VF|SF | ||
| // +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf | ||
| ApplyOn string `json:"applyOn,omitempty"` |
There was a problem hiding this comment.
case-sensitive validation allows both uppercase and lowercase values ("PF" vs "pf") but implementation in sriov.go uses only uppercase constants from consts package
| OvsGroupingPolicyPerVF = "perVF" | ||
| OvsGroupingPolicyAll = "all" |
There was a problem hiding this comment.
OvsGroupingPolicyPerVF constant defined but not present in API validation enum - API uses perPF and perNIC instead
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR implements devlink parameters configuration for SR-IOV network devices, extending the API to support device-level configuration beyond firmware and OS settings.
Key Changes
- API Extensions: Added
DevlinkParamsandDevlinkParamtypes toSriovNetworkNodePolicyandSriovNetworkNodeStatewith support for PF, VF, and SF devices across runtime, driverinit, and permanent cmodes - Discovery: Implemented
GetDevlinkDeviceParams()to read all configured devlink parameters from devices during discovery phase - Configuration: Added
applyDevlinkPfParam()andapplyDevlinkVfParam()to apply devlink parameters to PF and VF devices respectively - Mellanox Integration: Implemented
HandleESwitchParams()to configureesw_multiportparameter with firmware flagLAG_RESOURCE_ALLOCATIONfor NVIDIA NICs - Bridge Grouping: Added
GroupingPolicyfield toBridgestruct withConfigureBridgesGrouping()supportingperPF,perNIC, andallpolicies - Test Coverage: Comprehensive unit tests for
HandleESwitchParams()with multiple scenarios
Issues Found
- Critical: Error silently ignored in
GetDevlinkDeviceParams()line 412 causing incorrect error handling - Logic: Missing SF (Subfunction) and default case handling in devlink parameter application switch statement
- Syntax: Variable typo
requstedinstead ofrequestedinisESwitchParamsRequireChange() - Test Failures: Missing
Multiportfield initialization in existing Mellanox tests will cause failures
Confidence Score: 3/5
- This PR has several bugs that must be fixed before merge, including a critical error handling bug and missing test initializations.
- Score reflects critical error handling bug in
network.gothat silently ignores errors during devlink parameter discovery, missing SF device support in the switch statement despite API allowing it, and test failures due to missingMultiportfield initialization. The typo in variable naming is minor but should be fixed. The overall feature implementation is well-designed with good test coverage for new functionality, but the bugs identified need resolution. pkg/host/internal/network/network.go(critical error handling bug),pkg/host/internal/sriov/sriov.go(missing SF support), andpkg/vendors/mellanox/mellanox_test.go(test failures)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| api/v1/sriovnetworknodepolicy_types.go | 5/5 | Added DevlinkParams API types with proper validation. Also added GroupingPolicy field to Bridge struct. |
| api/v1/sriovnetworknodestate_types.go | 5/5 | Extended Interface, InterfaceExt, VirtualFunction, and Bridges with DevlinkParams fields. |
| pkg/host/internal/network/network.go | 3/5 | Added GetDevlinkDeviceParams and refactored devlink value conversion. Critical bug: error silently ignored on line 412. |
| pkg/host/internal/sriov/sriov.go | 3/5 | Added devlink discovery and configuration with applyDevlinkPfParam/VfParam. Missing SF support and default case handling. |
| pkg/vendors/mellanox/mellanox.go | 4/5 | Implemented esw_multiport firmware configuration with HandleESwitchParams. Has typo in variable name 'requsted'. |
| pkg/vendors/mellanox/mellanox_test.go | 3/5 | Added comprehensive tests for HandleESwitchParams. Existing tests missing Multiport field initialization causing failures. |
Sequence Diagram
sequenceDiagram
participant User
participant Policy as SriovNetworkNodePolicy
participant Plugin as MellanoxPlugin
participant Vendor as Mellanox Vendor
participant Host as Host/SRIOV
participant Network as Network Helper
participant Kernel as Kernel/Devlink
User->>Policy: Create/Update Policy with DevlinkParams
Policy->>Plugin: OnNodeStateChange()
Note over Plugin,Vendor: Firmware Configuration Phase
Plugin->>Vendor: HandleESwitchParams()
Vendor->>Vendor: Check if esw_multiport change needed
alt esw_multiport requires change
Vendor->>Vendor: Set LAG_RESOURCE_ALLOCATION firmware flag
Vendor-->>Plugin: needReboot = true
else no change needed
Vendor-->>Plugin: needReboot = false
end
Note over Host,Kernel: Discovery Phase
Host->>Network: DiscoverSriovDevices()
Network->>Kernel: GetDevlinkDeviceParams(pciAddr)
Kernel-->>Network: Current devlink params
Network->>Network: Convert to DevlinkParam structs
Network-->>Host: DevlinkParams list
Host->>Host: Store in InterfaceExt
Note over Host,Kernel: Configuration Phase
Host->>Host: configSriovPFDevice(iface)
loop For each DevlinkParam
alt ApplyOn == PF
Host->>Network: SetDevlinkDeviceParam(pciAddr, name, value)
Network->>Kernel: DevlinkSetDeviceParam()
else ApplyOn == VF
Host->>Host: GetVFList(pciAddr)
loop For each VF
Host->>Network: SetDevlinkDeviceParam(vfAddr, name, value)
Network->>Kernel: DevlinkSetDeviceParam()
end
end
end
Host-->>User: Configuration Applied
| strVal := val.(string) | ||
| if err != nil { | ||
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
err is checked after the type assertion on line 429, but err is never assigned from the devlinkValueToString call on line 412. The error is silently ignored with _.
| strVal := val.(string) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| return nil, err | |
| } | |
| strVal, err := n.devlinkValueToString(p.Type, v.Data) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| return nil, err | |
| } |
| func isESwitchParamsRequireChange(iface sriovnetworkv1.Interface, ifaceStatus sriovnetworkv1.InterfaceExt) (bool, *sriovnetworkv1.DevlinkParam) { | ||
| log.Log.Info("mellanox-plugin isESwitchParamsRequireChange()", "device", iface.PciAddress) | ||
|
|
||
| requsted, found := false, false |
There was a problem hiding this comment.
Typo: requsted should be requested
| requsted, found := false, false | |
| requested, found := false, false |
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { | ||
| case consts.DevlinkParamApplyOnPf: | ||
| err = s.applyDevlinkPfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| case consts.DevlinkParamApplyOnVf: | ||
| err = s.applyDevlinkVfParam(iface, param) | ||
| if err != nil { | ||
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | ||
| return err | ||
| } | ||
| } | ||
| } |
There was a problem hiding this comment.
Missing handling for SF (Subfunction) devices. The switch statement only handles PF and VF but the API supports SF as per the ApplyOn enum validation in sriovnetworknodepolicy_types.go
| for _, param := range iface.DevlinkParams.Params { | |
| switch param.ApplyOn { | |
| case consts.DevlinkParamApplyOnPf: | |
| err = s.applyDevlinkPfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| case consts.DevlinkParamApplyOnVf: | |
| err = s.applyDevlinkVfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| } | |
| } | |
| for _, param := range iface.DevlinkParams.Params { | |
| switch param.ApplyOn { | |
| case consts.DevlinkParamApplyOnPf: | |
| err = s.applyDevlinkPfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| case consts.DevlinkParamApplyOnVf: | |
| err = s.applyDevlinkVfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for VFs", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| case consts.DevlinkParamApplyOnSf: | |
| // TODO: Implement SF support | |
| log.Log.Info("configSriovPFDevice(): SF devlink params not yet supported, skipping", "device", iface.PciAddress, "param", param.Name) | |
| case "": | |
| // Empty ApplyOn defaults to PF | |
| err = s.applyDevlinkPfParam(iface, param) | |
| if err != nil { | |
| log.Log.Error(err, "configSriovPFDevice(): fail apply devlink param for PF", "device", iface.PciAddress, "param", param.Name) | |
| return err | |
| } | |
| default: | |
| log.Log.Error(nil, "configSriovPFDevice(): unsupported ApplyOn value", "device", iface.PciAddress, "param", param.Name, "applyOn", param.ApplyOn) | |
| return fmt.Errorf("unsupported ApplyOn value: %s", param.ApplyOn) | |
| } | |
| } |
| return nil | ||
| } | ||
| if len(interfaces) == 0 { | ||
| return fmt.Errorf("unsupported configuration, interfaces list must contain at least one element") |
There was a problem hiding this comment.
This error can be returned even though the policy was validated to be constants.OvsGroupingPolicyAll on line 91. Consider removing the length check or handling empty interfaces before the policy check.
| getBFMstconfigOutput(false, false), | ||
| "", nil) | ||
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH").Return( | ||
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return( |
There was a problem hiding this comment.
Multiport field missing in test data initialization. It should be initialized to -1 to indicate no change needed, or to 0 to match the expected command output.
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=ETH", "LAG_RESOURCE_ALLOCATION=0").Return( | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "ETH", Multiport: 0}}) |
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test").Return( | ||
| u.EXPECT().RunCommand("mstconfig", "-d", "0000:d8:00.0", "-y", "set", "SRIOV_EN=True", "NUM_OF_VFS=10", "LINK_TYPE_P1=ETH", "LINK_TYPE_P2=test", "LAG_RESOURCE_ALLOCATION=0").Return( | ||
| "", | ||
| "", testError) |
There was a problem hiding this comment.
Multiport field missing in test data initialization.
| "", testError) | |
| err := m.MlxConfigFW(map[string]MlxNic{"0000:d8:00.0": {EnableSriov: true, TotalVfs: 10, LinkTypeP1: "ETH", LinkTypeP2: "test", Multiport: 0}}) |
01f2b6c to
b33f550
Compare
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com> fd
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
Signed-off-by: Ivan Kolodiazhnyi <ikolodiazhny@nvidia.com>
e7ff81a to
5ba2593
Compare
There was a problem hiding this comment.
Greptile Overview
Greptile Summary
This PR implements devlink parameters configuration support for SR-IOV Network Operator, allowing users to configure device-level parameters through the SriovNetworkNodePolicy API.
Key Changes:
- Adds
DevlinkParamsstruct to policy spec with fields for parameter name, value, cmode (runtime/driverinit/permanent), and applyOn (PF/VF/SF) - Implements discovery of current devlink parameters during device scanning via
GetDevlinkDeviceParams() - Applies devlink parameters through
SetDevlinkDeviceParam()for both PF and VF devices - Mellanox vendor plugin handles
esw_multiportparameter by setting firmwareLAG_RESOURCE_ALLOCATIONflag - Adds bridge grouping policy support with "all" option to group multiple interfaces under single OVS bridge
- Comprehensive test coverage including unit tests for eswitch params and devlink discovery
Critical Issues Found:
- Case-insensitive comparison missing for
ApplyOnfield - lowercase values (pf,vf,sf) will be silently ignored - Unreachable error check in
GetDevlinkDeviceParams()due to discarded error from type conversion - DevlinkParams not merged in
mergeConfigs()- when multiple policies apply to same interface, devlink params from lower-priority policies are lost - Missing parameter names in error log messages make debugging difficult
Minor Issues:
- Empty string defaults not documented for
CmodeandApplyOnfields - Unclear default behavior when
esw_multiportparameter is requested but not currently configured
Confidence Score: 3/5
- This PR has several logic bugs that could cause silent failures and data loss, but no critical security vulnerabilities
- Score reflects multiple logic bugs: case-insensitive comparison missing (causing silent failures for lowercase values), unreachable error check, and policy merge issues that lose devlink params. These are functional bugs that will impact users but are fixable. The core implementation is sound with good test coverage.
- Pay close attention to
pkg/host/internal/sriov/sriov.go(case handling),pkg/host/internal/network/network.go(error handling), andapi/v1/helper.go(merge logic)
Important Files Changed
File Analysis
| Filename | Score | Overview |
|---|---|---|
| pkg/host/internal/sriov/sriov.go | 3/5 | Adds devlink param application logic for PF/VF. Issues: case-insensitive comparison missing for ApplyOn, missing param names in error logs. |
| pkg/host/internal/network/network.go | 3/5 | Implements GetDevlinkDeviceParams() for discovery and refactors value conversion. Issue: unreachable error check on line 429-431. |
| pkg/vendors/mellanox/mellanox.go | 3/5 | Adds esw_multiport handling via LAG_RESOURCE_ALLOCATION firmware flag. Minor: unclear default behavior when param not found. |
| api/v1/sriovnetworknodepolicy_types.go | 4/5 | Adds DevlinkParam and DevlinkParams types with validation. Minor: empty string defaults not documented for Cmode/ApplyOn. |
| api/v1/helper.go | 3/5 | Copies DevlinkParams in Apply(). Issue: DevlinkParams not merged in mergeConfigs(), losing params from lower-priority policies. |
Sequence Diagram
sequenceDiagram
participant User
participant API as SriovNetworkNodePolicy
participant Controller as Controller/Daemon
participant GenericPlugin
participant MellanoxPlugin
participant SRIOV as SRIOV Module
participant Network as Network Helper
participant Firmware as mstconfig/mstfwreset
User->>API: Create/Update Policy with DevlinkParams
API->>Controller: Apply policy to NodeState
Note over API,Controller: DevlinkParams copied to Interface spec
Controller->>MellanoxPlugin: OnNodeStateChange()
MellanoxPlugin->>MellanoxPlugin: Check for esw_multiport param
alt esw_multiport requested
MellanoxPlugin->>MellanoxPlugin: Set LAG_RESOURCE_ALLOCATION flag
MellanoxPlugin->>Firmware: mstconfig set LAG_RESOURCE_ALLOCATION
Note over MellanoxPlugin,Firmware: Requires reboot
end
MellanoxPlugin-->>Controller: needReboot=true
Controller->>GenericPlugin: OnNodeStateChange()
GenericPlugin-->>Controller: needDrain/needReboot
Controller->>Controller: Drain node if needed
Controller->>GenericPlugin: Apply()
GenericPlugin->>SRIOV: ConfigSriovInterfaces()
SRIOV->>SRIOV: configSriovPFDevice()
loop For each DevlinkParam
alt ApplyOn == PF
SRIOV->>Network: SetDevlinkDeviceParam(PF)
Network->>Network: devlink set parameter
else ApplyOn == VF
SRIOV->>SRIOV: GetVFList()
loop For each VF
SRIOV->>Network: SetDevlinkDeviceParam(VF)
Network->>Network: devlink set parameter
end
end
end
SRIOV->>SRIOV: configSriovVFDevices()
SRIOV-->>GenericPlugin: Configuration complete
Controller->>SRIOV: DiscoverSriovDevices()
SRIOV->>Network: GetDevlinkDeviceParams(PF)
Network->>Network: Query all devlink params
Network-->>SRIOV: DevlinkParam list
SRIOV-->>Controller: Updated status with devlink params
Controller->>API: Update NodeState status
Note over User,API: Status reflects configured devlink params
| } | ||
|
|
||
| for _, param := range iface.DevlinkParams.Params { | ||
| switch param.ApplyOn { |
There was a problem hiding this comment.
case-insensitive comparison needed for ApplyOn field
The API allows both uppercase and lowercase (PF;pf;VF;vf;SF;sf), but this switch only matches uppercase constants. Lowercase values will be silently ignored.
| switch param.ApplyOn { | |
| applyOn := strings.ToUpper(param.ApplyOn) | |
| switch applyOn { |
| func (s *sriov) applyDevlinkVfParam(iface *sriovnetworkv1.Interface, param sriovnetworkv1.DevlinkParam) error { | ||
| vfAddrs, err := s.dputilsLib.GetVFList(iface.PciAddress) | ||
| if err != nil { | ||
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") |
There was a problem hiding this comment.
missing parameter name in log message
The "param" field is present in the error log but has no value, making debugging difficult.
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param") | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail to get VF list", "device", iface.PciAddress, "param", param.Name) |
| for _, addr := range vfAddrs { | ||
| err = s.networkHelper.SetDevlinkDeviceParam(addr, param.Name, param.Value) | ||
| if err != nil { | ||
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param") |
There was a problem hiding this comment.
missing parameter name in log message
Same issue as line 473 - the "param" field needs a value.
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param") | |
| log.Log.Error(err, "applyDevlinkVfParam(): fail apply devlink param for VF", "device", iface.PciAddress, "param", param.Name) |
| strVal := val.(string) | ||
| if err != nil { | ||
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) |
There was a problem hiding this comment.
unreachable error check after type assertion
The variable err is always nil here (line 412: strVal, _ := n.devlinkValueToString(...)). This check will never trigger.
| strVal := val.(string) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| strVal, err := n.devlinkValueToString(p.Type, v.Data) | |
| if err != nil { | |
| log.Log.Error(err, "Failed to decode devlink param value", "device", pciAddr) | |
| return nil, err | |
| } |
| } else { | ||
| attr.Multiport = 1 | ||
| } |
There was a problem hiding this comment.
unclear logic when devlink param is not found
When esw_multiport is requested but not currently configured, the code sets attr.Multiport = 1. This assumes enabling multiport is the default, but it's unclear if this is intentional. Should consider explicitly documenting or validating this behavior.
Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
| EswitchMode: p.Spec.EswitchMode, | ||
| NumVfs: p.Spec.NumVfs, | ||
| ExternallyManaged: p.Spec.ExternallyManaged, | ||
| DevlinkParams: p.Spec.DevlinkParams, |
There was a problem hiding this comment.
DevlinkParams not merged in mergeConfigs
When merging interface configs from multiple policies, DevlinkParams field is copied wholesale from the input without merging. If two policies apply to the same interface with different devlink params, one set will be completely lost.
| // +kubebuilder:validation:Enum=runtime;driverinit;permanent | ||
| Cmode string `json:"cmode,omitempty"` | ||
| // Device to apply devlink parameter: PF (default)|VF|SF | ||
| // +kubebuilder:validation:Enum=PF;pf;VF;vf;SF;sf | ||
| ApplyOn string `json:"applyOn,omitempty"` |
There was a problem hiding this comment.
empty string default not documented for Cmode and ApplyOn
The API validation allows these fields to be omitted (due to omitempty), but there's no documentation or defaulting logic for empty values. Code in sriov.go will fail to match empty strings against constants.
No description provided.