Add fallback for VF VLAN configuration when protocol is unsupported#393
Add fallback for VF VLAN configuration when protocol is unsupported#393ashokpariya0 wants to merge 1 commit intok8snetworkplumbingwg:masterfrom
Conversation
WalkthroughRetry VF VLAN programming without specifying a VLAN protocol when netlink returns "protocol not supported"; add NetlinkManager method and mocks to support the retry path. No exported signatures changed aside from the new netlink method. Changes
Sequence Diagram(s)mermaid Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
✨ Finishing touches
📜 Recent review detailsConfiguration used: Organization UI Review profile: CHILL Plan: Pro 📒 Files selected for processing (4)
🚧 Files skipped from review as they are similar to previous changes (2)
🧰 Additional context used🧬 Code graph analysis (2)pkg/sriov/sriov.go (2)
pkg/utils/mocks/netlink_manager_mock.go (1)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
Hi @ashokpariya0, I don't think this one is right as in case the VF as a vlan configure by something (or even a left over) and the user doesn't want a vlan we will not reset the device. maybe we can check if the vlan is 0 and the user want 0 then we don't do anything. but if there is a vlan and the user doesn't want vlan (vlan 0 no nil) we should apply the 0 |
|
also please check the unit-test that are failing |
98c35cd to
f60702c
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
pkg/sriov/sriov.gopkg/utils/netlink_manager.gopkg/utils/testing.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/sriov/sriov.go (2)
pkg/types/types.go (1)
VlanProtoInt(16-16)pkg/logging/logging.go (2)
Error(71-73)Debug(56-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Analyze (go)
🔇 Additional comments (2)
pkg/sriov/sriov.go (2)
5-5: LGTM - Import added for error string matching.The
stringsimport is appropriately added to support the retry logic that checks for "protocol not supported" errors.
338-347: Remove the critical issue flag - the PR objective claim is incorrect.The commit message clearly states the feature is "Add fallback for VF VLAN configuration when protocol is unsupported", not "avoid vlan configuration when vlan is disabled (vlan=0)". The implementation correctly reflects this: it retries LinkSetVfVlanQosProto without protocol specification when the driver returns a "protocol not supported" error. The nil check
if conf.Vlan != nilis appropriate for this logic.Minor concerns remain:
Fragile error detection (line 340): Checking for "protocol not supported" using
strings.Containsmay miss variations in error messages across different drivers/kernels (e.g., "protocol is not supported", "unsupported protocol"). Consider whether netlink provides more specific error types.Indentation (line 343): Uses spaces instead of tabs.
Likely an incorrect or invalid review comment.
| if conf.Vlan != nil { | ||
| if err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto); err != nil { | ||
| err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto) | ||
| if err != nil && strings.Contains(err.Error(), "protocol not supported") { | ||
| err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS) | ||
| } | ||
| if err != nil { | ||
| return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Same issues as ApplyVFConfig - inconsistent with PR objectives.
This section has the same fundamental problems as ApplyVFConfig:
-
Still applies VLAN config when vlan=0: The condition
if conf.Vlan != nil(line 466) doesn't differentiate between vlan=0 and vlan>0. -
Implements protocol fallback, not vlan=0 skipping: The retry logic (lines 468-470) handles "protocol not supported" errors but doesn't implement the PR's stated objective.
-
Indentation issues: Lines 468-470 use spaces instead of tabs.
This reinforces that the PR description doesn't match the implementation. The code adds a useful fallback for hardware that doesn't support VLAN protocol specification, but it doesn't "avoid vlan configuration when vlan is disabled."
🔎 Fix indentation
if conf.Vlan != nil {
err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto)
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS)
}
if err != nil {
return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err)
}
}Note: This only fixes formatting. The logical issues remain.
Committable suggestion skipped: line range outside the PR's diff.
🤖 Prompt for AI Agents
In pkg/sriov/sriov.go around lines 466 to 474 the code checks only if conf.Vlan
!= nil which still applies VLAN configuration when vlan==0; change the logic to
first detect whether conf.Vlan is nil or present and whether *conf.Vlan == 0 and
skip any VLAN configuration when VLAN is explicitly disabled (vlan==0). Keep the
existing protocol fallback: attempt LinkSetVfVlanQosProto and on an error
containing "protocol not supported" retry with LinkSetVfVlanQos, but only
perform these calls when VLAN is enabled (>0). Also fix indentation on the retry
block to use tabs to match file formatting and keep the existing error return
message unchanged but ensure it only runs when VLAN was actually being set.
| func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error { | ||
| p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos) | ||
| return nil | ||
| } |
There was a problem hiding this comment.
Fix indentation to use tabs instead of spaces.
Line 216 uses spaces for indentation. Consistent with Go standards, use tabs for indentation.
🔎 Proposed fix
func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
- p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
- return nil
+ p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
+ return nil
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error { | |
| p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos) | |
| return nil | |
| } | |
| func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error { | |
| p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos) | |
| return nil | |
| } |
🤖 Prompt for AI Agents
In pkg/utils/testing.go around lines 215 to 218, the indentation on line 216
uses spaces instead of tabs; update the indentation to use tabs (consistent with
Go convention) so the function body line(s) are indented with a single tab
character rather than spaces, keeping the rest of the code unchanged.
f60702c to
9d59851
Compare
Pull Request Test Coverage Report for Build 20653657157Details
💛 - Coveralls |
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (3)
pkg/utils/testing.go (1)
215-218: Fix indentation to use tabs instead of spaces.Lines 216-217 use spaces for indentation. Go code should use tabs for indentation per
gofmtstandards.🔎 Proposed fix
func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error { - p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos) - return nil + p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos) + return nil }pkg/utils/netlink_manager.go (1)
51-54: Fix indentation to use tabs instead of spaces.Line 53 uses spaces for indentation, violating Go's standard formatting. Use tabs for indentation.
🔎 Proposed fix
// LinkSetVfVlanQos sets VLAN ID and QoS field for given VF using NetlinkManager func (n *MyNetlink) LinkSetVfVlanQos(link netlink.Link, vf, vlan, qos int) error { - return netlink.LinkSetVfVlanQos(link, vf, vlan, qos) + return netlink.LinkSetVfVlanQos(link, vf, vlan, qos) }pkg/sriov/sriov.go (1)
466-474: Same issue as ApplyVFConfig - vlan=0 configuration not skipped.The reset logic has the same problem:
if conf.Vlan != nil(line 466) doesn't differentiate betweenvlan=0andvlan>0. When resetting, if the original configuration hadvlan=0, VLAN programming should be skipped to avoid failures on hardware that doesn't support it.Apply the same fix as in
ApplyVFConfig: only attempt VLAN configuration when the VLAN value is greater than 0.🔎 Proposed fix
if conf.Vlan != nil { + // Skip VLAN reset when restoring vlan=0 (VLAN disabled) + if conf.OrigVfState.Vlan > 0 { err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto) if err != nil && strings.Contains(err.Error(), "protocol not supported") { err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS) } if err != nil { return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err) } + } }
📜 Review details
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
pkg/sriov/sriov.gopkg/utils/mocks/netlink_manager_mock.gopkg/utils/netlink_manager.gopkg/utils/testing.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/utils/mocks/netlink_manager_mock.go (1)
pkg/utils/netlink_manager.go (1)
NetlinkManager(12-28)
pkg/sriov/sriov.go (2)
pkg/types/types.go (1)
VlanProtoInt(16-16)pkg/logging/logging.go (1)
Debug(56-58)
🪛 GitHub Actions: Go-static-analysis
pkg/sriov/sriov.go
[error] 343-343: File is not properly formatted (gofmt).
🪛 GitHub Check: Lint
pkg/sriov/sriov.go
[failure] 343-343:
File is not properly formatted (gofmt)
pkg/utils/netlink_manager.go
[failure] 53-53:
File is not properly formatted (gofmt)
pkg/utils/testing.go
[failure] 216-216:
File is not properly formatted (gofmt)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: coverage
- GitHub Check: SR-IOV operator e2e tests
- GitHub Check: Analyze (go)
| if conf.Vlan != nil { | ||
| if err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto]); err != nil { | ||
| err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto]) | ||
| if err != nil && strings.Contains(err.Error(), "protocol not supported") { | ||
| logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err) | ||
| err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS) | ||
| } | ||
| if err != nil { | ||
| return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %s: %v", conf.VFID, *conf.Vlan, *conf.VlanQoS, *conf.VlanProto, err) | ||
| } | ||
| } |
There was a problem hiding this comment.
Critical: PR does not implement stated objective - vlan=0 still attempts configuration.
The condition if conf.Vlan != nil (line 338) is true when conf.Vlan points to 0, so VLAN configuration is still attempted when vlan=0. According to the PR description and issue #392, the SR-IOV CNI specification defines vlan=0 as "VLAN disabled" and configuration should be skipped entirely to avoid failures on hardware that doesn't support it.
The current implementation adds a useful protocol fallback (retry without proto when "protocol not supported") but doesn't address the core issue: avoiding VLAN configuration when VLAN is disabled.
Expected behavior per PR objectives and reviewer feedback:
- When
*conf.Vlan == 0, skip VLAN configuration entirely (no netlink calls) - Only apply VLAN protocol and QoS when
*conf.Vlan > 0
🔎 Proposed fix to skip vlan=0 configuration
// 1. Set vlan
if conf.Vlan != nil {
+ // Skip VLAN configuration when vlan=0 (VLAN disabled per spec)
+ if *conf.Vlan > 0 {
err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto])
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
- }
+ }
if err != nil {
return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %s: %v", conf.VFID, *conf.Vlan, *conf.VlanQoS, *conf.VlanProto, err)
}
+ }
}Note: This also fixes the indentation on line 343.
🧰 Tools
🪛 GitHub Actions: Go-static-analysis
[error] 343-343: File is not properly formatted (gofmt).
🪛 GitHub Check: Lint
[failure] 343-343:
File is not properly formatted (gofmt)
🤖 Prompt for AI Agents
In pkg/sriov/sriov.go around lines 338 to 347, the code currently tests only for
conf.Vlan != nil which still attempts VLAN configuration when conf.Vlan points
to 0; change the guard to require a non-nil pointer and a positive VLAN id
(conf.Vlan != nil && *conf.Vlan > 0) so all VLAN netlink calls are skipped when
vlan=0; keep the existing proto fallback logic and error handling intact, and
correct the indentation on the nested fallback block.
pkg/sriov/sriov.go
Outdated
| if err != nil && strings.Contains(err.Error(), "protocol not supported") { | ||
| logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err) | ||
| err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS) | ||
| } |
There was a problem hiding this comment.
Fix indentation and consider more robust error detection.
Line 343 has mixed indentation (spaces instead of tabs), causing the gofmt failure.
Additionally, inspecting error strings with strings.Contains(err.Error(), "protocol not supported") is fragile - different drivers or kernel versions may return different error messages. Consider checking for specific netlink error codes if available.
🔎 Fix indentation
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
- }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if err != nil && strings.Contains(err.Error(), "protocol not supported") { | |
| logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err) | |
| err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS) | |
| } | |
| if err != nil && strings.Contains(err.Error(), "protocol not supported") { | |
| logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err) | |
| err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS) | |
| } |
🧰 Tools
🪛 GitHub Actions: Go-static-analysis
[error] 343-343: File is not properly formatted (gofmt).
🪛 GitHub Check: Lint
[failure] 343-343:
File is not properly formatted (gofmt)
| // LinkSetVfVlanQos provides a mock function with given fields: _a0, _a1, _a2, _a3 | ||
| func (_m *NetlinkManager) LinkSetVfVlanQos( _a0 netlink.Link, _a1 int, _a2 int, _a3 int ) error { | ||
| ret := _m.Called(_a0, _a1, _a2, _a3) | ||
|
|
||
| if len(ret) == 0 { | ||
| panic("no return value specified for LinkSetVfVlanQos") | ||
| } | ||
|
|
||
| var r0 error | ||
| if rf, ok := ret.Get(0).(func(netlink.Link, int, int, int) error); ok { | ||
| r0 = rf(_a0, _a1, _a2, _a3) | ||
| } else { | ||
| r0 = ret.Error(0) | ||
| } | ||
|
|
||
| return r0 | ||
| } |
There was a problem hiding this comment.
Re-generate mock with proper formatting.
This auto-generated mock has inconsistent indentation (spaces instead of tabs on multiple lines). Since the file header warns "DO NOT EDIT", regenerate the mock using mockery with proper Go formatting instead of manually editing.
Run mockery to regenerate this file with correct formatting:
#!/bin/bash
# Regenerate the NetlinkManager mock with proper formatting
cd pkg/utils
mockery --name=NetlinkManager --output=mocks --outpkg=mocks🤖 Prompt for AI Agents
In pkg/utils/mocks/netlink_manager_mock.go around lines 282 to 298 the
auto-generated mock contains inconsistent indentation (spaces instead of tabs);
do not manually edit—re-generate the mock using mockery in the pkg/utils
directory to produce properly formatted Go code and then run gofmt (or
goimports) over the generated file; e.g., cd into pkg/utils and run mockery
--name=NetlinkManager --output=mocks --outpkg=mocks, then run gofmt -w
./pkg/utils/mocks (or goimports -w) to enforce correct tabs/formatting before
committing.
When configuring VLAN on a VF (both during pod setup and reset):
- First attempt LinkSetVfVlanQosProto (with explicit protocol).
- If the call fails with an error containing "protocol not supported",
retry using LinkSetVfVlanQos (without protocol specification).
Signed-off-by: Ashok Pariya <ashok.pariya@ibm.com>
9d59851 to
2ad533e
Compare
Yes, so we see below error irrespective of vlan value is 0 or > 0 (more details are in #392) it seems protocol( 802.1q:) is not supported, so we can use method(https://github.com/vishvananda/netlink/blob/6e61cd407d1d66dd67fc66aaaf5c2a42e0614426/link_linux.go#L652) when method(https://github.com/vishvananda/netlink/blob/6e61cd407d1d66dd67fc66aaaf5c2a42e0614426/link_linux.go#L683) fail, I updated the PR desctiptions with new change, please check |
|
Hi @ashokpariya0 do we still need this PR after your PR in the netlink lib was merged? |
yeah sure, will do that. |
Requested a release for vishvananda/netlink a few weeks ago (issue: vishvananda/netlink#1166 Should we wait longer for the release, or proceed with merging this PR? |
you can bump the Line 8 in 3941954 |
|
Thanks, |
Why is this change needed?
SRIOV-CNI currently configures VF VLANs using LinkSetVfVlanQosProto, which includes the VLAN protocol attribute (IFLA_VF_VLAN_PROTOCOL). However, not all kernel / driver combinations support this attribute.
On such systems, the netlink call fails with:
This results in pod sandbox creation failure, even though the VF VLAN could be successfully configured without specifying the protocol
What does this PR change?
Attempt to configure VF VLAN using LinkSetVfVlanQosProto (current behavior)
If the operation fails with EOPNOTSUPP / “protocol not supported”:
Log a debug message
Retry VLAN configuration using the legacy LinkSetVfVlanQos API (without protocol)
Apply the same fallback logic during VF reset to ensure consistent behavior
Add netlink abstraction support and test mocks for the legacy VLAN API
Related Issues
Fixes: #392
Summary by CodeRabbit
Bug Fixes
Chores
✏️ Tip: You can customize this high-level summary in your review settings.