Skip to content

Add fallback for VF VLAN configuration when protocol is unsupported#393

Open
ashokpariya0 wants to merge 1 commit intok8snetworkplumbingwg:masterfrom
ashokpariya0:fix-vlan-config-cond
Open

Add fallback for VF VLAN configuration when protocol is unsupported#393
ashokpariya0 wants to merge 1 commit intok8snetworkplumbingwg:masterfrom
ashokpariya0:fix-vlan-config-cond

Conversation

@ashokpariya0
Copy link
Copy Markdown
Contributor

@ashokpariya0 ashokpariya0 commented Dec 17, 2025

Why is this change needed?

SRIOV-CNI currently configures VF VLANs using LinkSetVfVlanQosProto, which includes the VLAN protocol attribute (IFLA_VF_VLAN_PROTOCOL). However, not all kernel / driver combinations support this attribute.

On such systems, the netlink call fails with:

protocol not supported

This results in pod sandbox creation failure, even though the VF VLAN could be successfully configured without specifying the protocol

What does this PR change?

Attempt to configure VF VLAN using LinkSetVfVlanQosProto (current behavior)

If the operation fails with EOPNOTSUPP / “protocol not supported”:

Log a debug message

Retry VLAN configuration using the legacy LinkSetVfVlanQos API (without protocol)

Apply the same fallback logic during VF reset to ensure consistent behavior

Add netlink abstraction support and test mocks for the legacy VLAN API

Related Issues
Fixes: #392

Summary by CodeRabbit

  • Bug Fixes

    • Retry VLAN apply/restore without a VLAN protocol when a device reports the protocol as unsupported, reducing failed VLAN operations and preserving original error reporting.
    • Improved validation to avoid applying or restoring invalid or unset VLAN values.
  • Chores

    • Added compatibility support and test/mocking hooks for VLAN QoS operations to better handle devices that don’t accept protocol parameters.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Dec 17, 2025

Walkthrough

Retry VF VLAN programming without specifying a VLAN protocol when netlink returns "protocol not supported"; add NetlinkManager method and mocks to support the retry path. No exported signatures changed aside from the new netlink method.

Changes

Cohort / File(s) Summary
SR-IOV VLAN handling
pkg/sriov/sriov.go
ApplyVFConfig and ResetVFConfig: call LinkSetVfVlanQosProto when VLAN set; if error contains "protocol not supported", retry LinkSetVfVlanQos (no proto). Error reporting preserved.
Netlink manager API & impl
pkg/utils/netlink_manager.go
Added LinkSetVfVlanQos(netlink.Link, int, int, int) error to NetlinkManager and implemented on MyNetlink delegating to netlink.LinkSetVfVlanQos.
Netlink testing & mocks
pkg/utils/testing.go, pkg/utils/mocks/netlink_manager_mock.go
Added pfMockNetlinkLib.LinkSetVfVlanQos(...) (records call) and mock NetlinkManager.LinkSetVfVlanQos(...) to support the retry path in tests.

Sequence Diagram(s)

mermaid
sequenceDiagram
participant SR as SRIOV-CNI (pkg/sriov)
participant NL as NetlinkManager
participant KD as Kernel/Driver
SR->>NL: LinkSetVfVlanQosProto(link, vf, vlan, qos, proto)
alt success
NL->>KD: set VF VLAN with proto
KD-->>NL: OK
NL-->>SR: nil
else error contains "protocol not supported"
NL-->>SR: error("protocol not supported")
SR->>NL: LinkSetVfVlanQos(link, vf, vlan, qos) note right of SR #DDF2E9: retry without proto
NL->>KD: set VF VLAN without proto
alt success
KD-->>NL: OK
NL-->>SR: nil
else fail
KD-->>NL: error
NL-->>SR: error
end
else other error
NL-->>SR: error
end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Linked Issues check ❓ Inconclusive The PR implements the core fallback mechanism (attempt LinkSetVfVlanQosProto, fallback to LinkSetVfVlanQos) that addresses the linked issue #392, but comments indicate unit tests are failing and coverage is low, potentially leaving the implementation partially unvalidated. Address the failing unit tests and improve test coverage to ensure the fallback logic is properly validated across both success and failure scenarios as noted in the PR feedback.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding a fallback mechanism for VF VLAN configuration when protocol is unsupported, which directly addresses the issue and represents the primary objective of the PR.
Out of Scope Changes check ✅ Passed All changes are directly scoped to implementing the fallback mechanism for VF VLAN configuration: adding the strings import, implementing retry logic in ApplyVFConfig and ResetVFConfig, and adding supporting methods to netlink abstractions, mocks, and testing utilities.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9d59851 and 2ad533e.

📒 Files selected for processing (4)
  • pkg/sriov/sriov.go
  • pkg/utils/mocks/netlink_manager_mock.go
  • pkg/utils/netlink_manager.go
  • pkg/utils/testing.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • pkg/utils/netlink_manager.go
  • pkg/utils/testing.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/sriov/sriov.go (2)
pkg/types/types.go (1)
  • VlanProtoInt (16-16)
pkg/logging/logging.go (2)
  • Error (71-73)
  • Debug (56-58)
pkg/utils/mocks/netlink_manager_mock.go (1)
pkg/utils/netlink_manager.go (1)
  • NetlinkManager (12-28)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: SR-IOV operator e2e tests
  • GitHub Check: coverage
  • GitHub Check: Analyze (go)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@SchSeba
Copy link
Copy Markdown
Collaborator

SchSeba commented Dec 24, 2025

Hi @ashokpariya0,

I don't think this one is right as in case the VF as a vlan configure by something (or even a left over) and the user doesn't want a vlan we will not reset the device.

maybe we can check if the vlan is 0 and the user want 0 then we don't do anything. but if there is a vlan and the user doesn't want vlan (vlan 0 no nil) we should apply the 0

@SchSeba
Copy link
Copy Markdown
Collaborator

SchSeba commented Dec 24, 2025

also please check the unit-test that are failing

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 98c35cd and f60702c.

📒 Files selected for processing (3)
  • pkg/sriov/sriov.go
  • pkg/utils/netlink_manager.go
  • pkg/utils/testing.go
🧰 Additional context used
🧬 Code graph analysis (1)
pkg/sriov/sriov.go (2)
pkg/types/types.go (1)
  • VlanProtoInt (16-16)
pkg/logging/logging.go (2)
  • Error (71-73)
  • Debug (56-58)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Analyze (go)
🔇 Additional comments (2)
pkg/sriov/sriov.go (2)

5-5: LGTM - Import added for error string matching.

The strings import is appropriately added to support the retry logic that checks for "protocol not supported" errors.


338-347: Remove the critical issue flag - the PR objective claim is incorrect.

The commit message clearly states the feature is "Add fallback for VF VLAN configuration when protocol is unsupported", not "avoid vlan configuration when vlan is disabled (vlan=0)". The implementation correctly reflects this: it retries LinkSetVfVlanQosProto without protocol specification when the driver returns a "protocol not supported" error. The nil check if conf.Vlan != nil is appropriate for this logic.

Minor concerns remain:

  1. Fragile error detection (line 340): Checking for "protocol not supported" using strings.Contains may miss variations in error messages across different drivers/kernels (e.g., "protocol is not supported", "unsupported protocol"). Consider whether netlink provides more specific error types.

  2. Indentation (line 343): Uses spaces instead of tabs.

Likely an incorrect or invalid review comment.

Comment on lines 466 to 474
if conf.Vlan != nil {
if err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto); err != nil {
err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto)
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS)
}
if err != nil {
return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

Same issues as ApplyVFConfig - inconsistent with PR objectives.

This section has the same fundamental problems as ApplyVFConfig:

  1. Still applies VLAN config when vlan=0: The condition if conf.Vlan != nil (line 466) doesn't differentiate between vlan=0 and vlan>0.

  2. Implements protocol fallback, not vlan=0 skipping: The retry logic (lines 468-470) handles "protocol not supported" errors but doesn't implement the PR's stated objective.

  3. Indentation issues: Lines 468-470 use spaces instead of tabs.

This reinforces that the PR description doesn't match the implementation. The code adds a useful fallback for hardware that doesn't support VLAN protocol specification, but it doesn't "avoid vlan configuration when vlan is disabled."

🔎 Fix indentation
 	if conf.Vlan != nil {
 		err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto)
 		if err != nil && strings.Contains(err.Error(), "protocol not supported") {
 			err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS)
 		}
 		if err != nil {
 			return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err)
 		}
 	}

Note: This only fixes formatting. The logical issues remain.

Committable suggestion skipped: line range outside the PR's diff.

🤖 Prompt for AI Agents
In pkg/sriov/sriov.go around lines 466 to 474 the code checks only if conf.Vlan
!= nil which still applies VLAN configuration when vlan==0; change the logic to
first detect whether conf.Vlan is nil or present and whether *conf.Vlan == 0 and
skip any VLAN configuration when VLAN is explicitly disabled (vlan==0). Keep the
existing protocol fallback: attempt LinkSetVfVlanQosProto and on an error
containing "protocol not supported" retry with LinkSetVfVlanQos, but only
perform these calls when VLAN is enabled (>0). Also fix indentation on the retry
block to use tabs to match file formatting and keep the existing error return
message unchanged but ensure it only runs when VLAN was actually being set.

Comment on lines +215 to +218
func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
return nil
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix indentation to use tabs instead of spaces.

Line 216 uses spaces for indentation. Consistent with Go standards, use tabs for indentation.

🔎 Proposed fix
 func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
-        p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
-        return nil
+	p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
+	return nil
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
return nil
}
func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
return nil
}
🤖 Prompt for AI Agents
In pkg/utils/testing.go around lines 215 to 218, the indentation on line 216
uses spaces instead of tabs; update the indentation to use tabs (consistent with
Go convention) so the function body line(s) are indented with a single tab
character rather than spaces, keeping the rest of the code unchanged.

@coveralls
Copy link
Copy Markdown

coveralls commented Jan 2, 2026

Pull Request Test Coverage Report for Build 20653657157

Details

  • 6 of 18 (33.33%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.4%) to 61.07%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/utils/netlink_manager.go 0 3 0.0%
pkg/utils/testing.go 0 4 0.0%
pkg/sriov/sriov.go 6 11 54.55%
Totals Coverage Status
Change from base Build 20619423447: -0.4%
Covered Lines: 982
Relevant Lines: 1608

💛 - Coveralls

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

♻️ Duplicate comments (3)
pkg/utils/testing.go (1)

215-218: Fix indentation to use tabs instead of spaces.

Lines 216-217 use spaces for indentation. Go code should use tabs for indentation per gofmt standards.

🔎 Proposed fix
 func (p *pfMockNetlinkLib) LinkSetVfVlanQos(link netlink.Link, vfIndex, vlan, vlanQos int) error {
-        p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
-        return nil
+	p.recordMethodCall("LinkSetVfVlanQos %s %d %d %d", link.Attrs().Name, vfIndex, vlan, vlanQos)
+	return nil
 }
pkg/utils/netlink_manager.go (1)

51-54: Fix indentation to use tabs instead of spaces.

Line 53 uses spaces for indentation, violating Go's standard formatting. Use tabs for indentation.

🔎 Proposed fix
 // LinkSetVfVlanQos sets VLAN ID and QoS field for given VF using NetlinkManager
 func (n *MyNetlink) LinkSetVfVlanQos(link netlink.Link, vf, vlan, qos int) error {
-        return netlink.LinkSetVfVlanQos(link, vf, vlan, qos)
+	return netlink.LinkSetVfVlanQos(link, vf, vlan, qos)
 }
pkg/sriov/sriov.go (1)

466-474: Same issue as ApplyVFConfig - vlan=0 configuration not skipped.

The reset logic has the same problem: if conf.Vlan != nil (line 466) doesn't differentiate between vlan=0 and vlan>0. When resetting, if the original configuration had vlan=0, VLAN programming should be skipped to avoid failures on hardware that doesn't support it.

Apply the same fix as in ApplyVFConfig: only attempt VLAN configuration when the VLAN value is greater than 0.

🔎 Proposed fix
 	if conf.Vlan != nil {
+		// Skip VLAN reset when restoring vlan=0 (VLAN disabled)
+		if conf.OrigVfState.Vlan > 0 {
 		err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto)
 		if err != nil && strings.Contains(err.Error(), "protocol not supported") {
 			err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS)
 		}
 		if err != nil {
 			return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %d: %v", conf.VFID, conf.OrigVfState.Vlan, conf.OrigVfState.VlanQoS, conf.OrigVfState.VlanProto, err)
 		}
+		}
 	}
📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f60702c and 9d59851.

📒 Files selected for processing (4)
  • pkg/sriov/sriov.go
  • pkg/utils/mocks/netlink_manager_mock.go
  • pkg/utils/netlink_manager.go
  • pkg/utils/testing.go
🧰 Additional context used
🧬 Code graph analysis (2)
pkg/utils/mocks/netlink_manager_mock.go (1)
pkg/utils/netlink_manager.go (1)
  • NetlinkManager (12-28)
pkg/sriov/sriov.go (2)
pkg/types/types.go (1)
  • VlanProtoInt (16-16)
pkg/logging/logging.go (1)
  • Debug (56-58)
🪛 GitHub Actions: Go-static-analysis
pkg/sriov/sriov.go

[error] 343-343: File is not properly formatted (gofmt).

🪛 GitHub Check: Lint
pkg/sriov/sriov.go

[failure] 343-343:
File is not properly formatted (gofmt)

pkg/utils/netlink_manager.go

[failure] 53-53:
File is not properly formatted (gofmt)

pkg/utils/testing.go

[failure] 216-216:
File is not properly formatted (gofmt)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: coverage
  • GitHub Check: SR-IOV operator e2e tests
  • GitHub Check: Analyze (go)

Comment on lines 338 to 347
if conf.Vlan != nil {
if err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto]); err != nil {
err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto])
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
}
if err != nil {
return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %s: %v", conf.VFID, *conf.Vlan, *conf.VlanQoS, *conf.VlanProto, err)
}
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Critical: PR does not implement stated objective - vlan=0 still attempts configuration.

The condition if conf.Vlan != nil (line 338) is true when conf.Vlan points to 0, so VLAN configuration is still attempted when vlan=0. According to the PR description and issue #392, the SR-IOV CNI specification defines vlan=0 as "VLAN disabled" and configuration should be skipped entirely to avoid failures on hardware that doesn't support it.

The current implementation adds a useful protocol fallback (retry without proto when "protocol not supported") but doesn't address the core issue: avoiding VLAN configuration when VLAN is disabled.

Expected behavior per PR objectives and reviewer feedback:

  • When *conf.Vlan == 0, skip VLAN configuration entirely (no netlink calls)
  • Only apply VLAN protocol and QoS when *conf.Vlan > 0
🔎 Proposed fix to skip vlan=0 configuration
 	// 1. Set vlan
 	if conf.Vlan != nil {
+		// Skip VLAN configuration when vlan=0 (VLAN disabled per spec)
+		if *conf.Vlan > 0 {
 		err = s.nLink.LinkSetVfVlanQosProto(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS, sriovtypes.VlanProtoInt[*conf.VlanProto])
 		if err != nil && strings.Contains(err.Error(), "protocol not supported") {
 			logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
 			err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
-	        }
+		}
 		if err != nil {
 			return fmt.Errorf("failed to set vf %d vlan configuration - id %d, qos %d and proto %s: %v", conf.VFID, *conf.Vlan, *conf.VlanQoS, *conf.VlanProto, err)
 		}
+		}
 	}

Note: This also fixes the indentation on line 343.

🧰 Tools
🪛 GitHub Actions: Go-static-analysis

[error] 343-343: File is not properly formatted (gofmt).

🪛 GitHub Check: Lint

[failure] 343-343:
File is not properly formatted (gofmt)

🤖 Prompt for AI Agents
In pkg/sriov/sriov.go around lines 338 to 347, the code currently tests only for
conf.Vlan != nil which still attempts VLAN configuration when conf.Vlan points
to 0; change the guard to require a non-nil pointer and a positive VLAN id
(conf.Vlan != nil && *conf.Vlan > 0) so all VLAN netlink calls are skipped when
vlan=0; keep the existing proto fallback logic and error handling intact, and
correct the indentation on the nested fallback block.

Comment on lines +340 to +343
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Fix indentation and consider more robust error detection.

Line 343 has mixed indentation (spaces instead of tabs), causing the gofmt failure.

Additionally, inspecting error strings with strings.Contains(err.Error(), "protocol not supported") is fragile - different drivers or kernel versions may return different error messages. Consider checking for specific netlink error codes if available.

🔎 Fix indentation
 		if err != nil && strings.Contains(err.Error(), "protocol not supported") {
 			logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
 			err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
-	        }
+		}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
}
if err != nil && strings.Contains(err.Error(), "protocol not supported") {
logging.Debug("VLAN proto not supported, retrying without proto", "vf", conf.VFID, "vlan", *conf.Vlan, "err", err)
err = s.nLink.LinkSetVfVlanQos(pfLink, conf.VFID, *conf.Vlan, *conf.VlanQoS)
}
🧰 Tools
🪛 GitHub Actions: Go-static-analysis

[error] 343-343: File is not properly formatted (gofmt).

🪛 GitHub Check: Lint

[failure] 343-343:
File is not properly formatted (gofmt)

Comment on lines +282 to +298
// LinkSetVfVlanQos provides a mock function with given fields: _a0, _a1, _a2, _a3
func (_m *NetlinkManager) LinkSetVfVlanQos( _a0 netlink.Link, _a1 int, _a2 int, _a3 int ) error {
ret := _m.Called(_a0, _a1, _a2, _a3)

if len(ret) == 0 {
panic("no return value specified for LinkSetVfVlanQos")
}

var r0 error
if rf, ok := ret.Get(0).(func(netlink.Link, int, int, int) error); ok {
r0 = rf(_a0, _a1, _a2, _a3)
} else {
r0 = ret.Error(0)
}

return r0
}
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Re-generate mock with proper formatting.

This auto-generated mock has inconsistent indentation (spaces instead of tabs on multiple lines). Since the file header warns "DO NOT EDIT", regenerate the mock using mockery with proper Go formatting instead of manually editing.

Run mockery to regenerate this file with correct formatting:

#!/bin/bash
# Regenerate the NetlinkManager mock with proper formatting
cd pkg/utils
mockery --name=NetlinkManager --output=mocks --outpkg=mocks
🤖 Prompt for AI Agents
In pkg/utils/mocks/netlink_manager_mock.go around lines 282 to 298 the
auto-generated mock contains inconsistent indentation (spaces instead of tabs);
do not manually edit—re-generate the mock using mockery in the pkg/utils
directory to produce properly formatted Go code and then run gofmt (or
goimports) over the generated file; e.g., cd into pkg/utils and run mockery
--name=NetlinkManager --output=mocks --outpkg=mocks, then run gofmt -w
./pkg/utils/mocks (or goimports -w) to enforce correct tabs/formatting before
committing.

@ashokpariya0 ashokpariya0 changed the title avoid vlan configuration when vlan is disabled. Add fallback for VF VLAN configuration when protocol is unsupported Jan 2, 2026
When configuring VLAN on a VF (both during pod setup and reset):
  - First attempt LinkSetVfVlanQosProto (with explicit protocol).
  - If the call fails with an error containing "protocol not supported",
    retry using LinkSetVfVlanQos (without protocol specification).

Signed-off-by: Ashok Pariya <ashok.pariya@ibm.com>
@ashokpariya0
Copy link
Copy Markdown
Contributor Author

Hi @ashokpariya0,

I don't think this one is right as in case the VF as a vlan configure by something (or even a left over) and the user doesn't want a vlan we will not reset the device.

maybe we can check if the vlan is 0 and the user want 0 then we don't do anything. but if there is a vlan and the user doesn't want vlan (vlan 0 no nil) we should apply the 0

Yes, so we see below error irrespective of vlan value is 0 or > 0 (more details are in #392)

failed to set vf 2 vlan configuration - id 0, qos 0 and proto 802.1q: protocol not supported

it seems protocol( 802.1q:) is not supported, so we can use method(https://github.com/vishvananda/netlink/blob/6e61cd407d1d66dd67fc66aaaf5c2a42e0614426/link_linux.go#L652)

when method(https://github.com/vishvananda/netlink/blob/6e61cd407d1d66dd67fc66aaaf5c2a42e0614426/link_linux.go#L683) fail,

I updated the PR desctiptions with new change, please check

@SchSeba
Copy link
Copy Markdown
Collaborator

SchSeba commented Feb 12, 2026

Hi @ashokpariya0 do we still need this PR after your PR in the netlink lib was merged?
should we just bump our package version?

@ashokpariya0
Copy link
Copy Markdown
Contributor Author

Hi @ashokpariya0 do we still need this PR after your PR in the netlink lib was merged? should we just bump our package version?

yeah sure, will do that.

@ashokpariya0
Copy link
Copy Markdown
Contributor Author

Hi @ashokpariya0 do we still need this PR after your PR in the netlink lib was merged? should we just bump our package version?

Requested a release for vishvananda/netlink a few weeks ago (issue: vishvananda/netlink#1166
), but there has been no response yet.

Should we wait longer for the release, or proceed with merging this PR?

@zeeke
Copy link
Copy Markdown
Member

zeeke commented Mar 5, 2026

Should we wait longer for the release, or proceed with merging this PR?

you can bump the github.com/vishvananda/netlink to a non release version, like for cni-log.

github.com/k8snetworkplumbingwg/cni-log v0.0.0-20230801160229-b6e062c9e0f2

@ashokpariya0
Copy link
Copy Markdown
Contributor Author

Thanks,
Raised new pr: #404

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SRIOV-CNI fails configuring VF VLAN 0 with proto 802.1q on mlx5 (ConnectX-6 Lx) although kernel accepts the same configuration manually

4 participants