Skip to content

ocp-sriov: fix spoof check verification to use VF index instead of MAC#1355

Draft
zhiqiangf wants to merge 4 commits intorh-ecosystem-edge:mainfrom
zhiqiangf:fix-spoofcheck-vf-index
Draft

ocp-sriov: fix spoof check verification to use VF index instead of MAC#1355
zhiqiangf wants to merge 4 commits intorh-ecosystem-edge:mainfrom
zhiqiangf:fix-spoofcheck-vf-index

Conversation

@zhiqiangf
Copy link
Copy Markdown
Contributor

@zhiqiangf zhiqiangf commented Apr 29, 2026

Problem

`verifySpoofCheck` identifies the pod's VF in `ip link show ` by looking for a line containing the pod's static MAC address. On Mellanox CX6-DX hardware, the SR-IOV CNI sets the MAC only inside the container's network namespace (via `ip link set net1 address `) without propagating it to the host PF via `ip link set vf N mac`. As a result, the host-side VF line shows `00:00:00:00:00:00` and the MAC-based lookup never finds the expected spoof check state, causing tests 25959 and 70820 to fail consistently on this hardware.

Fix

Use the VF PCI address from the pod's `k8s.v1.cni.cncf.io/network-status` annotation to find the VF's index by matching against the PF's `virtfnN` sysfs symlinks. Then look up that specific `vf N` line in `ip link show ` to verify the spoof check state. This approach is hardware-agnostic and independent of whether the CNI propagates the MAC to the host PF.

The MAC-based path is kept as a fallback for environments where the PCI address is not available in the network-status annotation.

Changes

  • Refactored `verifySpoofCheck` into three functions:
    • `verifySpoofCheck` — entry point, selects PCI or MAC strategy
    • `verifySpoofCheckByPCI` — primary: resolves VF index from PCI address via `virtfnN` symlinks, checks specific VF line
    • `verifySpoofCheckByMAC` — fallback: existing MAC-matching behavior
  • Extracted `nodeOutput` helper to deduplicate FQDN/short-hostname lookup logic

Tests fixed

  • 25959: SR-IOV VF with spoof checking enabled
  • 70820: SR-IOV VF with spoof checking disabled

Summary by CodeRabbit

  • Tests
    • Strengthened network configuration verification logic with enhanced validation approach and dedicated helper utilities for more reliable and consistent test execution.

verifySpoofCheck looked for the pod's static MAC in "ip link show <PF>"
on the host. On Mellanox CX6-DX hardware the SR-IOV CNI sets the MAC
only inside the container netns without propagating it to the host PF
via "ip link set <PF> vf N mac", so the host-side VF shows
00:00:00:00:00:00 and the MAC-based lookup always fails.

Fix: use the VF PCI address from the pod's network-status annotation to
find the VF index by matching against the PF's virtfnN sysfs symlinks,
then look up that specific VF line in "ip link show <PF>". This is
hardware-agnostic and does not depend on MAC propagation.

The MAC-based path is kept as a fallback for cases where the PCI address
is not available in the network-status annotation.

Fixes tests: 25959, 70820
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 29, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 2f4a5da5-534c-43db-af64-e24d4682fbcd

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
📝 Walkthrough

Walkthrough

The spoof-check verification logic refactored to prioritize PCI-address-based validation. New implementation resolves VF indices via PF symlinks and grepped output, with fallback to MAC-based checking. Replaces inline MAC-only approach with dedicated helper functions and node output utility.

Changes

Cohort / File(s) Summary
Spoof-check verification refactor
tests/ocp/sriov/internal/sriovenv/sriovenv.go
Refactored verifySpoofCheck to use PCI-address-based validation as primary approach. Adds two new helper functions and nodeOutput utility for robust per-node command output extraction. Removes previous inline MAC-only implementation and replaces with fallback-capable logic (+78/-18 lines).

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and specifically summarizes the main change: fixing spoof check verification by switching from MAC-based to VF index-based detection using PCI addresses.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@zhiqiangf zhiqiangf marked this pull request as draft April 29, 2026 01:14
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)

672-681: VF index range hardcoded to 0-31.

The loop iterates through VF indices 0-31, supporting up to 32 VFs. While this covers most common SR-IOV configurations, some NICs support up to 64 or 128 VFs. If higher VF counts are needed in the future, consider making this configurable or increasing the upper bound.

Additionally, the shell command directly interpolates interfaceName and pciAddr. While these values originate from controlled sources (test config and CNI annotations), consider validating they contain only expected characters (alphanumeric, colons, dots, hyphens) for defense-in-depth.

🔧 Optional: Add input validation
// Add validation helper
func isValidPCIAddr(addr string) bool {
	// PCI addresses are formatted as XXXX:XX:XX.X (e.g., 0000:3b:00.0)
	matched, _ := regexp.MatchString(`^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$`, addr)
	return matched
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go` around lines 672 - 681, The
shell command assigned to cmd hardcodes the VF index loop 0-31 and directly
interpolates interfaceName and pciAddr; change it to dynamically determine the
upper bound (e.g., read /sys/class/net/<interface>/device/sriov_totalvfs or
count /sys/class/net/<interface>/device/virtfn* entries) or make the max VF
count configurable, and reconstruct the loop to iterate up to that computed
value instead of 0-31 (keep the rest of the lookup logic that uses ip link show
%s and greps "vf $IDX"); additionally add validation for interfaceName and
pciAddr (e.g., allow only alphanumeric, colons, dots, hyphens and use a PCI
regex like ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$) before
building cmd to prevent injection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go`:
- Around line 733-747: The nodeOutput function's bidirectional strings.HasPrefix
checks can false-match sibling hostnames; update nodeOutput to only accept
prefix matches when the prefix boundary is a dot or end-of-string (to handle
FQDN vs short-hostname), e.g., after finding no exact map key, when testing if
strings.HasPrefix(nodeName, host) ensure nodeName[len(host)] is '.' (or host
length equals nodeName length) and when testing strings.HasPrefix(host,
nodeName) ensure host[len(nodeName)] is '.' (or equal length); keep the existing
exact-match early return in nodeOutput and return "" if no
precise-or-dot-delimited prefix match is found.

---

Nitpick comments:
In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go`:
- Around line 672-681: The shell command assigned to cmd hardcodes the VF index
loop 0-31 and directly interpolates interfaceName and pciAddr; change it to
dynamically determine the upper bound (e.g., read
/sys/class/net/<interface>/device/sriov_totalvfs or count
/sys/class/net/<interface>/device/virtfn* entries) or make the max VF count
configurable, and reconstruct the loop to iterate up to that computed value
instead of 0-31 (keep the rest of the lookup logic that uses ip link show %s and
greps "vf $IDX"); additionally add validation for interfaceName and pciAddr
(e.g., allow only alphanumeric, colons, dots, hyphens and use a PCI regex like
^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$) before building cmd
to prevent injection.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: cfabb0e3-3ac4-49f2-8547-b453696a43a0

📥 Commits

Reviewing files that changed from the base of the PR and between 225f998 and f086798.

📒 Files selected for processing (1)
  • tests/ocp/sriov/internal/sriovenv/sriovenv.go

Comment on lines +733 to +747
// nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map.
// Falls back to prefix matching to handle FQDN vs short-hostname mismatches.
func nodeOutput(outputMap map[string]string, nodeName string) string {
if out, ok := outputMap[nodeName]; ok {
return out
}

for host, out := range outputMap {
if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) {
return out
}
}

return ""
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Prefix matching may incorrectly match different nodes.

The bidirectional prefix check at line 741 could match unintended nodes when node names share common prefixes:

  • For nodeName = "worker-1" and map keys {"worker-10": "...", "worker-1-extra": "..."}, both would match via strings.HasPrefix(host, nodeName).
  • Map iteration order is non-deterministic, so the result could be inconsistent.

The intent is to handle FQDN vs short hostname (e.g., worker-1 vs worker-1.cluster.local). Consider checking for a dot delimiter after the prefix match.

🛠️ Suggested fix
 func nodeOutput(outputMap map[string]string, nodeName string) string {
 	if out, ok := outputMap[nodeName]; ok {
 		return out
 	}

 	for host, out := range outputMap {
-		if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) {
+		// Handle FQDN vs short hostname: one must be a prefix of the other
+		// followed by a dot (e.g., "worker-1" matches "worker-1.cluster.local")
+		if strings.HasPrefix(host, nodeName) && (len(host) == len(nodeName) || host[len(nodeName)] == '.') {
+			return out
+		}
+		if strings.HasPrefix(nodeName, host) && (len(nodeName) == len(host) || nodeName[len(host)] == '.') {
 			return out
 		}
 	}

 	return ""
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map.
// Falls back to prefix matching to handle FQDN vs short-hostname mismatches.
func nodeOutput(outputMap map[string]string, nodeName string) string {
if out, ok := outputMap[nodeName]; ok {
return out
}
for host, out := range outputMap {
if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) {
return out
}
}
return ""
}
// nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map.
// Falls back to prefix matching to handle FQDN vs short-hostname mismatches.
func nodeOutput(outputMap map[string]string, nodeName string) string {
if out, ok := outputMap[nodeName]; ok {
return out
}
for host, out := range outputMap {
// Handle FQDN vs short hostname: one must be a prefix of the other
// followed by a dot (e.g., "worker-1" matches "worker-1.cluster.local")
if strings.HasPrefix(host, nodeName) && (len(host) == len(nodeName) || host[len(nodeName)] == '.') {
return out
}
if strings.HasPrefix(nodeName, host) && (len(nodeName) == len(host) || nodeName[len(host)] == '.') {
return out
}
}
return ""
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go` around lines 733 - 747, The
nodeOutput function's bidirectional strings.HasPrefix checks can false-match
sibling hostnames; update nodeOutput to only accept prefix matches when the
prefix boundary is a dot or end-of-string (to handle FQDN vs short-hostname),
e.g., after finding no exact map key, when testing if
strings.HasPrefix(nodeName, host) ensure nodeName[len(host)] is '.' (or host
length equals nodeName length) and when testing strings.HasPrefix(host,
nodeName) ensure host[len(nodeName)] is '.' (or equal length); keep the existing
exact-match early return in nodeOutput and return "" if no
precise-or-dot-delimited prefix match is found.

zhiqiangf and others added 3 commits April 29, 2026 14:43
… firmware reset

The sriov-network-operator Mellanox plugin resets firmware to
SRIOV_EN=False/NUM_OF_VFS=0 for any NIC with no active policy spec.
Every test cleanup triggers this write; the next MCO warm reboot
activates it, leaving sriov_totalvfs=0 for subsequent runs.

Setting externallyManaged=true tells the plugin to write sriov_numvfs
directly without touching firmware, so cleanup never resets SRIOV_EN.
Requires firmware pre-configured (SRIOV_EN=True, NUM_OF_VFS=6) via
one-time cold boot before tests run.

In metricsExporter.go the flag is conditioned on nicVendor==MlxVendorID
so non-Mellanox devices are unaffected.

Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant