ocp-sriov: fix spoof check verification to use VF index instead of MAC#1355
ocp-sriov: fix spoof check verification to use VF index instead of MAC#1355zhiqiangf wants to merge 4 commits intorh-ecosystem-edge:mainfrom
Conversation
verifySpoofCheck looked for the pod's static MAC in "ip link show <PF>" on the host. On Mellanox CX6-DX hardware the SR-IOV CNI sets the MAC only inside the container netns without propagating it to the host PF via "ip link set <PF> vf N mac", so the host-side VF shows 00:00:00:00:00:00 and the MAC-based lookup always fails. Fix: use the VF PCI address from the pod's network-status annotation to find the VF index by matching against the PF's virtfnN sysfs symlinks, then look up that specific VF line in "ip link show <PF>". This is hardware-agnostic and does not depend on MAC propagation. The MAC-based path is kept as a fallback for cases where the PCI address is not available in the network-status annotation. Fixes tests: 25959, 70820
|
Important Review skippedDraft detected. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: CHILL Plan: Pro Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
📝 WalkthroughWalkthroughThe spoof-check verification logic refactored to prioritize PCI-address-based validation. New implementation resolves VF indices via PF symlinks and grepped output, with fallback to MAC-based checking. Replaces inline MAC-only approach with dedicated helper functions and node output utility. Changes
Estimated code review effort🎯 4 (Complex) | ⏱️ ~45 minutes 🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/ocp/sriov/internal/sriovenv/sriovenv.go (1)
672-681: VF index range hardcoded to 0-31.The loop iterates through VF indices 0-31, supporting up to 32 VFs. While this covers most common SR-IOV configurations, some NICs support up to 64 or 128 VFs. If higher VF counts are needed in the future, consider making this configurable or increasing the upper bound.
Additionally, the shell command directly interpolates
interfaceNameandpciAddr. While these values originate from controlled sources (test config and CNI annotations), consider validating they contain only expected characters (alphanumeric, colons, dots, hyphens) for defense-in-depth.🔧 Optional: Add input validation
// Add validation helper func isValidPCIAddr(addr string) bool { // PCI addresses are formatted as XXXX:XX:XX.X (e.g., 0000:3b:00.0) matched, _ := regexp.MatchString(`^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$`, addr) return matched }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go` around lines 672 - 681, The shell command assigned to cmd hardcodes the VF index loop 0-31 and directly interpolates interfaceName and pciAddr; change it to dynamically determine the upper bound (e.g., read /sys/class/net/<interface>/device/sriov_totalvfs or count /sys/class/net/<interface>/device/virtfn* entries) or make the max VF count configurable, and reconstruct the loop to iterate up to that computed value instead of 0-31 (keep the rest of the lookup logic that uses ip link show %s and greps "vf $IDX"); additionally add validation for interfaceName and pciAddr (e.g., allow only alphanumeric, colons, dots, hyphens and use a PCI regex like ^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$) before building cmd to prevent injection.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go`:
- Around line 733-747: The nodeOutput function's bidirectional strings.HasPrefix
checks can false-match sibling hostnames; update nodeOutput to only accept
prefix matches when the prefix boundary is a dot or end-of-string (to handle
FQDN vs short-hostname), e.g., after finding no exact map key, when testing if
strings.HasPrefix(nodeName, host) ensure nodeName[len(host)] is '.' (or host
length equals nodeName length) and when testing strings.HasPrefix(host,
nodeName) ensure host[len(nodeName)] is '.' (or equal length); keep the existing
exact-match early return in nodeOutput and return "" if no
precise-or-dot-delimited prefix match is found.
---
Nitpick comments:
In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go`:
- Around line 672-681: The shell command assigned to cmd hardcodes the VF index
loop 0-31 and directly interpolates interfaceName and pciAddr; change it to
dynamically determine the upper bound (e.g., read
/sys/class/net/<interface>/device/sriov_totalvfs or count
/sys/class/net/<interface>/device/virtfn* entries) or make the max VF count
configurable, and reconstruct the loop to iterate up to that computed value
instead of 0-31 (keep the rest of the lookup logic that uses ip link show %s and
greps "vf $IDX"); additionally add validation for interfaceName and pciAddr
(e.g., allow only alphanumeric, colons, dots, hyphens and use a PCI regex like
^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-9a-fA-F]$) before building cmd
to prevent injection.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Run ID: cfabb0e3-3ac4-49f2-8547-b453696a43a0
📒 Files selected for processing (1)
tests/ocp/sriov/internal/sriovenv/sriovenv.go
| // nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map. | ||
| // Falls back to prefix matching to handle FQDN vs short-hostname mismatches. | ||
| func nodeOutput(outputMap map[string]string, nodeName string) string { | ||
| if out, ok := outputMap[nodeName]; ok { | ||
| return out | ||
| } | ||
|
|
||
| for host, out := range outputMap { | ||
| if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) { | ||
| return out | ||
| } | ||
| } | ||
|
|
||
| return "" | ||
| } |
There was a problem hiding this comment.
Prefix matching may incorrectly match different nodes.
The bidirectional prefix check at line 741 could match unintended nodes when node names share common prefixes:
- For
nodeName = "worker-1"and map keys{"worker-10": "...", "worker-1-extra": "..."}, both would match viastrings.HasPrefix(host, nodeName). - Map iteration order is non-deterministic, so the result could be inconsistent.
The intent is to handle FQDN vs short hostname (e.g., worker-1 vs worker-1.cluster.local). Consider checking for a dot delimiter after the prefix match.
🛠️ Suggested fix
func nodeOutput(outputMap map[string]string, nodeName string) string {
if out, ok := outputMap[nodeName]; ok {
return out
}
for host, out := range outputMap {
- if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) {
+ // Handle FQDN vs short hostname: one must be a prefix of the other
+ // followed by a dot (e.g., "worker-1" matches "worker-1.cluster.local")
+ if strings.HasPrefix(host, nodeName) && (len(host) == len(nodeName) || host[len(nodeName)] == '.') {
+ return out
+ }
+ if strings.HasPrefix(nodeName, host) && (len(nodeName) == len(host) || nodeName[len(host)] == '.') {
return out
}
}
return ""
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map. | |
| // Falls back to prefix matching to handle FQDN vs short-hostname mismatches. | |
| func nodeOutput(outputMap map[string]string, nodeName string) string { | |
| if out, ok := outputMap[nodeName]; ok { | |
| return out | |
| } | |
| for host, out := range outputMap { | |
| if strings.HasPrefix(nodeName, host) || strings.HasPrefix(host, nodeName) { | |
| return out | |
| } | |
| } | |
| return "" | |
| } | |
| // nodeOutput extracts the command output for nodeName from the ExecCmdWithStdout result map. | |
| // Falls back to prefix matching to handle FQDN vs short-hostname mismatches. | |
| func nodeOutput(outputMap map[string]string, nodeName string) string { | |
| if out, ok := outputMap[nodeName]; ok { | |
| return out | |
| } | |
| for host, out := range outputMap { | |
| // Handle FQDN vs short hostname: one must be a prefix of the other | |
| // followed by a dot (e.g., "worker-1" matches "worker-1.cluster.local") | |
| if strings.HasPrefix(host, nodeName) && (len(host) == len(nodeName) || host[len(nodeName)] == '.') { | |
| return out | |
| } | |
| if strings.HasPrefix(nodeName, host) && (len(nodeName) == len(host) || nodeName[len(host)] == '.') { | |
| return out | |
| } | |
| } | |
| return "" | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@tests/ocp/sriov/internal/sriovenv/sriovenv.go` around lines 733 - 747, The
nodeOutput function's bidirectional strings.HasPrefix checks can false-match
sibling hostnames; update nodeOutput to only accept prefix matches when the
prefix boundary is a dot or end-of-string (to handle FQDN vs short-hostname),
e.g., after finding no exact map key, when testing if
strings.HasPrefix(nodeName, host) ensure nodeName[len(host)] is '.' (or host
length equals nodeName length) and when testing strings.HasPrefix(host,
nodeName) ensure host[len(nodeName)] is '.' (or equal length); keep the existing
exact-match early return in nodeOutput and return "" if no
precise-or-dot-delimited prefix match is found.
… firmware reset The sriov-network-operator Mellanox plugin resets firmware to SRIOV_EN=False/NUM_OF_VFS=0 for any NIC with no active policy spec. Every test cleanup triggers this write; the next MCO warm reboot activates it, leaving sriov_totalvfs=0 for subsequent runs. Setting externallyManaged=true tells the plugin to write sriov_numvfs directly without touching firmware, so cleanup never resets SRIOV_EN. Requires firmware pre-configured (SRIOV_EN=True, NUM_OF_VFS=6) via one-time cold boot before tests run. In metricsExporter.go the flag is conditioned on nicVendor==MlxVendorID so non-Mellanox devices are unaffected. Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
… prevent firmware reset" This reverts commit 93218dd.
Problem
`verifySpoofCheck` identifies the pod's VF in `ip link show ` by looking for a line containing the pod's static MAC address. On Mellanox CX6-DX hardware, the SR-IOV CNI sets the MAC only inside the container's network namespace (via `ip link set net1 address `) without propagating it to the host PF via `ip link set vf N mac`. As a result, the host-side VF line shows `00:00:00:00:00:00` and the MAC-based lookup never finds the expected spoof check state, causing tests 25959 and 70820 to fail consistently on this hardware.
Fix
Use the VF PCI address from the pod's `k8s.v1.cni.cncf.io/network-status` annotation to find the VF's index by matching against the PF's `virtfnN` sysfs symlinks. Then look up that specific `vf N` line in `ip link show ` to verify the spoof check state. This approach is hardware-agnostic and independent of whether the CNI propagates the MAC to the host PF.
The MAC-based path is kept as a fallback for environments where the PCI address is not available in the network-status annotation.
Changes
Tests fixed
Summary by CodeRabbit