Skip to content

feat: Go backup restore VF switchdev#145

Merged
rollandf merged 1 commit intoMellanox:mainfrom
rollandf:switchdev
Nov 26, 2025
Merged

feat: Go backup restore VF switchdev#145
rollandf merged 1 commit intoMellanox:mainfrom
rollandf:switchdev

Conversation

@rollandf
Copy link
Member

No description provided.

@coveralls
Copy link

coveralls commented Sep 29, 2025

Pull Request Test Coverage Report for Build 19696076708

Details

  • 38 of 277 (13.72%) changed or added relevant lines in 1 file are covered.
  • 1 unchanged line in 1 file lost coverage.
  • Overall coverage decreased (-1.9%) to 61.635%

Changes Missing Coverage Covered Lines Changed/Added Lines %
entrypoint/internal/netconfig/netconfig.go 38 277 13.72%
Files with Coverage Reduction New Missed Lines %
entrypoint/internal/netconfig/netconfig.go 1 35.49%
Totals Coverage Status
Change from base Build 19695900040: -1.9%
Covered Lines: 1764
Relevant Lines: 2862

💛 - Coveralls

@greptile-apps
Copy link

greptile-apps bot commented Nov 25, 2025

Greptile Overview

Greptile Summary

This PR extends the network configuration backup/restore system to support switchdev mode for VF representors, implementing the Go equivalent of existing bash script functionality.

Key Changes:

  • Added Representor struct to track switchdev representor devices (name, MTU, admin state, physical IDs)
  • Implemented representor discovery during save phase by scanning /sys/class/net for devices matching pf{port}vf{id} pattern
  • Modified VF restore logic to skip rebinding VFs in switchdev mode (deferred until after eswitch mode is set)
  • Added representor restoration that finds devices by physical attributes, renames if needed, and restores MTU/state
  • Added comprehensive tests for switchdev helper functions and VF rebinding logic

Issues Found:

  • Critical: Bind delay is 3 seconds but bash script uses 4 seconds (commit 3ce9ba9), creating inconsistency
  • Regex compilation happens in hot loops instead of using package-level compiled regex
  • Previous thread noted concerns about bind delay impact on performance

Confidence Score: 3/5

  • This PR has critical bind delay inconsistency with bash script that could cause VF initialization failures
  • The bind delay mismatch (3s vs 4s) is a logic error that could cause real issues in production since the bash script was specifically updated to use 4s for stability. The regex performance issue is a style concern but won't cause failures
  • Pay attention to entrypoint/internal/netconfig/netconfig.go lines 401 and 496 - bind delay must match bash script

Important Files Changed

File Analysis

Filename Score Overview
entrypoint/internal/netconfig/netconfig.go 3/5 Adds switchdev representor backup/restore functionality with VF rebinding logic. Issues: bind delay mismatch with bash script (3s vs 4s), regex compiled in hot loops
entrypoint/internal/netconfig/netconfig_test.go 5/5 Adds tests for switchdev helper functions and VF rebinding logic, covering both switchdev and legacy modes

Sequence Diagram

sequenceDiagram
    participant User
    participant NetConfig
    participant Sysfs as /sys/class/net
    participant Netlink
    participant Driver as mlx5_core

    Note over User,Driver: Save Phase (Switchdev Mode)
    User->>NetConfig: Save()
    NetConfig->>NetConfig: discoverSwitchdevRepresentors()
    NetConfig->>Sysfs: Read phys_port_name & phys_switch_id
    NetConfig->>Sysfs: Scan subsystem for representors
    NetConfig->>Netlink: Get representor config (MTU, admin state)
    NetConfig-->>User: Configuration saved

    Note over User,Driver: Restore Phase (Switchdev Mode)
    User->>NetConfig: Restore()
    NetConfig->>NetConfig: restoreVFConfigurations(switchdev)
    NetConfig->>Driver: Unbind VF from driver
    Note over NetConfig: Skip rebind for switchdev (handled later)
    
    NetConfig->>NetConfig: Set eswitch mode to switchdev
    
    NetConfig->>NetConfig: rebindVFsInSwitchdevMode()
    NetConfig->>Driver: Bind VF to driver
    Note over NetConfig: Wait 3s bind delay
    NetConfig->>Netlink: Restore VF MTU & admin state
    
    NetConfig->>NetConfig: restoreRepresentors()
    NetConfig->>Sysfs: Find current representor by phys attributes
    NetConfig->>NetConfig: Rename representor if needed
    NetConfig->>Netlink: Set representor MTU & admin state
    NetConfig-->>User: Restore complete
Loading

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

16 files reviewed, no comments

Edit Code Review Agent Settings | Greptile

Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Additional Comments (2)

  1. entrypoint/internal/netconfig/netconfig_test.go, line 102-123 (link)

    syntax: duplicate test case - this exact test already exists at lines 112-123

  2. entrypoint/internal/netconfig/netconfig_test.go, line 223-255 (link)

    syntax: duplicate test case - this exact test already exists at lines 240-255

4 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

// Wait for bind delay (matches bash script)
time.Sleep(1 * time.Second) // BIND_DELAY_SEC equivalent
// Wait for bind delay (matches bash script)
time.Sleep(3 * time.Second) // BIND_DELAY_SEC equivalent
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Check that the bind delay increase from 1s to 3s doesn't negatively impact restore performance for devices with many VFs


// Wait for bind delay (matches bash script)
time.Sleep(1 * time.Second) // BIND_DELAY_SEC equivalent
time.Sleep(3 * time.Second) // BIND_DELAY_SEC equivalent
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Check that this bind delay matches the bash script value and is necessary for switchdev mode stability

Signed-off-by: Fred Rolland <frolland@nvidia.com>
Copy link

@greptile-apps greptile-apps bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 files reviewed, 4 comments

Edit Code Review Agent Settings | Greptile

// isRepresentorPhysPortName checks if a phys_port_name indicates a representor
func (n *netconfig) isRepresentorPhysPortName(physPortName string) bool {
// Format: "pf{port_num}vf{vf_id}" (e.g., "pf1vf3")
re := regexp.MustCompile(`^pf(\d+)vf(\d+)$`)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Regex is compiled on every call in a hot loop (called for every network device in findDeviceRepresentors). Consider compiling once at package level:

Suggested change
re := regexp.MustCompile(`^pf(\d+)vf(\d+)$`)
var representorRegex = regexp.MustCompile(`^pf(\d+)vf(\d+)$`)

Then use representorRegex.MatchString(physPortName) instead of recompiling each time


// parseRepresentorPhysPortName parses representor phys_port_name to extract PF port and VF ID
func (n *netconfig) parseRepresentorPhysPortName(physPortName string) (pfPortNum, vfID string, err error) {
re := regexp.MustCompile(`^pf(\d+)vf(\d+)$`)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style: Same regex compiled twice - once here and once in isRepresentorPhysPortName:1350. Use a single package-level compiled regex instead

// Wait for bind delay (matches bash script)
time.Sleep(1 * time.Second) // BIND_DELAY_SEC equivalent
// Wait for bind delay (matches bash script)
time.Sleep(3 * time.Second) // BIND_DELAY_SEC equivalent
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Bind delay is 3s but the bash script uses BIND_DELAY_SEC=4 (see commit 3ce9ba9). This inconsistency could cause issues

Suggested change
time.Sleep(3 * time.Second) // BIND_DELAY_SEC equivalent
time.Sleep(4 * time.Second) // BIND_DELAY_SEC equivalent (matches bash script default)

@@ -460,7 +494,7 @@ func (n *netconfig) rebindVFsInSwitchdevMode(ctx context.Context, device *Mellan
}

// Wait for bind delay (matches bash script)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

logic: Same issue - bind delay is 3s but bash script uses 4s (see commit 3ce9ba9)

Suggested change
// Wait for bind delay (matches bash script)
time.Sleep(4 * time.Second) // BIND_DELAY_SEC equivalent (matches bash script default)

@rollandf rollandf merged commit 0a5362e into Mellanox:main Nov 26, 2025
4 of 5 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants