
ENG-19808 mlxfwreset before reboot in SR-IOV operator#8

Merged
punkerpunker merged 2 commits into clark/config-daemon-ib-unbind-fix from gleb/ENG-19808-mlxfwreset-before-reboot
Feb 20, 2025

@punkerpunker

Well, overall it seems like something is wrong at the firmware level when enabling the SR-IOV capability; from the kernel perspective, things look okay to me.

I see that the sriov-network-operator has an option to run mstfwreset after an mlxconfig change! I guess that's something we can try leveraging:
https://github.com/openshift/sriov-network-operator/blob/79cb3c6ae721220754189300539a38c63e38e66c/pkg/plugins/mellanox/mellanox_plugin.go#L215
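As a rough illustration of what that option does, here is a minimal Go sketch that builds one `mstfwreset` invocation per PCI device and runs it with `os/exec`. The flag list (`-d <pci> --skip_driver -l 3 -y reset`) is an assumption based on our reading of the linked plugin code; verify it against `mstfwreset --help` for the installed mstflint version. The helper names `resetArgs` and `resetFirmware` are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"os/exec"
	"regexp"
	"strings"
)

// pciAddrRe matches a full PCI address in domain:bus:device.function form,
// e.g. "0000:3b:00.0".
var pciAddrRe = regexp.MustCompile(`^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$`)

// resetArgs (hypothetical helper) builds the mstfwreset arguments for one device.
// ASSUMPTION: the flags mirror the linked mellanox plugin; check `mstfwreset --help`.
func resetArgs(pciAddress string) ([]string, error) {
	// Reject anything that is not a plain PCI address before it reaches the command line.
	if !pciAddrRe.MatchString(pciAddress) {
		return nil, fmt.Errorf("invalid PCI address %q", pciAddress)
	}
	return []string{"-d", pciAddress, "--skip_driver", "-l", "3", "-y", "reset"}, nil
}

// resetFirmware (hypothetical helper) runs mstfwreset for every device and
// collects failures instead of stopping at the first one.
func resetFirmware(pciAddresses []string) error {
	var errs []error
	for _, addr := range pciAddresses {
		args, err := resetArgs(addr)
		if err != nil {
			errs = append(errs, err)
			continue
		}
		out, err := exec.Command("mstfwreset", args...).CombinedOutput()
		if err != nil {
			errs = append(errs, fmt.Errorf("mstfwreset %s: %w: %s", addr, err, out))
		}
	}
	return errors.Join(errs...) // nil when every reset succeeded
}

func main() {
	// Only prints the command; calling resetFirmware would actually reset the NIC.
	args, err := resetArgs("0000:3b:00.0")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("mstfwreset", strings.Join(args, " "))
}
```

Validating the address with a strict regular expression keeps arbitrary strings off the command line, and `errors.Join` lets one failing device report its error without blocking resets on the others.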

I think this will resolve the reboot loop we're hitting when running config-daemon; I'll try it tomorrow.
All paths so far lead to the firmware; tbh, I don't think I'll find anything more in the kernel.

Infra PR to enable the feature gate: https://github.com/togethercomputer/infra/pull/4044
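The reset-before-reboot ordering this PR aims for can be sketched as a small planning step: devices whose mlxconfig values (for example `SRIOV_EN` or `NUM_OF_VFS`) changed get a firmware reset first, and the node still reboots afterwards. The gate key `mellanoxFirmwareReset` is assumed to match the operator's feature gate name; `fwChange`, `plan`, and `planActions` are hypothetical names for illustration, not the operator's API.

```go
package main

import "fmt"

// fwChange (hypothetical) records whether mlxconfig wrote new values for a device.
type fwChange struct {
	PCIAddress string
	Changed    bool
}

// plan (hypothetical) lists the devices to reset before the node reboots.
type plan struct {
	ResetDevices []string
	Reboot       bool
}

// planActions resets changed devices when the (assumed) gate is on, then reboots.
func planActions(changes []fwChange, featureGates map[string]bool) plan {
	var p plan
	for _, c := range changes {
		if c.Changed {
			p.ResetDevices = append(p.ResetDevices, c.PCIAddress)
		}
	}
	if len(p.ResetDevices) == 0 {
		return p // nothing pending in firmware: no reset, no reboot
	}
	// ASSUMPTION: gate key name; a missing key (or nil map) reads as false.
	if !featureGates["mellanoxFirmwareReset"] {
		p.ResetDevices = nil
	}
	p.Reboot = true
	return p
}

func main() {
	changes := []fwChange{
		{PCIAddress: "0000:3b:00.0", Changed: true},
		{PCIAddress: "0000:3b:00.1", Changed: false},
	}
	gates := map[string]bool{"mellanoxFirmwareReset": true}
	fmt.Printf("%+v\n", planActions(changes, gates))
}
```

Because a missing map key reads as `false`, the reset stays opt-in: without the gate enabled, the plan falls back to a plain reboot.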

@github-actions

Thanks for your PR,
To run vendor CIs, maintainers can use one of:

  • /test-all: To run all tests for all vendors.
  • /test-e2e-all: To run all E2E tests for all vendors.
  • /test-e2e-nvidia-all: To run all E2E tests for NVIDIA vendor.

To skip the vendor CIs, maintainers can use one of:

  • /skip-all: To skip all tests for all vendors.
  • /skip-e2e-all: To skip all E2E tests for all vendors.
  • /skip-e2e-nvidia-all: To skip all E2E tests for NVIDIA vendor.

Best regards.

punkerpunker merged commit 1fb8405 into clark/config-daemon-ib-unbind-fix on Feb 20, 2025
15 of 21 checks passed
clarkzinzow added a commit that referenced this pull request Aug 28, 2025
Add platform build arg.

Comment out Mellanox plugin's draining + rebooting for totalVfs + SRIOV_EN configs, which is buggy.

Scan GUIDs (#7)

* Squash commits into one

* Squash commits into one

* Cherry-picked types (build fix)

* merge issues fix

* read GUID from sysfs

* node -> port

* args mismatch :(

* NAD config fix to include guid

* port -> node back

* rollback partially

* rollback partially

* rollback partially

* rollback partially

* rollback partially

* rollback partially

* rollback partially

* fix

* bring back pKey to netAttDef definition

* pkey proper location

* removed excessive log lines

* quotes fix

ENG-21048 - KernelArgIommuOn instead of KernelArgIommuPt (to enable ATS & ACS) (#9)

KernelArgIommuOn instead of KernelArgIommuPt

ENG-19808 mlxfwreset before reboot in SR-IOV operator (#8)

* bring reboots back

* bring reboots back

GUIDSavedInUFM config parameter added

camelcase -> snake case
clarkzinzow added a commit that referenced this pull request Aug 28, 2025
Upgrade golangci-lint to work with Go 1.23
