Description
Certain kernel modules, such as irdma, cause the installer to fail on the openibd restart. This is particularly problematic in environments where some Mellanox NICs are used in Ethernet mode for regular traffic. The restart fails with:

Function: generate_ofed_modules_blacklist
Unloading ib_uverbs [FAILED]
rmmod: ERROR: Module ib_uverbs is in use by: irdma
[16-Jul-25_16:25:49] Command "/etc/init.d/openibd restart" failed with exit code: 1

When this happens, the Ethernet NICs are left unclaimed, so the node loses network connectivity and ends up in an unrecoverable state.
I propose introducing a custom environment variable, CUSTOM_UNLOAD_MODULES, through which a list of kernel modules that need to be unloaded before the restart can be passed to the script.
The implementation can be similar to UNLOAD_STORAGE_MODULES: the list is appended to UNLOAD_MODULES, and openibd handles the unloading. See: https://github.com/Mellanox/doca-driver-build/blob/main/entrypoint.sh#L472
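To make the proposal concrete, here is a rough sketch of how the append step might look. It is not the code from entrypoint.sh: the function name append_custom_unload_modules is hypothetical, and it assumes that both CUSTOM_UNLOAD_MODULES and UNLOAD_MODULES are space-separated strings of module names.

```shell
#!/usr/bin/env bash
# Hypothetical sketch; not taken from entrypoint.sh.
# Assumption: UNLOAD_MODULES and CUSTOM_UNLOAD_MODULES are
# space-separated lists of kernel module names.

append_custom_unload_modules() {
    local module
    # Word splitting is intentional here: one iteration per module name.
    for module in ${CUSTOM_UNLOAD_MODULES:-}; do
        case " ${UNLOAD_MODULES:-} " in
            *" ${module} "*)
                # Already in the list; skip to avoid duplicates.
                ;;
            *)
                # Append, adding a separating space only if the list is non-empty.
                UNLOAD_MODULES="${UNLOAD_MODULES:+${UNLOAD_MODULES} }${module}"
                ;;
        esac
    done
}
```

With something like CUSTOM_UNLOAD_MODULES="irdma" set in the container environment, calling this function before the openibd restart would add irdma to the list of modules to unload, so that ib_uverbs is no longer held in use.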
If this approach sounds reasonable, I have a PR ready. Thanks!