Commit 3f8314f

gadididi authored and mergify[bot] committed
nvmeof: fix CSI node plugin crash on immutable Linux distributions
On distributions like Talos Linux, NVMe modules are compiled directly into the kernel (CONFIG_NVME_TCP=y, CONFIG_NVME_FABRICS=y) instead of being loadable .ko files. Explicitly calling modprobe on nvme_fabrics fails on these systems since there is no .ko file present, causing the CSI node plugin to crash on startup.

Removing nvme_fabrics from the module load list is safe because it is always a dependency of nvme_tcp. On normal distributions modprobe loads it automatically as part of the nvme_tcp dependency chain. On immutable distributions it is already baked into the kernel.

We verify the fabrics framework is functional after loading nvme_tcp by checking that /dev/nvme-fabrics exists. This device node is created by the kernel on init regardless of whether nvme_fabrics was loaded as a module or compiled in, making it a reliable indicator that NVMe-oF TCP is ready to use.

Signed-off-by: gadi-didi <gadi.didi@ibm.com>
(cherry picked from commit 992d720)
1 parent b94b497 commit 3f8314f

1 file changed, +18 -11 lines changed

internal/nvmeof/nvmeof_initiator.go

Lines changed: 18 additions & 11 deletions
@@ -125,20 +125,27 @@ func NewNVMeInitiator() NVMeInitiator {
 
 // LoadKernelModules ensures required kernel modules are loaded.
 func (ni *nvmeInitiator) LoadKernelModules(ctx context.Context) error {
-	modules := []string{
-		"nvme_tcp",
-		"nvme_fabrics",
-	}
-	log.DebugLog(ctx, "Loading NVMe-oF kernel modules: %s, and %s", modules[0], modules[1])
+	module := "nvme_tcp"
+	log.DebugLog(ctx, "Loading NVMe-oF kernel module: %s", module)
 
-	for _, module := range modules {
-		err := kmod.Modprobe(ctx, module)
-		if err != nil {
-			return fmt.Errorf("failed to load kernel module %q: %w", module, err)
-		}
+	err := kmod.Modprobe(ctx, module)
+	if err != nil {
+		return fmt.Errorf("failed to load kernel module %q: %w", module, err)
 	}
+	log.DebugLog(ctx, "NVMe-oF kernel module: %s, is loaded successfully", module)
+	// verify nvme_fabrics is functional by checking its device node.
+	// On immutable Operating Systems like Talos Linux (CONFIG_NVME_FABRICS=y), the module
+	// is baked into the kernel and /sys/module/nvme_fabrics may not exist,
+	// but /dev/nvme-fabrics is always created by the kernel on init if the
+	// fabrics framework is operational.
+	if _, err := os.Stat("/dev/nvme-fabrics"); err != nil {
+		if os.IsNotExist(err) {
+			return fmt.Errorf("nvme_fabrics is not functional, /dev/nvme-fabrics not found: %w", err)
+		}
 
-	log.DebugLog(ctx, "All NVMe-oF kernel modules: %s, and %s, loaded successfully", modules[0], modules[1])
+		return fmt.Errorf("nvme_fabrics is not functional, failed to stat /dev/nvme-fabrics: %w", err)
+	}
+	log.DebugLog(ctx, "NVMe-oF fabrics framework is functional with /dev/nvme-fabrics present")
 
 	return nil
 }
