support-bundle.zip
Environment
- Talos Linux: v1.13.4
- Kubernetes: v1.36.1
- Extension: siderolabs/multipath-tools v0.1.0
- Hardware: Dell PowerEdge R660 (3 nodes: 1 control plane + 2 workers)
- HBA: Emulex LPe35002-M2-D 2-Port 32Gb Fibre Channel Adapter
- Storage: Pure Storage FA-C70R5, Purity//FA 6.10.4
- Connectivity: Fibre Channel on a separate FC fabric; Ethernet uses LACP bonding (4x Broadcom 25Gb ports per node)
- Zoning: verified redundant (Pure GUI shows "Redundant" status, 2 WWNs per host, paths on CT0 and CT1)
Issue 1: multipathd segfaults in ld-musl-x86_64.so.1 when FC paths are present
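After approximately 60-90 seconds of runtime, multipathd crashes with a segfault. The crash is consistently reproducible across all three nodes. The faulting instruction is in ld-musl-x86_64.so.1 (on musl, the dynamic linker and libc are the same shared object, so the fault may be in any libc routine), and it occurs regardless of the path_checker used (tested with both directio and tur).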
Kernel log output (same pattern on all nodes):
multipathd[20353]: segfault at 7f8512486b38 ip 00007f85127c5b8b sp 00007ffe9843e780 error 4 in ld-musl-x86_64.so.1[60b8b,7f8512779000+58000] likely on CPU 89 (core 7, socket 1)
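For anyone triaging the line above, here is a minimal Python sketch that splits it into its fields. The function and key names are hypothetical. The page-fault bit meanings are the standard x86 ones. Reading the first bracketed number as the file offset of the instruction pointer is an assumption based on how recent kernels print this message; the offset inside the mapping is simply ip minus the mapping base.

```python
import re

# Matches the x86 "segfault at ..." kernel message format shown above.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<error>[0-9a-f]+) in (?P<object>[^\[\s]+)"
    r"\[(?P<file_off>[0-9a-f]+),(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line: str) -> dict:
    """Decode one kernel segfault line into named fields (illustrative helper)."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        raise ValueError("not a recognised segfault line")
    addr = int(m["addr"], 16)
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    error = int(m["error"], 16)
    return {
        "object": m["object"],
        # Offset of the faulting instruction inside the mapped region.
        "ip_offset_in_mapping": ip - base,
        # Assumption: first bracketed value is the file offset of ip.
        "file_offset": int(m["file_off"], 16),
        # x86 page-fault error code bits.
        "page_present": bool(error & 0x1),   # 0 = page not present
        "access": "write" if error & 0x2 else "read",
        "user_mode": bool(error & 0x4),
        # Whether the address being accessed lies inside the same mapping.
        "fault_addr_inside_object": base <= addr < base + size,
    }
```

For the line above this gives a user-mode read of a non-present page, with the accessed address outside the ld-musl mapping. Either offset can be fed to a symbolizer if a musl build with debug symbols matching the extension image is available.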
The service restarts automatically and the crash repeats indefinitely. During the brief window before the crash, multipathd is visible and running, but all four SCSI paths remain in orphan state and no dm device is ever created.
multipathd show paths output (captured before crash):
hcil      dev  dev_t  pri  dm_st  chk_st  dev_st   next_check
12:0:0:1  sda  8:0    50   undef  undef   unknown  orphan
12:0:1:1  sdb  8:16   50   undef  undef   unknown  orphan
13:0:0:1  sdc  8:32   50   undef  undef   unknown  orphan
13:0:1:1  sdd  8:48   50   undef  undef   unknown  orphan
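To capture this state repeatedly during the short window before the crash, the table can be parsed with a few lines of Python. This is a sketch that assumes whitespace-separated columns exactly as in the output above; the function names are hypothetical.

```python
def parse_paths(output: str) -> list[dict]:
    """Parse `multipathd show paths` text into one dict per path.

    Assumes the first non-empty line is the header and that no field
    contains embedded whitespace, as in the output above. A healthy
    next_check value with a space in it would be truncated by zip().
    """
    rows = [ln.split() for ln in output.splitlines() if ln.strip()]
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]


def orphan_devices(paths: list[dict]) -> list[str]:
    """Return device names whose next_check column reads 'orphan'."""
    return [p["dev"] for p in paths if p.get("next_check") == "orphan"]
```

With the output above, `orphan_devices(parse_paths(text))` lists all four devices, which matches the observation that no path is ever claimed by a multipath map.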
The kernel detects all four paths correctly at boot (two per HBA port, two HBA ports per node):
scsi 13:0:0:1: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
scsi 13:0:1:1: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
scsi 14:0:0:1: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
scsi 14:0:1:1: Direct-Access PURE FlashArray 8888 PQ: 0 ANSI: 6
udev correctly populates all relevant attributes on each device:
DEVTYPE=disk
SUBSYSTEM=block
DM_MULTIPATH_DEVICE_PATH=1
ID_SCSI=1
ID_SERIAL=3624a93707e521182588644d300011b2d
ID_WWN=0x624a93707e521182
ID_WWN_VENDOR_EXTENSION=0x588644d300011b2d
ID_WWN_WITH_EXTENSION=0x624a93707e521182588644d300011b2d
ID_SCSI_SERIAL=7E521182588644D300011B2D
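The identifiers above are mutually consistent, which matters because the configuration below uses ID_WWN_WITH_EXTENSION as uid_attribute. The following Python sketch checks this on each path device. The relationships are read off the sample values in this report rather than taken from a specification, and the function name is hypothetical.

```python
def check_wwn_attributes(env: dict) -> list[str]:
    """Return a list of inconsistencies between udev WWN properties.

    Relationships are assumptions derived from the sample above:
      ID_WWN_WITH_EXTENSION == ID_WWN + ID_WWN_VENDOR_EXTENSION
      ID_SERIAL             == "3" + ID_WWN_WITH_EXTENSION (without 0x)
      ID_SCSI_SERIAL        == upper-case tail of the same identifier
    """
    problems = []
    wwn = env["ID_WWN"].removeprefix("0x")
    ext = env["ID_WWN_VENDOR_EXTENSION"].removeprefix("0x")
    full = env["ID_WWN_WITH_EXTENSION"].removeprefix("0x")

    if full != wwn + ext:
        problems.append("ID_WWN_WITH_EXTENSION is not ID_WWN + vendor extension")
    if env["ID_SERIAL"] != "3" + full:
        problems.append("ID_SERIAL does not match '3' + WWN with extension")
    if not full.upper().endswith(env["ID_SCSI_SERIAL"]):
        problems.append("ID_SCSI_SERIAL is not the tail of the WWN")
    return problems
```

An empty list for all four devices, with the same ID_WWN_WITH_EXTENSION on each, indicates that udev presents them as paths to one LUN.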
Issue 2: ExtensionServiceConfig creates configFile with 0 bytes
When providing a custom multipath.conf via ExtensionServiceConfig, the file is created in the container overlay at the correct path but its content is never written. The file is consistently 0 bytes.
Observed filesystem state inside the overlay:
-rw-r--r-- 1 root root 0 Jun 15 07:33 multipath.conf
The ExtensionServiceConfig spec is correctly stored in Talos (verified via talosctl get extensionserviceconfigs -o yaml) and contains the full configuration, but it does not reach the file on disk.
Workaround applied: a privileged DaemonSet with an initContainer writes the configuration directly to the overlay path /system/overlays/usr-local-lib-containers-multipathd-diff/etc/multipath/multipath.conf. After this workaround, multipathd show config local confirms the custom configuration is being read. However, the segfault (Issue 1) persists regardless of the configuration provided.
Issue 3: ExtensionServiceConfig name requirement is undocumented
The ExtensionServiceConfig name must be multipathd, not multipath. With name: multipath, Talos stores the configuration without error, but the extension service never picks it up. This took significant time to diagnose and is not documented anywhere. A note in the extension README would help future users.
Configuration used
ExtensionServiceConfig:
apiVersion: v1alpha1
kind: ExtensionServiceConfig
name: multipathd
configFiles:
  - content: |
      defaults {
          polling_interval 5
          path_grouping_policy multibus
          uid_attribute ID_WWN_WITH_EXTENSION
          failback immediate
          no_path_retry 0
          user_friendly_names no
          find_multipaths no
      }
      blacklist {
          devnode "^nvme0n1$"
          devnode "^sr[0-9]*"
          devnode "^nbd[0-9]*"
      }
      devices {
          device {
              vendor "PURE"
              product "FlashArray"
              path_selector "service-time 0"
              path_grouping_policy multibus
              path_checker tur
              fast_io_fail_tmo 10
              dev_loss_tmo 60
              no_path_retry 0
              failback immediate
          }
      }
    mountPath: /etc/multipath/multipath.conf
machine.udev rules (added to ensure udev notifies multipathd of block devices):
machine:
  udev:
    rules:
      - SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_WWN}!="", ENV{DM_MULTIPATH_DEVICE_PATH}="1"
      - ACTION=="add|change", SUBSYSTEM=="block", ENV{DEVTYPE}=="disk", ENV{ID_WWN}!="", RUN+="/sbin/multipath -v 0 -r"
      - ACTION=="add", SUBSYSTEM=="scsi", ENV{DEVTYPE}=="scsi_device", RUN+="/sbin/multipath -v 0"
machine.kernel.modules:
machine:
  kernel:
    modules:
      - name: dm_multipath
Additional findings
The extension container runs as an overlay at /usr/local/lib/containers/multipathd/ with its writable layer at /system/overlays/usr-local-lib-containers-multipathd-diff/. multipathd in this extension appears to read its configuration from the /etc/multipath/ subdirectory rather than from /etc/ directly, so the mountPath in configFiles must be /etc/multipath/multipath.conf, not /etc/multipath.conf.
The extension was also tested on Talos v1.13.3, where the segfault was more aggressive: it occurred immediately on startup and caused a rapid restart loop. On v1.13.4 the service runs for approximately 60-90 seconds before crashing, which suggests a partial improvement, but the underlying issue remains.
Expected behavior
multipathd should run stably, group the four FC paths into a single dm device, and expose it at /dev/disk/by-id/wwn-0x624a93707e521182588644d300011b2d pointing to a dm device rather than a raw SCSI disk.
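The expected end state can be checked from any privileged context on the node with a short Python sketch. It assumes device-mapper nodes are named dm-N under /dev, which is the usual kernel naming; the function names are hypothetical.

```python
import os


def is_dm_node(resolved_path: str) -> bool:
    """True if the resolved device node is named like a device-mapper node (dm-N)."""
    name = os.path.basename(resolved_path)
    return name.startswith("dm-") and name[3:].isdigit()


def by_id_points_to_dm(link: str) -> bool:
    """Resolve a /dev/disk/by-id symlink and check whether it lands on dm-N.

    A missing link resolves to itself and therefore returns False.
    """
    return is_dm_node(os.path.realpath(link))
```

In the current failing state, calling `by_id_points_to_dm` on the by-id path above would return False, since the link resolves to one of the raw sdX path devices.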
Actual behavior
multipathd crashes repeatedly with a segfault in ld-musl-x86_64.so.1 before completing path grouping. No dm device is ever created.