changelogs/fragments/11020-zpool-device-path-idempotency.yaml (2 additions, 0 deletions)
@@ -0,0 +1,2 @@
+bugfixes:
+  - zpool - idempotency failed when canonical device IDs were used; the fix now ensures consistent device path normalization (https://github.com/ansible-collections/community.general/issues/10771, https://github.com/ansible-collections/community.general/issues/10744, https://github.com/ansible-collections/community.general/pull/11020).
plugins/modules/zpool.py (9 additions, 4 deletions)
@@ -310,11 +310,16 @@ def base_device(self, device):
         if match:
             return match.group(1)
 
+        # disk/by-id drives
+        match = re.match(r'^(/dev/disk/by-id/(.*))-part\d+$', device)
+        if match:
+            return match.group(1)
+
         return device
 
     def get_current_layout(self):
-        with self.zpool_runner('subcommand full_paths real_paths name', check_rc=True) as ctx:
-            rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True, real_paths=True)
Collaborator:

I'm wondering whether this change (and the one further below) could break some uses of the module unrelated to /dev/disk/by-id/ and would then require another adjustment to avoid that breakage.


I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:

  1. There's the potential for /dev/disk/by-*/* to be used for identifying disks, so ideally we'd cover more than just the by-id path.
  2. Everything under /dev/disk/by-*/* is really just a symlink, so I would have initially thought that just doing an os.readlink on any devices passed as symlinks might solve the problem (see the sketch below). I'm not sure if this truly vibes with how zfs stores device names though.
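
For illustration, a minimal sketch of that readlink idea, with a hypothetical helper name; os.path.realpath stands in for a bare os.readlink, since readlink can return a relative or chained target:

```python
import os

def resolve_symlink_device(device):
    # Hypothetical helper: resolve /dev/disk/by-*/* symlinks to the
    # underlying kernel device, e.g. /dev/disk/by-id/... -> /dev/sda1.
    # Non-symlink paths pass through unchanged.
    return os.path.realpath(device) if os.path.islink(device) else device
```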


Second that. A fix for the underlying problem should aim to support all kinds of /dev/disk/by-* symlinks.

@gumbo2k Nov 4, 2025:

@felixfontein, I thought about that, but I do not think it will cause problems, neither here, where we only read the current layout, nor further down.
Rather the opposite: in both places we call zpool status and extract the devices that make up the pool.

As an admin, when creating a zfs pool on a server with lots of disks, I expect disks to fail over time, to disappear or be replaced, and so on. So when I create the pool, I make sure to use device links that will survive those changes.

Enforcing real_paths in get_current_layout doesn't make sense: I want the same layout reported back that I put in.
If I used the stable by-id symlinks during creation of the pool, I want them to be reported back.
If I used /dev/sdx, /dev/sdy, /dev/sdz during creation of the pool, I want those to be reported back.

I checked the module's history to see whether the translation to "real_paths" was added in a commit that would explain the rationale, but it seems real_paths=True was in there from the start.
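
A hypothetical illustration of the mismatch being discussed (device names invented; full_paths and real_paths presumably map to the zpool status -P and -L flags):

```python
# What the playbook specified when creating the pool (invented name):
specified = ['/dev/disk/by-id/ata-EXAMPLE-DISK-part1']

# What `zpool status -P -L` (real_paths=True) would resolve that symlink to:
reported = ['/dev/sda1']

# Comparing the two forms directly makes an unchanged pool look modified:
print(specified == reported)  # False -> spurious "changed" result
```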

@gumbo2k Nov 4, 2025:

> I've been hoping to take some time to look at this a bit more, but I just haven't been able to find any lately, so my comments are based solely on browsing the changes and not testing, but my instincts are that:
>
> There's the potential for /dev/disk/by-*/* to be used for identifying disks, so ideally we'd cover more than just the by-id path.

@dthomson-triumf, @n3ph, I briefly considered extending the regex match to include all /dev/disk/by-*/ directories, but I intentionally limited the blast radius to avoid unintentional changes in behavior, just in case somebody creates a symlink named something-part123 that is not a link to a partition (a sketch of both variants follows this comment).

I know more or less what to expect in by-id, but what about directories like by-diskseq, by-dname, by-loop-inode, by-loop-ref, by-partuuid, by-path, or by-uuid? Some are only there for loopback devices, and some only get populated by the creation of filesystems or partitions, which is exactly what base_device() tries to avoid.

> Everything under /dev/disk/by-*/* is really just a symlink, so I would have initially thought that just doing an os.readlink on any devices passed as symlinks might solve the problem. I'm not sure if this truly vibes with how zfs stores device names though.

I don't have much experience with zfs, but the module does not resolve symlinks when creating a pool, and zpool seems to work nicely with those symlinked devices.
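
For comparison, a sketch of the merged by-id match next to the broader by-* variant that was considered (regexes simplified to a single capture group; the device name is invented):

```python
import re

# Merged: only /dev/disk/by-id, to keep the blast radius small.
BY_ID = re.compile(r'^(/dev/disk/by-id/.*)-part\d+$')

# Considered: any /dev/disk/by-*/ directory. Riskier, because a symlink
# that merely ends in -partNNN without being a partition would be mangled.
BY_ANY = re.compile(r'^(/dev/disk/by-[^/]+/.*)-part\d+$')

device = '/dev/disk/by-id/ata-EXAMPLE-DISK-part1'
print(BY_ID.match(device).group(1))   # /dev/disk/by-id/ata-EXAMPLE-DISK
print(BY_ANY.match(device).group(1))  # same result for by-id paths
```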


@gumbo2k IMO the risk is fairly low considering that these device paths are values that would have been entered by a person (or a lookup module or something, but I think the point remains).

However, I think the best solution is what you mentioned in your previous reply to @felixfontein. I was kind of fuzzy on why the device names in the zpool were different from the values that were entered. I didn't realize that it was the zpool module that was trying to change the device name via the real_paths option. Generally, I don't think the Ansible module should be responsible for what my device names are according to ZFS. That should be ZFS's responsibility.

@n3ph Nov 4, 2025:

I think most of these devices are managed by udev and the device mapper anyway.

In general, I would leave it up to the user to make sense of what is going to be configured. Underlying LUN specifics are nothing for an Ansible module to assume, but for the user to consider.

Edit: Thank you for looking into this issue, though. 🙇🏼

+        with self.zpool_runner('subcommand full_paths name', check_rc=True) as ctx:
+            rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True)
 
         vdevs = []
         current = None
@@ -433,8 +438,8 @@ def add_vdevs(self):
         return {'prepared': stdout}
 
     def list_vdevs_with_names(self):
-        with self.zpool_runner('subcommand full_paths real_paths name', check_rc=True) as ctx:
-            rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True, real_paths=True)
+        with self.zpool_runner('subcommand full_paths name', check_rc=True) as ctx:
+            rc, stdout, stderr = ctx.run(subcommand='status', full_paths=True)
         in_cfg = False
         saw_pool = False
         vdevs = []
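
To make the behavior change concrete, a sketch of the underlying invocation before and after this PR (pool name invented; the -P/-L flag mapping is inferred from the OpenZFS zpool-status options for full and real paths, not taken from the module source):

```python
import subprocess

def zpool_status(pool, resolve_symlinks):
    # -P prints full vdev paths; -L additionally resolves symlinks to
    # the real block devices (the behavior this PR removes).
    cmd = ['zpool', 'status', '-P'] + (['-L'] if resolve_symlinks else []) + [pool]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout

# Before the fix: real paths, e.g. /dev/sda1        -> zpool_status('tank', True)
# After the fix: paths as created, e.g. by-id/...   -> zpool_status('tank', False)
```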