
libzfs_mnttab_cache: B_FALSE silently a no-op since 0ecf5e3f6, breaking consumers that depend on cache-disabled behavior #18464

@prakashsurya

Description


System information

Type Version/Name
Distribution Name Ubuntu (Delphix Engine appliance)
Distribution Version 24.04 LTS base
Kernel Version 6.17.0 (x86_64)
Architecture x86_64
OpenZFS Version master at or after 0ecf5e3f6; verified A/B against master at 545d66204d (pre-0ecf5e3f6) and 4655bdd8ab (post-0ecf5e3f6)

The bug is independent of distribution and kernel — it is in lib/libzfs/libzfs_mnttab.c and reproduces against any libzfs built from a tree that contains commit 0ecf5e3f6.

This regression is currently on master only. No released or staging branch (zfs-2.2.x, zfs-2.3.x, zfs-2.4.x) contains 0ecf5e3f6 yet, so reverting before the next release branch is cut keeps downstream impact at zero.

Describe the problem you're observing

After our latest merge from upstream master, we started seeing ZFS_PROP_MOUNTED return 0 for filesystems that are plainly mounted, which trips a "filesystems not mounted" check in our snapshot path and aborts the workflow.

After some investigation, I believe the cause is 0ecf5e3f6 libzfs/mnttab: always enable the cache (PR #18296), which silently turned libzfs_mnttab_cache(hdl, B_FALSE) into a no-op. The function is still part of the public libzfs.h API:

_LIBZFS_H void libzfs_mnttab_cache(libzfs_handle_t *, boolean_t);

But after 0ecf5e3f6 its body is:

void
libzfs_mnttab_cache(libzfs_handle_t *hdl, boolean_t enable)
{
    /* This is a no-op to preserve ABI backward compatibility. */
    (void) hdl, (void) enable;
}

The ABI is preserved, but the behavior the function used to provide — disabling the per-handle mnttab cache so that libzfs_mnttab_find consults /etc/mtab directly on every call — isn't there anymore. For consumers that hold more than one libzfs_handle_t in a process and rely on the cache being disabled for cross-handle correctness, this leads to wrong answers from ZFS_PROP_MOUNTED for filesystems that are mounted.

Looking at the commit, two pieces of the cache-disabled path got removed together:

  1. The cache-disabled fast path in libzfs_mnttab_find that did fopen(MNTTAB) + getmntany() per call (and defensively libzfs_mnttab_fini'd any stray AVL state).
  2. The if (avl_numnodes != 0) guard in libzfs_mnttab_add that kept the AVL empty when the cache was disabled.

Without those two pieces, the AVL is unconditionally populated by libzfs_mnttab_add. Once the AVL has any entries, libzfs_mnttab_find skips the /etc/mtab re-read (it only re-reads when the AVL is empty). So a handle that has done one zfs_mount will then return ENOENT from libzfs_mnttab_find for any other dataset that another handle (or out-of-process actor) has mounted — even though that dataset is plainly in /etc/mtab.

For us this shows up as ZFS_PROP_MOUNTED returning 0 for filesystems that are mounted, which aborts our snapshot workflow with a fatal "filesystems not mounted" check failure.

Describe how to reproduce the problem

The single-file C program below uses only the public libzfs.h API. No real mounts and no root permissions are needed; just supply the name of any currently-mounted ZFS dataset.

/*
 * libzfs_mnttab_cache_repro.c
 *
 * Demonstrates that libzfs_mnttab_cache(hdl, B_FALSE) no longer disables
 * the per-handle mnttab cache after openzfs/zfs commit 0ecf5e3f6, and
 * that zfs_prop_get_int(ZFS_PROP_MOUNTED) consequently returns 0 for
 * filesystems that are mounted.
 *
 * Build:
 *   cc -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -o repro \
 *      libzfs_mnttab_cache_repro.c $(pkg-config --cflags --libs libzfs)
 *
 * (The two -D flags are needed because libspl's sys/stat.h references
 * struct stat64; libzfs.pc.in does not currently set them itself.)
 *
 * Run (as a user that can read /etc/mtab):
 *   ./repro <name-of-any-currently-mounted-zfs-dataset>
 */

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/mnttab.h>
#include <libzfs.h>

int
main(int argc, char *argv[])
{
    if (argc != 2) {
        fprintf(stderr,
            "usage: %s <currently-mounted-zfs-dataset>\n", argv[0]);
        return (2);
    }
    const char *real_ds = argv[1];

    libzfs_handle_t *hdl = libzfs_init();
    if (hdl == NULL) {
        fprintf(stderr, "libzfs_init failed\n");
        return (1);
    }

    /* Ask libzfs to disable the per-handle mnttab cache. */
    libzfs_mnttab_cache(hdl, B_FALSE);

    /*
     * Stand-in for what zfs_mount() does internally on every successful
     * mount: lib/libzfs/libzfs_mount.c:582 calls libzfs_mnttab_add(hdl, ...)
     * after do_mount(). In a real consumer, this happens implicitly; we
     * call it directly here so the reproducer doesn't need root or a
     * mountable dataset.
     *
     * Pre-0ecf5e3f6: this was a no-op because the AVL was empty and the
     * old code guarded the insert with `if (avl_numnodes != 0)`.
     * Post-0ecf5e3f6: the guard is gone and this unconditionally
     * populates the AVL.
     */
    libzfs_mnttab_add(hdl, "fake/dataset", "/fake/mountpoint", "rw");

    /*
     * Now query ZFS_PROP_MOUNTED on a real, currently-mounted dataset.
     * This is the standard libzfs API a consumer uses to check mount
     * state. Internally it calls libzfs_mnttab_find().
     *
     * Pre-0ecf5e3f6: cache disabled, find() fopens /etc/mtab, returns 1.
     * Post-0ecf5e3f6: AVL has the fake entry from above (non-empty), so
     * find() skips the /etc/mtab refresh, doesn't see real_ds in its
     * AVL, returns ENOENT internally, and ZFS_PROP_MOUNTED evaluates to 0.
     */
    zfs_handle_t *zhp = zfs_open(hdl, real_ds, ZFS_TYPE_FILESYSTEM);
    if (zhp == NULL) {
        fprintf(stderr, "zfs_open(%s) failed\n", real_ds);
        libzfs_fini(hdl);
        return (1);
    }

    uint64_t mounted = zfs_prop_get_int(zhp, ZFS_PROP_MOUNTED);
    zfs_close(zhp);

    int rc;
    if (mounted) {
        printf("OK: ZFS_PROP_MOUNTED reports %s as mounted\n", real_ds);
        rc = 0;
    } else {
        printf("BUG: ZFS_PROP_MOUNTED reports %s as NOT mounted\n",
            real_ds);
        printf("     but %s IS mounted (see /etc/mtab and `zfs get mounted`).\n",
            real_ds);
        printf("     libzfs_mnttab_cache(hdl, B_FALSE) did not actually disable the cache.\n");
        rc = 1;
    }

    libzfs_fini(hdl);
    return (rc);
}

Expected output before 0ecf5e3f6:

OK: ZFS_PROP_MOUNTED reports <dataset> as mounted

Expected output after 0ecf5e3f6:

BUG: ZFS_PROP_MOUNTED reports <dataset> as NOT mounted
     but <dataset> IS mounted (see /etc/mtab and `zfs get mounted`).
     libzfs_mnttab_cache(hdl, B_FALSE) did not actually disable the cache.

Why this matters in a multi-handle consumer

The reproducer above is single-handle for minimality, and it directly calls libzfs_mnttab_add to stand in for what zfs_mount() does internally. In our actual usage we don't call libzfs_mnttab_add directly — we hold multiple libzfs_handle_t handles and use them across threads, and zfs_mount() does the add for us under the hood. The same contract violation falls out naturally:

  1. Handle A calls zfs_mount(zhp_x, ...) to mount dataset X. After do_mount() succeeds, zfs_mount_at calls libzfs_mnttab_add(hdl_a, ...) at lib/libzfs/libzfs_mount.c:582. Handle A's AVL now has X.
  2. Handle B calls zfs_mount(zhp_y, ...) to mount dataset Y. Same path; handle B's AVL has Y.
  3. Some time later, handle A queries zfs_prop_get_int(zhp_y, ZFS_PROP_MOUNTED) for dataset Y.
    • Pre-0ecf5e3f6: A's cache is disabled, so libzfs_mnttab_find consults /etc/mtab directly, finds Y, returns mounted = 1.
    • Post-0ecf5e3f6: A's AVL has X (non-empty), so libzfs_mnttab_find skips the /etc/mtab refresh, doesn't see Y in its AVL, returns ENOENT, and ZFS_PROP_MOUNTED evaluates to 0 — even though Y is plainly mounted.

That's the failure mode we're hitting. Mounts via one handle silently invalidate ZFS_PROP_MOUNTED queries on every other handle in the process: no warning, no logged error, just wrong answers.

Include any warning/errors/backtraces from the system logs

Verified A/B on two otherwise-identical systems (same OS, same compiler, same reproducer build) differing only in the version of openzfs/zfs master they were built from:

Pre-0ecf5e3f6 (built from master at upstream 545d66204d, 2025-09-17 — before 0ecf5e3f6):

$ ./repro rpool/ROOT/<bootenv>/root
OK: ZFS_PROP_MOUNTED reports rpool/ROOT/<bootenv>/root as mounted
$ echo $?
0

Post-0ecf5e3f6 (built from master at upstream 4655bdd8ab, 2026-03-17 — after 0ecf5e3f6):

$ ./repro rpool/ROOT/<bootenv>/root
BUG: ZFS_PROP_MOUNTED reports rpool/ROOT/<bootenv>/root as NOT mounted
     but rpool/ROOT/<bootenv>/root IS mounted (see /etc/mtab and `zfs get mounted`).
     libzfs_mnttab_cache(hdl, B_FALSE) did not actually disable the cache.
$ echo $?
1

Three independent signals (/etc/mtab, zfs get mounted, zfs_prop_get_int(ZFS_PROP_MOUNTED)) all agree that the dataset is mounted on both systems; only the post-0ecf5e3f6 ZFS_PROP_MOUNTED disagrees, and only because the per-handle cache the consumer asked to disable is actually still on.

In the originally-affected production code path, this ZFS_PROP_MOUNTED == 0 trips a "filesystems are not mounted" assertion that aborts the workflow.

Background — how we use libzfs

We're a long-running, multi-threaded userland (Delphix Engine) that links libzfs directly. Nothing exotic at the libzfs boundary:

  1. We call libzfs_init() to allocate a libzfs_handle_t. Handles are pooled; the pool grows lazily with concurrent demand and typically reaches 5–20 handles on a busy process. Handles are long-lived and reused.
  2. Immediately after each libzfs_init() we call libzfs_mnttab_cache(hdl, B_FALSE).
  3. Each handle is then used for ordinary operations: zfs_open, zfs_mount, zfs_prop_get_int(ZFS_PROP_MOUNTED), etc.

We disable the cache for exactly the case the cache can't handle correctly: mounts performed via one handle invisibly populate that handle's AVL, but other handles' AVLs don't see them. With the cache disabled, libzfs_mnttab_find is supposed to consult /etc/mtab — the actual source of truth, shared across handles and processes — on every call. That's the invariant that makes multi-handle usage correct.

Disabling the cache is the very first thing our getHandle() does after libzfs_init(), and it has been since 2011, when we moved from forking the zfs(8) CLI to linking libzfs directly.

The simplest fix from our perspective would be to restore the prior cache-disabled mode so consumers can still opt out via libzfs_mnttab_cache(hdl, B_FALSE) — i.e., libzfs_mnttab_find consulting /etc/mtab directly when the cache is disabled, and libzfs_mnttab_add not populating the AVL in that mode. The field renames in 0ecf5e3f6 (libzfs_mnttab_cache AVL → zh_mnttab, libzfs_mnttab_update → mnttab_update, etc.) don't really matter; an implementer can keep the new names.

The commit message for 0ecf5e3f6 says "the zfs command always enables it anyway, and right now there's multiple places that do mount work that don't go through the cache anyway", which is true for the CLI but doesn't cover library consumers that hold multiple handles and explicitly opt out. We've been doing this since 2011, so I'd be surprised if we're the only ones — anything that links libzfs from a long-lived multi-threaded process and uses multiple handles concurrently would hit the same thing. Other consumers may simply not have picked up the change yet.

Happy to send a patch if it'd help.
