ZoL: fh_to_dentry export op blocks access to open-unlinked files #18699

Description

@dkobras

System information

Type | Version/Name
--- | ---
Distribution Name | Debian
Distribution Version | 13 (trixie)
Kernel Version | 6.12.94+deb13-amd64
Architecture | x86_64
OpenZFS Version | 2.3.2-2

Describe the problem you're observing

When the Linux fh_to_dentry() export operation is called for an open-unlinked file on a ZFS filesystem, it returns an ESTALE error. Other Linux filesystems (e.g. ext4, XFS) behave differently: they grant access to inodes that are unlinked but still referenced. This deviation can lead to unexpected behavior for callers of open_by_handle_at(), or when ZFS filesystems are exported over NFS (see #11163 or #6197).

This is due to the following check in zfs_vget():

	if (zp->z_unlinked || zp_gen != fid_gen) {
		dprintf("znode gen (%llu) != fid gen (%llu)\n", zp_gen,
		    fid_gen);
		zrele(zp);
		zfs_exit(zfsvfs, FTAG);
		return (SET_ERROR(ENOENT));
	}

which has been around since the initial commit in git. It follows a call to zfs_zget() where some related checks have been added and refined over the years. In the current code, once we hit the cited check in zfs_vget(), we can be sure that we hold a reference on the inode, and iput_final() has not yet been invoked. It's not clear to me what the additional check for z_unlinked in zfs_vget() is trying to protect against, and why it shouldn't just hand out the znode instead. Initial tests with a trimmed condition if (zp_gen != fid_gen) seem to work fine, and bring ZFS fh_to_dentry behavior in line with other Linux filesystems.

Similar checks exist in the FreeBSD-specific code, but I'm not sure about the expected behavior on that platform.
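To make the difference between the two conditions concrete, here is a minimal user-space sketch. It is not OpenZFS code: `struct model_znode`, `model_vget_current()` and `model_vget_trimmed()` are hypothetical names that only model the two values the cited check reads (`z_unlinked` and the generation numbers), and the sketch says nothing about locking or reference counting.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the znode state read by the cited check. */
struct model_znode {
    bool     unlinked;  /* models zp->z_unlinked */
    uint64_t gen;       /* models zp_gen */
};

/* Models the existing check: reject if unlinked OR generation mismatch. */
static int model_vget_current(const struct model_znode *zp, uint64_t fid_gen)
{
    if (zp->unlinked || zp->gen != fid_gen)
        return ENOENT;
    return 0;
}

/* Models the trimmed check: reject only on generation mismatch. */
static int model_vget_trimmed(const struct model_znode *zp, uint64_t fid_gen)
{
    if (zp->gen != fid_gen)
        return ENOENT;
    return 0;
}
```

For an open-unlinked znode whose generation matches the handle, the first function rejects the lookup while the second hands the znode out. Both still reject a handle whose generation does not match.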

Describe how to reproduce the problem

The following reproducer illustrates the difference. It needs to be run with root privileges, and it creates and unlinks a file local_testfile.txt in the current working directory. Before the file is unlinked, a file handle is obtained, which is later used to open the already-unlinked file. On (at least) ext4, XFS, btrfs, and tmpfs, the test succeeds. On ZFS, it fails with ESTALE.

#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <errno.h>
#include <sys/stat.h>

#ifndef MAX_HANDLE_SZ
#define MAX_HANDLE_SZ 128
#define AT_FDCWD -100
#endif

int main(void) {
    const char *filename = "local_testfile.txt";
    int mount_fd, file_fd, handle_fd;
    struct file_handle *fhp = NULL;
    int mnt_id;
    int ret = 2;

    fhp = malloc(sizeof(struct file_handle) + MAX_HANDLE_SZ);
    if (!fhp) {
        perror("[-] malloc failed");
        goto out_fhp;
    }
    fhp->handle_bytes = MAX_HANDLE_SZ;

    mount_fd = open(".", O_RDONLY | O_DIRECTORY);
    if (mount_fd < 0) {
        perror("[-] open (mount_fd) failed");
        goto out_fhp;
    }

    file_fd = open(filename, O_CREAT | O_RDWR | O_TRUNC, 0644);
    if (file_fd < 0) {
        perror("[-] create test file failed");
        goto out_mount;
    }
    printf("[*] file '%s' created and opened (fd=%d).\n", filename, file_fd);

    if (name_to_handle_at(AT_FDCWD, filename, fhp, &mnt_id, 0) < 0) {
        perror("[-] name_to_handle_at failed");
        goto out_file;
    }
    printf("[*] successfully obtained filehandle from file system.\n");

    if (unlink(filename) < 0) {
        perror("[-] unlink failed");
        goto out_file;
    }
    printf("[*] file unlinked (dentry deleted, fd reference %d still active).\n", file_fd);

    printf("[*] calling open_by_handle_at()...\n");
    handle_fd = open_by_handle_at(mount_fd, fhp, O_RDONLY);

    if (handle_fd < 0) {
        printf("[!] FAIL: open_by_handle_at() failed: errno %d (%s)\n",
               errno, strerror(errno));
        ret = 1;
    } else {
        printf("[+] OK: open_by_handle_at() succeeded! (new_fd=%d)\n", handle_fd);
        close(handle_fd);
        ret = 0;
    }
    }

out_file:
    close(file_fd);
out_mount:
    close(mount_fd);
out_fhp:
    free(fhp);
    return ret;
}
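As a companion to the reproducer, the open-unlinked state itself can be observed without root and without file handles. The following is a minimal sketch, not part of the original test: `open_unlinked_nlink()` is a hypothetical helper that creates a file, unlinks it while keeping the descriptor open, and reports the link count seen through that descriptor.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create `path`, unlink it while the descriptor stays open, and return the
 * link count reported by fstat() on that descriptor. Returns -1 on error.
 * For an open-unlinked regular file the expected link count is 0.
 */
static long open_unlinked_nlink(const char *path)
{
    struct stat st;
    long nlink = -1;
    int fd = open(path, O_CREAT | O_RDWR | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    /* The inode stays alive as long as fd references it. */
    if (unlink(path) == 0 && fstat(fd, &st) == 0)
        nlink = (long)st.st_nlink;
    close(fd);
    return nlink;
}
```

A link count of 0 on a descriptor that is still usable is the state in which the reproducer calls open_by_handle_at().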

Include any warning/errors/backtraces from the system logs

The following bpftrace script can be used to track fh_to_dentry calls for inodes on ZFS, along with their corresponding z_unlinked state.

#!/usr/bin/env bpftrace

#include <linux/fs.h>

/* Mark FH lookup entry */
kprobe:zfs:zpl_fh_to_dentry
{
    @in_fh[tid] = 1;
}

/* Store zpp on entry into zfs_zget */
kprobe:zfs:zfs_zget
/ @in_fh[tid] /
{
    @zpp_store[tid] = arg2;
}

/* Obtain Znode pointer on success */
kretprobe:zfs:zfs_zget
/ @in_fh[tid] && @zpp_store[tid] /
{
    $zpp = @zpp_store[tid];
    if (retval == 0 && $zpp != 0) {
        @active_zp[tid] = *(int64 *)$zpp;
    }
}

/* Show inode number and unlinked state */
kprobe:zfs:sa_lookup
/ @in_fh[tid] && @active_zp[tid] /
{
    $zp = @active_zp[tid];

    // Access the Linux inode embedded in the OpenZFS znode structure
    $inode_num = ((struct znode *)$zp)->z_inode.i_ino;
    $z_unlinked = ((struct znode *)$zp)->z_unlinked;

    printf("[NFS-ZFS-FH-TRACKER] Inode %lu, z_unlinked %d\n", $inode_num, $z_unlinked);
}

/* Cleanup temporary maps on exit */
kretprobe:zfs:zpl_fh_to_dentry
{
    if (@in_fh[tid])       { delete(@in_fh[tid]); }
    if (@zpp_store[tid])   { delete(@zpp_store[tid]); }
    if (@active_zp[tid])   { delete(@active_zp[tid]); }
}
