
Inconsistent behavior and eventual BSOD after playing with snapshots, clones and datasets #549

@Elfy

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Windows 10 |
| Distribution Version | 10.0.19041.1 |
| Architecture | x64 |
| OpenZFS Version | zfswin-2.3.1rc14 |

Describe the problem you're observing

The mounts of some datasets become inaccessible after a lot of dataset creation and destruction.
In an attempt to fix this I tried unmounting and remounting them. That worked for some, but others gave me the error "not currently mounted" even though the mount point existed, and trying to mount such a dataset gave the error "Unknown error". Some of the successfully remounted datasets became inaccessible again at some point. Finally I tried to fix a dataset that was unmounted but impossible to mount by renaming it. The rename succeeded, but my memory of what followed is foggy, and the BSOD ate some of the latest commands. The gist is that mounting created yet another inaccessible mountpoint under the new name (the old name is still present, by the way, even though the associated dataset was renamed and apparently nothing is mounted under it), and then the system crashed when I tried either to unmount this new mountpoint or to remount it.
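
A minimal sketch of how the mismatch described above (a mount point exists, but ZFS reports the dataset as not mounted) could be cross-checked. It assumes the tab-separated, header-less output of `zfs list -H -o name,mounted,mountpoint`; the function names and the sample dataset names are hypothetical and are not taken from the affected pool.

```python
import subprocess


def parse_mount_state(listing: str) -> list[dict]:
    """Parse tab-separated `zfs list -H -o name,mounted,mountpoint` output."""
    rows = []
    for line in listing.splitlines():
        if not line.strip():
            continue
        name, mounted, mountpoint = line.split("\t", 2)
        rows.append(
            {"name": name, "mounted": mounted == "yes", "mountpoint": mountpoint}
        )
    return rows


def unmounted_with_mountpoint(rows: list[dict]) -> list[str]:
    """Names of datasets reported as not mounted although a mountpoint is set."""
    return [
        r["name"]
        for r in rows
        if not r["mounted"] and r["mountpoint"] not in ("none", "legacy", "-")
    ]


def current_listing() -> str:
    """Query the live system; needs the zfs command-line tool on PATH."""
    return subprocess.run(
        ["zfs", "list", "-H", "-o", "name,mounted,mountpoint"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout


# Hypothetical sample listing, for illustration only (not real output).
SAMPLE = (
    "tank\tyes\t/tank\n"
    "tank/clone1\tyes\t/tank/clone1\n"
    "tank/clone2-renamed\tno\t/tank/clone2-renamed\n"
)

if __name__ == "__main__":
    print(unmounted_with_mountpoint(parse_mount_state(SAMPLE)))
```

On a live system, `parse_mount_state(current_listing())` would replace the sample. Comparing the result with the mount points actually visible in the filesystem would show leftovers such as the old-name mountpoint mentioned above.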

Describe how to reproduce the problem

I can't give an exact recreation of everything I did, but roughly: I created a pool (a single HDD, entirely free of partitions), copied a few folders to it, made a snapshot of the result, made 2 clones of that snapshot, and modified a few bytes in a file in both of the clones. I created 2 more datasets, copied some test data into them, and did nothing unusual with them (one of these datasets eventually became inaccessible, but could be remounted successfully). Then I ran a loop for a few hours: create a new dataset with a different recordsize and compression, copy test data, get stats, destroy the dataset. After that I noticed that some datasets were inaccessible. A reboot did not fix all of them. Then rename followed by unmount/mount led to the BSOD. Right now the only inaccessible dataset is the second clone, which shows two mountpoints, one for the old name and one for the new.
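
The rough sequence above can be sketched as a list of `zpool`/`zfs` command lines. This is only an approximation under stated assumptions, not the exact commands that were run: the pool name, disk name, dataset names, snapshot name, the recordsize and compression values, and the properties queried for "stats" are all hypothetical placeholders. The sketch only builds and prints the commands; it does not execute anything, since the final steps may crash the machine.

```python
from itertools import product


def repro_commands(
    pool="testpool",          # hypothetical pool name
    disk="PHYSICALDRIVE1",    # hypothetical disk identifier
    recordsizes=("128K", "1M"),
    compressions=("lz4", "zstd"),
):
    """Return command lines approximating the described reproduction steps."""
    cmds = [
        ["zpool", "create", pool, disk],
        # (copy a few folders into the pool here)
        # Assumption: the snapshot was taken of the pool's root dataset.
        ["zfs", "snapshot", f"{pool}@snap1"],
        ["zfs", "clone", f"{pool}@snap1", f"{pool}/clone1"],
        ["zfs", "clone", f"{pool}@snap1", f"{pool}/clone2"],
        # (modify a few bytes in a file in each clone here)
        ["zfs", "create", f"{pool}/data1"],
        ["zfs", "create", f"{pool}/data2"],
        # (copy test data into data1 and data2 here)
    ]
    # The loop that ran for a few hours: create, copy, get stats, destroy.
    for recordsize, compression in product(recordsizes, compressions):
        bench = f"{pool}/bench"
        cmds.append(
            ["zfs", "create", "-o", f"recordsize={recordsize}",
             "-o", f"compression={compression}", bench]
        )
        # (copy test data into the bench dataset here)
        cmds.append(["zfs", "get", "used,compressratio", bench])
        cmds.append(["zfs", "destroy", bench])
    # The steps that preceded the crash: rename, then mount and unmount.
    renamed = f"{pool}/clone2-renamed"
    cmds.append(["zfs", "rename", f"{pool}/clone2", renamed])
    cmds.append(["zfs", "mount", renamed])
    cmds.append(["zfs", "unmount", renamed])
    return cmds


if __name__ == "__main__":
    for cmd in repro_commands():
        print(" ".join(cmd))
```

Printing rather than running keeps the sketch safe to try anywhere. To actually attempt a reproduction, the printed commands would have to be run by hand against a disposable disk, with the copy steps filled in.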

Include any warning/errors/backtraces from the system logs

info.txt
stack.txt
cbuf.txt

I'm going to leave this pool as is for a bit, in case something useful for debugging can be extracted out of it.

Somewhat unrelated: the driver does seem to leak memory. Memory usage stabilized quickly after I started the create/copy/destroy loop, but then kept inching up ever so slowly, reaching almost 2 extra GB after ~2h. Also, the default limits seem unreasonable on Windows: with the default config the system starts lagging before the driver starts giving memory back (if it ever does). The above was done with zfs_arc_max=4GB, with memory usage initially stabilizing at 8GB, then reaching 10GB/32GB.
