pyzdb — a Python+C toolkit for exploring ZFS on-disk structures from raw block devices #18340

zhengyp36 · 2026-03-17T07:01:33Z


Hi OpenZFS community,

I'd like to share a project I've been working on: pyzdb, a toolkit for reading and parsing ZFS on-disk structures directly from raw block devices — without importing the pool, and without wrapping zdb.

What it does

Starting from the pool label, pyzdb follows the full on-disk chain:

label → nvlist → vdev tree → uberblock → root blkptr
      → meta-objset → dnode → blkptr → file data

All of this happens in userspace, reading only from /dev/sdX.
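As a rough illustration of the first hop in that chain (finding the labels on a device), here is a minimal sketch based on the documented label geometry: four 256 KiB labels, two at the front and two at the end of the device, each holding a packed nvlist region and an uberblock ring. The function names are hypothetical and are not pyzdb's API.

```python
# Sketch only: vdev label geometry as described in the ZFS on-disk format.
# label_offsets() and nvlist_region() are hypothetical helpers, not pyzdb's API.
LABEL_SIZE = 256 * 1024             # each vdev label is 256 KiB
NVLIST_OFFSET = 16 * 1024           # two 8 KiB regions precede the nvlist area
NVLIST_SIZE = 112 * 1024            # nvlist area (its tail holds an embedded checksum)
UBERBLOCK_RING_OFFSET = 128 * 1024  # uberblock ring fills the second half of a label


def label_offsets(device_size: int) -> list[int]:
    """Return the byte offsets of labels L0..L3 for a device of this size."""
    # The usable size is aligned down to a multiple of the label size.
    aligned = device_size - (device_size % LABEL_SIZE)
    return [0, LABEL_SIZE, aligned - 2 * LABEL_SIZE, aligned - LABEL_SIZE]


def nvlist_region(label_offset: int) -> tuple[int, int]:
    """Return (start, end) byte offsets of the nvlist area inside one label."""
    start = label_offset + NVLIST_OFFSET
    return start, start + NVLIST_SIZE
```

Reading the bytes in `nvlist_region(label_offsets(size)[0])` from the device and decoding them as a packed nvlist would yield the pool name, GUIDs, and the vdev tree.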

Here is a real session. The pool is not imported — zpool list shows nothing. pyzdb reads the file directly from disk:

$ zpool list
no pools available

$ python
>>> from zdb import *
>>> mgr = SpaManager()
>>> mgr.ls()
poolx[GUID=0xf0a7ef4b5382af55]
  <raidz>[0]
    <disk>[0]/dev/sdb1
    <disk>[1]/dev/sdc1
    <disk>[2]/dev/sdd1
  <raidz>[1]
    <disk>[0]/dev/sde1
    <disk>[1]/dev/sdf1
    <disk>[2]/dev/sdg1

>>> fs1 = mgr.open_ds('poolx/fs1')
>>> f = fs1.rootdir.get('test.txt')
>>> data = fs1.os.spa.reader.read(f.dnphys.dn_blkptr[0])
>>> print(data)
hello, world

What it supports

NVList parsing from vdev labels
Full vdev tree reconstruction (including multi-top-vdev pools)
Uberblock traversal and selection by txg
Block pointer (blkptr) parsing and DVA resolution
RAIDZ layout calculation and data reconstruction
Decompression (LZ4, ZSTD, and others)
DMU object model: objset, dnode, zap, metaslab, spacemap
Directory traversal and file data retrieval
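To make the "DVA resolution" item concrete, here is a minimal sketch of decoding one 128-bit DVA, following the bit layout used by the DVA macros in OpenZFS's spa.h. The names `Dva`, `decode_dva`, and `leaf_disk_offset` are invented for illustration and are not pyzdb's API; the byte order is an assumption the caller must supply.

```python
# Sketch only: decode a dva_t using the field layout from OpenZFS's spa.h.
# Dva, decode_dva and leaf_disk_offset are hypothetical names, not pyzdb's API.
import struct
from typing import NamedTuple

SECTOR_SHIFT = 9                  # DVA sizes and offsets are stored in 512-byte units
LABEL_START = 4 * 1024 * 1024     # two front labels (512 KiB) plus 3.5 MiB boot area


class Dva(NamedTuple):
    vdev: int      # top-level vdev index
    offset: int    # byte offset within that vdev's allocatable space
    asize: int     # allocated size in bytes (includes any RAIDZ parity and padding)
    gang: bool     # True if this DVA points at a gang block header


def decode_dva(raw: bytes, little_endian: bool = True) -> Dva:
    """Decode the 16 bytes of a DVA into its fields."""
    fmt = "<2Q" if little_endian else ">2Q"
    word0, word1 = struct.unpack(fmt, raw[:16])
    asize = (word0 & 0xFFFFFF) << SECTOR_SHIFT           # bits 0-23 of word 0
    vdev = (word0 >> 32) & 0xFFFFFF                      # bits 32-55 of word 0
    offset = (word1 & ((1 << 63) - 1)) << SECTOR_SHIFT   # bits 0-62 of word 1
    gang = bool(word1 >> 63)                             # bit 63 of word 1
    return Dva(vdev, offset, asize, gang)


def leaf_disk_offset(dva: Dva) -> int:
    """Byte offset on the device for a plain disk or mirror child vdev."""
    # Allocatable space starts after the front labels and boot area.
    return dva.offset + LABEL_START
```

For RAIDZ top-level vdevs the DVA offset cannot be used directly; it first has to be mapped to per-child columns.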

Architecture

pyzdb is structured in four layers:

ZFS Object Layer   (dnode, objset, zap, metaslab)
Block Layer        (blkptr, dva, checksum, decompress)
Disk Layer         (vdev, raidz, mmap I/O)
Physical Disk

The core design uses a general-purpose structure parser called CStruct: each ZFS on-disk structure only needs to declare its field names, offsets, and types — parsing is handled automatically. This makes it straightforward to add new structures without writing repetitive parsing code.
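To illustrate the idea of a declarative parser (this is not the real CStruct, whose declaration syntax and implementation may differ considerably), here is a hypothetical sketch in which a structure is just a list of field names, offsets, and `struct` format codes. The class names `DeclStruct` and `UberblockHead` are made up for this example, and little-endian data is assumed.

```python
# Hypothetical sketch of a declarative structure parser. pyzdb's real CStruct
# may be declared and implemented quite differently.
import struct


class DeclStruct:
    """Base class: subclasses list (name, offset, struct-format) triples."""
    FIELDS = []

    def __init__(self, buf: bytes):
        for name, offset, fmt in self.FIELDS:
            # unpack_from reads one value at the declared byte offset.
            (value,) = struct.unpack_from(fmt, buf, offset)
            setattr(self, name, value)


class UberblockHead(DeclStruct):
    # First three 64-bit fields of uberblock_t, assuming little-endian data.
    FIELDS = [
        ("ub_magic", 0, "<Q"),
        ("ub_version", 8, "<Q"),
        ("ub_txg", 16, "<Q"),
    ]


UBERBLOCK_MAGIC = 0x00BAB10C

# Build a fake 24-byte buffer and parse it back.
raw = struct.pack("<3Q", UBERBLOCK_MAGIC, 5000, 1234)
ub = UberblockHead(raw)
print(hex(ub.ub_magic), ub.ub_version, ub.ub_txg)
```

With this shape, supporting a new structure means adding a subclass with its field table; no per-structure parsing code is needed.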

Python handles object modeling and interactive exploration. C handles disk I/O (mmap), checksum, decompression, and RAIDZ math — reusing algorithms from the OpenZFS codebase where appropriate.
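To give a feel for the RAIDZ math mentioned above, here is a simplified Python sketch of how a block's DVA offset and physical size map to per-disk columns. It is modeled on the logic of `vdev_raidz_map_alloc()` in OpenZFS and is not pyzdb's code. It assumes the size is a multiple of the sector size, ignores skip sectors, and returns child offsets that still need the 4 MiB label/boot area added to become device offsets.

```python
# Simplified sketch of RAIDZ column geometry, modeled on the logic of
# vdev_raidz_map_alloc() in OpenZFS. Illustrative only; not pyzdb's code.
def raidz_columns(offset, psize, ashift, dcols, nparity):
    """Return a list of (child_index, child_offset, size) tuples.

    offset, psize: DVA offset and physical block size in bytes
                   (psize is assumed to be a multiple of 1 << ashift).
    dcols: number of child disks; nparity: 1, 2 or 3.
    The first `nparity` entries are parity columns, the rest hold data.
    """
    b = offset >> ashift               # starting sector on the RAIDZ vdev
    s = psize >> ashift                # block size in sectors
    f = b % dcols                      # child that holds the first column
    o = (b // dcols) << ashift         # starting byte offset on each child
    q = s // (dcols - nparity)         # number of full rows of data
    r = s - q * (dcols - nparity)      # data sectors in the partial row
    bc = 0 if r == 0 else r + nparity  # columns that carry one extra sector
    acols = bc if q == 0 else dcols    # columns actually accessed

    cols = []
    for c in range(acols):
        col, coff = f + c, o
        if col >= dcols:               # wrap around to the next sector row
            col -= dcols
            coff += 1 << ashift
        size = ((q + 1) if c < bc else q) << ashift
        cols.append((col, coff, size))

    # RAIDZ1 swaps the parity column with the first data column on
    # odd-numbered 1 MiB regions so parity is spread across disks.
    if nparity == 1 and len(cols) > 1 and (offset & (1 << 20)):
        (d0, o0, s0), (d1, o1, s1) = cols[0], cols[1]
        cols[0], cols[1] = (d1, o1, s0), (d0, o0, s1)
    return cols
```

For a three-disk raidz1 like the demo pool, `raidz_columns(0, 4096, 9, 3, 1)` yields one parity column and two data columns of 2048 bytes each.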

Why I built it

ZFS internals are well documented in the ZFS On-Disk Specification and in the source, but it is hard to build a complete mental model from those alone.

The core problem: I could read what the structures should look like, but I had no way to look at what was actually on disk and verify my understanding against real data.

pyzdb bridges that gap. Its design principle is:

Use on-disk data to validate the ZFS model — not the other way around.

It is closer to a "disk-level debugger" than a management tool — useful for verifying format assumptions, analyzing space allocation behavior, and inspecting metadata in damaged pool scenarios.

Background

I have worked professionally on the Linux kernel and on ZFS engineering. pyzdb was built during an independent research period after that work, as a way to make the entire ZFS on-disk object model directly observable and testable.

Current limitations

This is a research-grade tool built primarily to understand and verify the ZFS on-disk format. Known limitations:

Checksum verification is not yet implemented. The tool reads and traverses structures but does not validate block checksums.
RAIDZ support is partial. Basic reads on a raidz1 pool (matching the demo setup) have been verified. More complex RAIDZ configurations may fail.
Some newer ZFS feature flags are not supported.
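For context on the checksum limitation, here is a sketch of what a Fletcher-4 computation looks like: four 64-bit running sums over the block's 32-bit words. This is illustrative only and is not part of pyzdb. Verification would compare the four resulting words with the checksum stored in the block pointer, for blocks whose checksum type is fletcher4; the byte order is an assumption the caller supplies.

```python
# Sketch of Fletcher-4 as used by ZFS: four 64-bit running sums over
# 32-bit words. Illustrative only; not part of pyzdb.
import struct

MASK64 = (1 << 64) - 1


def fletcher4(data: bytes, little_endian: bool = True) -> tuple[int, int, int, int]:
    """Return the four 64-bit checksum words for `data`."""
    if len(data) % 4:
        raise ValueError("fletcher4 input must be a multiple of 4 bytes")
    fmt = ("<" if little_endian else ">") + f"{len(data) // 4}I"
    a = b = c = d = 0
    for word in struct.unpack(fmt, data):
        # Each sum accumulates the previous one, wrapping at 64 bits.
        a = (a + word) & MASK64
        b = (b + a) & MASK64
        c = (c + b) & MASK64
        d = (d + c) & MASK64
    return a, b, c, d
```

This pure-Python loop is slow on large blocks; it is meant to show the arithmetic, not to replace an optimized C implementation.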

License

pyzdb is MIT licensed. Some C modules are adapted from OpenZFS (CDDL) and retain their original license headers. Some header file definitions derived from OpenZFS are noted in file comments.

Repository

https://github.com/zhengyp36/pyzdb

Detailed documentation (including a step-by-step walkthrough and internal architecture notes) is available in the doc/ directory.

Any feedback is welcome — especially if something in the on-disk format handling looks incorrect, or if there are scenarios you think would be worth supporting.
