zvol with directio #18661

Open
tiehexue wants to merge 4 commits into openzfs:master from tiehexue:zvol-directio

@tiehexue

Contributor

Motivation and Context

Direct I/O for zvols, see #18644.

Description

This follows how the ZFS file system handles Direct I/O. It is still a draft version; tests pass locally (Linux only).

How Has This Been Tested?

Local functional tests; CI results to follow.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Quality assurance (non-breaking change which makes the code more robust against bugs)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@tiehexue tiehexue marked this pull request as draft June 10, 2026 15:18
@github-actions github-actions Bot added the "Status: Work in Progress" (Not yet ready for general review) label Jun 10, 2026

@amotin amotin left a comment

Member

I've had only a very brief look, but I am not sure how much good it will do on FreeBSD without first implementing unmapped I/O support (G_PF_ACCEPT_UNMAPPED).

@amotin

amotin commented Jun 10, 2026

Member

One more comment: on Linux, some people might use O_DIRECT as a way to bypass the page cache. Making it also bypass the ARC may cause too big a performance degradation if the workload is not really suitable for true Direct I/O. Unfortunately, I don't know another way to disable the page cache there.

@tiehexue

Contributor Author

> I've had only a very brief look, but I am not sure how much good it will do on FreeBSD without first implementing unmapped I/O support (G_PF_ACCEPT_UNMAPPED).

Thanks, I will come to that.

@tiehexue

Contributor Author

> One more comment: on Linux, some people might use O_DIRECT as a way to bypass the page cache. Making it also bypass the ARC may cause too big a performance degradation if the workload is not really suitable for true Direct I/O. Unfortunately, I don't know another way to disable the page cache there.

Oh. But I guess a Linux guy who is not a ZFS guy means Direct I/O when using O_DIRECT, and his workload should suit Direct I/O, so we give him Direct I/O. And a ZFS guy should know the page cache only exists for mmap, so O_DIRECT is not necessary; if it is used anyway, we also try Direct I/O.

The PR also has a preflight check for whether Direct I/O is possible, e.g. page alignment.
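A minimal sketch of such a preflight check, assuming the only requirement is page alignment of the user buffer, the device offset and the length. The function name `dio_preflight_ok` and its signature are hypothetical, not taken from the PR:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical preflight check (not the PR's actual code): Direct I/O is
 * attempted only when the user buffer address, the device offset and the
 * transfer length are all multiples of the page size.
 */
static bool
dio_preflight_ok(uintptr_t buf_addr, uint64_t offset, uint64_t len,
    uint64_t page_size)
{
	/* page_size must be a power of two for the mask test below. */
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);

	uint64_t mask = page_size - 1;

	if (len == 0)
		return (false);		/* nothing to transfer directly */
	if ((buf_addr & mask) != 0)
		return (false);		/* user buffer not page aligned */
	if ((offset & mask) != 0 || (len & mask) != 0)
		return (false);		/* request not page aligned */
	return (true);
}
```

In this sketch a request that fails the check would simply take the regular ARC path; the PR may apply further conditions (for example block-size alignment) that are not modeled here.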

@amotin

amotin commented Jun 10, 2026

Member

> And a ZFS guy should know the page cache only exists for mmap

For files on file systems, that may be true, but for block devices accessed from user space, it is not. Don't ask me why Linux does it for block devices, because FreeBSD, where I came from, does not.

@tiehexue tiehexue force-pushed the zvol-directio branch 4 times, most recently from 6f88807 to 7923d1a June 11, 2026 07:42
@tiehexue

Contributor Author

> > And a ZFS guy should know the page cache only exists for mmap
>
> For files on file systems, that may be true, but for block devices accessed from user space, it is not. Don't ask me why Linux does it for block devices, because FreeBSD, where I came from, does not.

OK, below is how I think about it. Note that this PR does not touch the O_DIRECT flag.

| Scenario | Page cache | ARC | Disk |
|---|---|---|---|
| Plain read(), zvol_dio_enabled=0 | ✅ active | ✅ active | on miss |
| O_DIRECT, zvol_dio_enabled=0 | ❌ bypassed | ✅ active | on ARC miss |
| O_DIRECT, zvol_dio_enabled=1 | ❌ bypassed | ❌ bypassed | always |
| Plain read(), zvol_dio_enabled=1 | ✅ active | ❌ bypassed | on page cache miss |
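Read as a decision rule, the table amounts to two independent switches: O_DIRECT controls only the Linux page cache, and `zvol_dio_enabled` controls only the ARC. A sketch of that reading, where the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Which caches a zvol read may be served from (hypothetical model). */
struct read_path {
	bool page_cache;	/* Linux block-device page cache consulted */
	bool arc;		/* ARC consulted */
};

/*
 * Model of the table: O_DIRECT decides only about the page cache, and
 * the zvol_dio_enabled knob decides only about the ARC.
 */
static struct read_path
zvol_read_path(bool o_direct, bool zvol_dio_enabled)
{
	struct read_path p;

	p.page_cache = !o_direct;
	p.arc = !zvol_dio_enabled;
	return (p);
}

/* Every read goes to disk only when both caches are bypassed. */
static bool
disk_always(struct read_path p)
{
	return (!p.page_cache && !p.arc);
}
```

Only the O_DIRECT plus `zvol_dio_enabled=1` combination makes every read hit the disk, which matches the third row.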

@tiehexue tiehexue force-pushed the zvol-directio branch 2 times, most recently from 680eb8b to 8abf5e5 June 11, 2026 14:37
@amotin

amotin commented Jun 11, 2026

Member

> this PR does not touch the O_DIRECT flag

I don't know whether O_DIRECT is even passed through the block layers, but a system-wide knob is not the nicest way to control it. Even the dataset property we have for file systems (which should probably be ported) is quite coarse.
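For reference, the file-system `direct` dataset property accepts `disabled` (O_DIRECT is ignored and requests go through the ARC), `standard` (O_DIRECT requests bypass the ARC when possible; the default) and `always` (every properly aligned request is treated as direct). A sketch of how the same three modes could decide per request on a zvol, assuming a zvol property with identical semantics; all names below are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical zvol equivalent of the file-system "direct" property. */
typedef enum {
	DIRECT_DISABLED,	/* ignore O_DIRECT, always use the ARC */
	DIRECT_STANDARD,	/* honor O_DIRECT when the request allows it */
	DIRECT_ALWAYS		/* treat every aligned request as direct */
} direct_mode_t;

/*
 * Decide whether a request bypasses the ARC, given the property, whether
 * the caller asked for O_DIRECT, and whether the request is aligned.
 * How an unaligned O_DIRECT request is handled (fallback vs. error) is
 * left out of this sketch; here it simply takes the ARC path.
 */
static bool
use_direct_io(direct_mode_t mode, bool o_direct, bool aligned)
{
	switch (mode) {
	case DIRECT_DISABLED:
		return (false);
	case DIRECT_STANDARD:
		return (o_direct && aligned);
	case DIRECT_ALWAYS:
		return (aligned);
	}
	return (false);
}
```

A per-zvol property like this gives finer control than a module-wide knob, though it is still per device rather than per request.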

@tiehexue

tiehexue commented Jun 11, 2026

Contributor Author

> > this PR does not touch the O_DIRECT flag
>
> I don't know whether O_DIRECT is even passed through the block layers, but a system-wide knob is not the nicest way to control it. Even the dataset property we have for file systems (which should probably be ported) is quite coarse.

Oh, I am planning to add a property for zvols. Anyway, there is no way for us to add another O_DIRECT-like flag to the kernel API, is there? Any other ideas?

Signed-off-by: tiehexue <tiehexue@hotmail.com>
@tiehexue

Contributor Author

@amotin hi, you may want to take a look at this PR, though it is still a draft. Start with the test cases and see whether they are valuable. It now passes the existing zvol tests, the newly added ones, and the other test cases without regressions, on both Linux and FreeBSD. Actually, I am a little unsure whether it really works, because there is not much code. Apart from abd_alloc_from_pages in FreeBSD's zvol_os.c, the rest of the code is straightforward. And I do see that the ARC data size does not increase with DIO enabled on my local dev machine.

There is one test case that cannot pass on FreeBSD, and I just removed it. I see it as an application issue, not a zvol/Direct I/O one. Suppose an application keeps writing to a file/zvol with fsync. With ARC, ZFS/zvol just ensures all the data reaches the disk, but not necessarily in its final place; it may be in the ZIL. Now, if another application keeps reading the file/zvol without ARC, using Direct I/O, what should we guarantee?

We cannot avoid stale reads, and we shouldn't try. If applications use a file/zvol as a "communication" channel, they should do locking at the application level. Otherwise, a file/zvol should not be used in such a scenario.
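One way to repeat the ARC data size observation is to sample `data_size` from the arcstats kstat before and after a DIO run: on Linux it is a row of /proc/spl/kstat/zfs/arcstats, on FreeBSD the sysctl kstat.zfs.misc.arcstats.data_size. A small parser for the Linux text format (rows of name, type, value), written as a hypothetical helper:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the value of the "data_size" row from the text of an arcstats
 * kstat dump (rows of "name type data"), or -1 if the row is missing.
 */
static long long
arcstats_data_size(const char *text)
{
	const char *line = text;

	while (line != NULL && *line != '\0') {
		char name[64];
		unsigned type;
		unsigned long long value;

		/* Header rows fail to match or carry a different name. */
		if (sscanf(line, "%63s %u %llu", name, &type, &value) == 3 &&
		    strcmp(name, "data_size") == 0)
			return ((long long)value);

		line = strchr(line, '\n');
		if (line != NULL)
			line++;	/* move to the start of the next row */
	}
	return (-1);
}
```

Comparing the value read before and after a cold DIO read pass gives a quick check that the reads did not populate the ARC.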

@amotin

amotin commented Jun 12, 2026

Member

> Now, if another application keeps reading the file/zvol without ARC, using Direct I/O, what should we guarantee?

IIRC, for regular files Direct I/O provides full coherency. A Direct I/O write should get to the disk, and its block pointer should be stored in both the dbuf and the ZIL. Subsequent reads should read from disk using the block pointer stored in the respective dbuf until the TXG is committed. If the system crashes before the commit, the ZIL will replay that write, reading by the stored block pointer. If not, then after the TXG commit the written block will be in the indirect blocks, etc.
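The coherency rules described above can be sketched as a toy model of a single block. This is a simplification under stated assumptions: a block pointer is reduced to one on-disk address, and the dbuf, ZIL and indirect-block state are plain fields with hypothetical names, not real ZFS structures:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A block pointer reduced to "where the data lives on disk". */
typedef struct { uint64_t dva; } blkptr_sketch_t;

/* Toy state for one block; real ZFS spreads this over dbufs, ZIL, etc. */
typedef struct {
	blkptr_sketch_t committed;	/* from indirect blocks (last TXG) */
	blkptr_sketch_t dbuf_override;	/* set by a Direct I/O write */
	bool has_override;		/* dbuf holds an uncommitted bp */
	blkptr_sketch_t zil_record;	/* bp logged in the ZIL */
	bool zil_has_record;
} block_state_t;

/* Direct I/O write: data is already on disk at new_bp; record the bp. */
static void
dio_write(block_state_t *b, blkptr_sketch_t new_bp)
{
	b->dbuf_override = new_bp;
	b->has_override = true;
	b->zil_record = new_bp;
	b->zil_has_record = true;
}

/* Any read (DIO or ARC) resolves the bp; the dbuf wins until commit. */
static blkptr_sketch_t
resolve_read(const block_state_t *b)
{
	return (b->has_override ? b->dbuf_override : b->committed);
}

/* TXG commit: the bp lands in the indirect blocks; ZIL record retired. */
static void
txg_commit(block_state_t *b)
{
	if (b->has_override)
		b->committed = b->dbuf_override;
	b->has_override = false;
	b->zil_has_record = false;
}

/* Crash before commit: dbuf state is lost, ZIL replay restores the bp. */
static void
crash_and_replay(block_state_t *b)
{
	b->has_override = false;
	if (b->zil_has_record) {
		b->committed = b->zil_record;
		b->zil_has_record = false;
	}
}
```

In every path a read after the Direct I/O write resolves to the new block pointer, which is the coherency property a DIO write -> ARC read test would check.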

Include DIO write -> ARC read, etc.

Signed-off-by: tiehexue <tiehexue@hotmail.com>
@tiehexue

Contributor Author

> > Now, if another application keeps reading the file/zvol without ARC, using Direct I/O, what should we guarantee?
>
> IIRC, for regular files Direct I/O provides full coherency. A Direct I/O write should get to the disk, and its block pointer should be stored in both the dbuf and the ZIL. Subsequent reads should read from disk using the block pointer stored in the respective dbuf until the TXG is committed. If the system crashes before the commit, the ZIL will replay that write, reading by the stored block pointer. If not, then after the TXG commit the written block will be in the indirect blocks, etc.

I added a DIO write -> ARC read test case for coherency, if I understand correctly; it passes on my local dev machine.

@tiehexue tiehexue marked this pull request as ready for review June 13, 2026 14:02
@github-actions github-actions Bot added the "Status: Code Review Needed" (Ready for review and testing) label and removed the "Status: Work in Progress" (Not yet ready for general review) label Jun 13, 2026
@tiehexue

tiehexue commented Jun 13, 2026

Contributor Author

According to #18644, the issue reporter has tested this on real hardware.

And added tests for DMU_MAX_ACCESS.

Signed-off-by: tiehexue <tiehexue@hotmail.com>
@tiehexue tiehexue mentioned this pull request Jun 18, 2026
Used the wrong API to copy the data when just wrapping it is good enough; also removed an unnecessary calculation, and refined the test case to check cold randread DIO/ARC IOPS (DIO is slightly better).

Signed-off-by: tiehexue <tiehexue@hotmail.com>
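The difference between copying and wrapping can be sketched with a hypothetical descriptor type loosely modeled on the idea of an ABD (this is not the real OpenZFS `abd_t` API): wrapping only records a pointer to the caller's memory, while copying allocates and duplicates the data, which Direct I/O is meant to avoid.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical buffer descriptor, loosely in the spirit of an ABD. */
typedef struct {
	void *data;
	size_t size;
	int owned;	/* 1 if the descriptor allocated (and must free) data */
} buf_desc_t;

/* Copy: allocate a new buffer and duplicate the caller's bytes. */
static buf_desc_t
desc_copy(const void *src, size_t size)
{
	buf_desc_t d = { malloc(size), size, 1 };

	if (d.data != NULL)
		memcpy(d.data, src, size);
	return (d);
}

/* Wrap: reference the caller's memory directly, no allocation or copy. */
static buf_desc_t
desc_wrap(void *src, size_t size)
{
	buf_desc_t d = { src, size, 0 };

	return (d);
}

/* Release a descriptor; only owned memory is freed. */
static void
desc_free(buf_desc_t *d)
{
	if (d->owned)
		free(d->data);
	d->data = NULL;
}
```

A wrapped descriptor sees later changes to the caller's buffer and costs no extra memory, so the caller must keep that buffer alive and unchanged until the I/O completes.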
Comment thread module/os/freebsd/zfs/abd_os.c Outdated
Comment thread module/os/freebsd/zfs/abd_os.c
Comment thread module/os/freebsd/zfs/zvol_os.c
Comment thread module/os/freebsd/zfs/zvol_os.c Outdated
@tiehexue

Contributor Author

@amotin thanks! I must have been mad to touch FreeBSD, and in fact I did not touch it later in the "true async" PR. Now, per the newly added test case and manual testing, a zvol on FreeBSD doing cold randread gets 10% more IOPS with DIO than with ARC, while warm ARC reads are nearly 3x faster than DIO reads. Also, the ARC buffer size does not increase with DIO.

@tiehexue

Contributor Author

> @amotin thanks! I must have been mad to touch FreeBSD, and in fact I did not touch it later in the "true async" PR. Now, per the newly added test case and manual testing, a zvol on FreeBSD doing cold randread gets 10% more IOPS with DIO than with ARC, while warm ARC reads are nearly 3x faster than DIO reads. Also, the ARC buffer size does not increase with DIO.

With a build not configured for debug, the increase is 30%.

Labels

Status: Code Review Needed (Ready for review and testing)

2 participants