zvol with directio #18661
amotin
left a comment
I've had only a very brief look, but I am not sure how much good it will do on FreeBSD without first implementing unmapped I/O support (G_PF_ACCEPT_UNMAPPED).
One more comment is that on Linux some people might use
Thanks, I will come to that.
Oh. But I guess a Linux user who is not a ZFS user means Direct I/O when using O_DIRECT, and their workload should suit Direct I/O, so we give them Direct I/O. A ZFS user should know that the page cache is only involved for mmap, so O_DIRECT is not necessary; if it is used anyway, we also try Direct I/O. The PR also has a preflight check for whether Direct I/O is possible, e.g. page alignment.
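To make the preflight idea concrete, here is a minimal sketch in C. It assumes the check requires the buffer address, the offset and the length to all be page aligned; the PR's actual conditions may differ. The names `is_aligned` and `dio_preflight_ok` are hypothetical and are not taken from the PR or from OpenZFS.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: true when v is a multiple of align (a power of two). */
static bool
is_aligned(uint64_t v, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((v & (align - 1)) == 0);
}

/*
 * Hypothetical preflight (an assumption, not the PR's code): a request
 * qualifies for Direct I/O only when the buffer address, the offset and
 * the length are all page aligned and the length is non-zero. A caller
 * would fall back to the buffered (ARC) path when this returns false.
 */
static bool
dio_preflight_ok(const void *buf, uint64_t offset, uint64_t len,
    uint64_t page_size)
{
	if (len == 0)
		return (false);
	return (is_aligned((uint64_t)(uintptr_t)buf, page_size) &&
	    is_aligned(offset, page_size) &&
	    is_aligned(len, page_size));
}
```

With a 4096-byte page, a request at offset 8192 of length 4096 from a page-aligned buffer would pass, while the same request at offset 512 would be routed to the buffered path.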
For files on file systems this may be true, but for block devices accessed from user space it is not. Don't ask me why Linux does it for block devices, because FreeBSD, where I come from, does not.
6f88807 to 7923d1a
OK, below is my thinking; note that this PR does not touch the O_DIRECT flag.
680eb8b to 8abf5e5
I don't know whether O_DIRECT is even passed through the block layers, but a system-wide knob is not the nicest way to control it. Even the dataset property we have for file systems (which should probably be ported) is quite coarse.
Oh, I am planning to add a property to ZVOL. Anyway, are we not able to add another O_DIRECT-like flag to the kernel API? More ideas?
Signed-off-by: tiehexue <tiehexue@hotmail.com>
@amotin hi, you may want to take a look at this PR, though it is still a draft. Start with the test cases and see if they are valuable. It now passes the existing zvol tests, the newly added ones and the other test cases without regressions, on both Linux and FreeBSD. Actually, I am a little unsure whether it really works, because there is not much code. Apart from abd_alloc_from_pages in zvol_os.c on FreeBSD, the rest of the code is straightforward. And I do find that the ARC data size does not increase when DIO is enabled on my local dev machine.

There is one test case that cannot pass on FreeBSD, and I just removed it. I see it as an application issue, not a zvol/Direct I/O one. Suppose an application keeps writing to a file/zvol with fsync through the ARC: zfs/zvol only ensures that all data will reach disk, but not immediately; it may still be in the ZIL. Now, if another application keeps reading the file/zvol without the ARC, using Direct I/O, what should we guarantee? We cannot avoid stale reads, and we shouldn't. If applications use a file/zvol as a "communication" channel, they should do locking at the application level. Otherwise, a file/zvol should not be used in such a scenario.
IIRC for regular files Direct I/O provides full coherency. A Direct I/O write should get to the disk, and its block pointer should be stored in both the dbuf and the ZIL. Following reads should read from disk by the block pointer stored in the respective dbuf until the TXG is committed. If the system crashes before the commit, the ZIL will replay that write, reading by the stored block pointer. If not, then after the TXG commit the written block will be in the indirect blocks, etc.
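The sequence described above can be illustrated with a toy state model in C. It is only a sketch of the reasoning: every type and function name here (`toy_bp_t`, `toy_block_t`, `toy_dio_write`, and so on) is hypothetical, a "block pointer" is reduced to a plain integer, and none of this reflects the real dbuf, ZIL or TXG data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy stand-in for a block pointer: just an on-disk address, 0 = none. */
typedef uint64_t toy_bp_t;

typedef struct toy_block {
	toy_bp_t committed_bp;	/* reachable from the on-disk indirect blocks */
	toy_bp_t dbuf_bp;	/* set by a Direct I/O write in an open TXG */
	toy_bp_t zil_bp;	/* same pointer, logged for crash replay */
} toy_block_t;

/* Direct I/O write: data is already on disk at new_bp; record the pointer. */
static void
toy_dio_write(toy_block_t *b, toy_bp_t new_bp)
{
	assert(new_bp != 0);
	b->dbuf_bp = new_bp;
	b->zil_bp = new_bp;
}

/* Read: prefer the uncommitted pointer so the newest data is returned. */
static toy_bp_t
toy_read_bp(const toy_block_t *b)
{
	return (b->dbuf_bp != 0 ? b->dbuf_bp : b->committed_bp);
}

/* TXG commit: the pointer becomes part of the on-disk tree. */
static void
toy_txg_commit(toy_block_t *b)
{
	if (b->dbuf_bp != 0) {
		b->committed_bp = b->dbuf_bp;
		b->dbuf_bp = 0;
		b->zil_bp = 0;
	}
}

/* Crash before commit: in-memory state is lost, then the log is replayed. */
static void
toy_crash_and_replay(toy_block_t *b)
{
	toy_bp_t logged = b->zil_bp;	/* the log survives on disk */

	b->dbuf_bp = 0;			/* in-memory dbuf state is gone */
	if (logged != 0)
		toy_dio_write(b, logged);	/* replay re-applies the write */
}
```

In this model a read issued after `toy_dio_write` resolves to the new pointer whether it happens before the commit, after a crash and replay, or after the commit, which is the coherency property being claimed for regular files.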
Include DIO write -> ARC read, etc. Signed-off-by: tiehexue <tiehexue@hotmail.com>
I added a DIO write -> ARC read test case as a coherency test, if I understand correctly; it passes on my local dev machine.
According to #18644, the author of that issue has tested this on real hardware.
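For readers unfamiliar with this kind of check, the following C sketch shows the general shape of a "Direct I/O write, then buffered read" test from user space on Linux. It is not the test added in this PR. The function name `dio_write_buffered_read`, the 4096-byte block size and the return-code convention are all assumptions made for illustration; the path would be whatever file or zvol device node the caller supplies.

```c
#define _GNU_SOURCE	/* exposes O_DIRECT on Linux */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define	BLK	4096	/* assumed alignment and transfer size */

/*
 * Write one block through an O_DIRECT descriptor, then read it back
 * through a separate buffered descriptor and compare.
 * Returns 0 if the data matches, -2 if O_DIRECT is not usable on this
 * path, and -1 on any other failure, including a mismatch.
 */
static int
dio_write_buffered_read(const char *path)
{
	void *wbuf = NULL;
	char rbuf[BLK];
	ssize_t n;
	int dfd, bfd;
	int rc = -1;

	assert(path != NULL);

	dfd = open(path, O_WRONLY | O_DIRECT);
	if (dfd < 0)
		return (-2);
	bfd = open(path, O_RDONLY);
	if (bfd < 0) {
		(void) close(dfd);
		return (-1);
	}

	/* O_DIRECT needs an aligned buffer, offset and length. */
	if (posix_memalign(&wbuf, BLK, BLK) != 0) {
		wbuf = NULL;
		goto out;
	}
	memset(wbuf, 0xA5, BLK);

	n = pwrite(dfd, wbuf, BLK, 0);
	if (n != BLK) {
		rc = (n < 0 && errno == EINVAL) ? -2 : -1;
		goto out;
	}

	/* The buffered reader must observe what the direct writer wrote. */
	if (pread(bfd, rbuf, BLK, 0) == BLK && memcmp(wbuf, rbuf, BLK) == 0)
		rc = 0;
out:
	free(wbuf);
	(void) close(bfd);
	(void) close(dfd);
	return (rc);
}
```

A return value of -2 only says that the underlying path rejected O_DIRECT, so a harness would normally treat it as "skipped" rather than as a coherency failure.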
And added tests for DMU_MAX_ACCESS. Signed-off-by: tiehexue <tiehexue@hotmail.com>
Used the wrong API to copy the data where just wrapping is enough; removed an unnecessary calculation; also refined the test case to check cold randread DIO/ARC IOPS, where DIO is slightly better. Signed-off-by: tiehexue <tiehexue@hotmail.com>
@amotin thanks! I must have been mad to touch FreeBSD, and indeed I did not touch it later in the "true async" PR. With the newly added test case and manual testing, a zvol on FreeBSD gets about 10% more IOPS on cold random reads with DIO than with ARC, while warm ARC reads are nearly 3x faster than DIO reads. Also, the ARC buffer size does not grow with DIO.
The increase is 30% when built without debug enabled.
Motivation and Context
Direct I/O for ZVOL; refer to #18644.
Description
This follows how the ZFS file system handles Direct I/O. It is still a draft, with tests passing locally (Linux only).
How Has This Been Tested?
Local functional tests and the upcoming CI run.
Types of changes
Checklist:
Signed-off-by.