
Conversation

@minwooim (Contributor) commented Dec 2, 2025

Add support for verifying the latest numberio in the verify header read from written
offsets. Previously, the read phase of offline verification did not support numberio
comparison, since it is not trivial to determine the latest numberio for each offset.
This patch allows verifying the latest written verify header based on numberio.
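
For context, a minimal sketch of the check this enables (the helper name is illustrative, not fio's actual function; the real check lives in verify.c, and hdr->numberio / io_u->numberio are the fields compared):

static int check_latest_numberio(struct io_u *io_u, struct verify_header *hdr)
{
	/*
	 * Compare the numberio recorded in the block's verify header
	 * against the write sequence number the verify phase expects
	 * for this offset. A stale block carries an older numberio,
	 * meaning the latest write never reached the media.
	 */
	if (hdr->numberio != io_u->numberio) {
		log_err("verify: bad numberio %u, wanted %u at offset %llu\n",
			hdr->numberio, io_u->numberio,
			(unsigned long long) io_u->offset);
		return EILSEQ;
	}
	return 0;
}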

Available test cases are shown below:

  • Sequential
fio --ioengine=io_uring_cmd --cmd_type=nvme --name=yeah --filename=/dev/ng0n1 --rw=write --bs=4k --size=32k --verify=crc32 --do_verify=0 --io_size=64k --verify_state_save=1

fio --ioengine=io_uring_cmd --thread=1 --cmd_type=nvme --name=yeah --filename=/dev/ng0n1 --rw=write --bs=4k --size=32k --verify=crc32 --do_verify=1 --io_size=64k --verify_state_load=1 --verify_only=1
  • Random
fio --ioengine=io_uring_cmd --cmd_type=nvme --name=yeah --filename=/dev/ng0n1 --rw=randwrite --bs=4k --size=32k --verify=crc32 --do_verify=0 --io_size=64k --verify_state_save=1 --norandommap

fio --ioengine=io_uring_cmd --thread=1 --cmd_type=nvme --name=yeah --filename=/dev/ng0n1 --rw=randwrite --bs=4k --size=32k --verify=crc32 --do_verify=1 --io_size=64k --verify_state_load=1 --verify_only=1 --norandommap

If --io_size > --size, not only rw workloads but also write-only
workloads should wrap around to the initial offset. This patch removes
the `td_rw(td)` check to support write-only workloads as well. When a
workload wraps around into a previously written area, we must consider
the overlap risk, so this condition is added to `fio_offset_overlap_risk()`,
which also makes io pieces get logged into the rb-tree rather than the
flist so that the latest io pieces are kept.

Signed-off-by: Minwoo Im <[email protected]>
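
For context, a simplified sketch of the branch in iolog.c:log_io_piece() that the commit message above refers to (the flist arm is quoted in the hunk later in this thread; the rb-tree arm is elided):

if (!fio_offset_overlap_risk(td)) {
	/* No overlap possible: a cheap FIFO list is enough. */
	INIT_FLIST_HEAD(&ipo->list);
	flist_add_tail(&ipo->list, &td->io_hist_list);
	ipo->flags |= IP_F_ONLIST;
	return;
}

/*
 * Otherwise fall through to the rb-tree insert, which replaces any
 * existing io piece at the same offset so only the latest survives.
 */
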
fio.h (outdated)

 	    td->o.ddir_seq_add || (td->o.ddir_seq_nr > 1) ||
-	    td->o.io_size > td->o.size)
+	    td->o.io_size > td->o.size ||
+	    td->o.verify_write_sequence)
Collaborator

This is not the right approach. Fio should reason about the workload to determine whether there is an overlap risk. Whether there is a risk or not should not be independently affected by the verify_write_sequence option that the user can set.

Please fully explicate the reasoning for this change (what type of workloads require this?) and then try to come up with a way to accommodate these workloads that does not involve verify_write_sequence determining overlap risk.

@minwooim (Contributor, Author) replied Dec 2, 2025

> This is not the right approach. Fio should reason about the workload to determine whether there is an overlap risk. Whether there is a risk or not should not be independently affected by the verify_write_sequence option that the user can set.

Indeed, I think `td->o.io_size > td->o.size` is enough to detect the overlap risk, since in that case I/Os to the same offset may overlap.
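
For illustration, the workload-derived shape being asked for might look like this (a sketch only; the clauses are taken from the hunk above, and the rest of the real fio_offset_overlap_risk() in fio.h is elided):

static inline bool fio_offset_overlap_risk(struct thread_data *td)
{
	/* ... other workload-shape clauses elided ... */
	return td->o.ddir_seq_add || (td->o.ddir_seq_nr > 1) ||
	       td->o.io_size > td->o.size; /* wrap-around re-hits offsets */
}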

> Please fully explicate the reasoning for this change (what type of workloads require this?) and then try to come up with a way to accommodate these workloads that does not involve verify_write_sequence determining overlap risk.

(I will add the following description to the commit message)
Let's say we have a workload that writes each offset more than once (overwriting the same offsets). In that case, without numberio verification in the verify header, we cannot tell whether the latest written data actually reached the media. If the device returns the previous data even though the latest write completed successfully, fio's verification cannot detect that the data is stale. To verify this time-ordered data integrity in a --verify_only=1 workload, we need verify_write_sequence enabled even in the --verify_only=1 case, which currently overrides verify_write_sequence to 0.

The following write phase is an example:

write (offset=0, length=4096) -> numberio = 0
write (offset=4096, length=4096) -> numberio = 1
...  X number of writes
write (offset=0, length=4096) -> numberio = (X+1) + 1
write (offset=4096, length=4096) -> numberio = (X+1) + 2

After this, the verification phase should verify offsets 0 and 4096 against the latest data using numberio. In this case, a --verify_only=1 workload can simply read a single lap and check the latest data.

This patch is for this kind of workload verification.
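
To make the expectation concrete, the verify phase for this example should check the following (illustrative, following the sequence above):

verify read (offset=0, length=4096) -> expect header numberio == (X+1) + 1
verify read (offset=4096, length=4096) -> expect header numberio == (X+1) + 2

If the device returned first-lap data instead, the header would still carry numberio 0 or 1 and the mismatch would flag the stale block.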

@sitsofe (Collaborator) commented Dec 2, 2025

Wow I happened to be looking at this via #1970 just yesterday. My concern is that this isn't going to work for jobs where the initial job was somehow extended past a single pass without using loops:

rm -f /tmp/fio.tmp
./fio --filename=/tmp/fio.tmp --size=12k --time_based --runtime=4s --rate_iops=1 --do_verify=0 --verify=crc32c --rw=write --name=multipass
./fio --filename=/tmp/fio.tmp --size=12k --verify_state_load=1 --verify_only=1 --verify=crc32c --rw=write --name=multipass

@minwooim (Contributor, Author) commented Dec 2, 2025

> Wow I happened to be looking at this via #1970 just yesterday. My concern is that this isn't going to work for jobs where the initial job was somehow extended past a single pass without using loops:

I had not seen #1970, but yes, it seems to be the same issue I would like to resolve (old vs. new data).

> rm -f /tmp/fio.tmp
> ./fio --filename=/tmp/fio.tmp --size=12k --time_based --runtime=4s --rate_iops=1 --do_verify=0 --verify=crc32c --rw=write --name=multipass
> ./fio --filename=/tmp/fio.tmp --size=12k --verify_state_load=1 --verify_only=1 --verify=crc32c --rw=write --name=multipass

--loops= simply repeats not only the write phase but also the verify phase, with the same verification criteria. So if there is more than one pass over the same offsets, numberio is what we should use as the criterion for data verification.

@sitsofe (Collaborator) commented Dec 2, 2025

@minwooim:

> --loops= simply repeats not only the write phase but also the verify phase, with the same verification criteria. So if there is more than one pass over the same offsets, numberio is what we should use as the criterion for data verification.

Ah good point but I still think the jobs I posted will fail with your PR...

@minwooim (Contributor, Author) commented Dec 2, 2025

> Ah good point but I still think the jobs I posted will fail with your PR...

@sitsofe, do you mean a verify failure? I ran your script: offsets 0x0, 0x1000, 0x2000, and 0x0 again are written within the given time, and the read phase reads offsets 0x0, 0x1000, 0x2000 with the latest numberio, without any verify failure.

Write phase

multipass: Laying out IO file (1 file / 0MiB)
io       615324 declare unneeded cache /tmp/fio.tmp: 0/12288
io       615358 declare unneeded cache /tmp/fio.tmp: 0/12288
io       615358 fill: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 prep: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 fill random bytes len=4096
verify   615358 fill crc32c io_u 0x64fe6cbc4900, len 4096
verify   615358 log_inflight: numberio=0, inflight_idx=0
io       615358 queue: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 complete: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 invalidate_inflight: numberio=0, inflight_idx=0
io       615358 calling ->commit(), depth 1
io       615358 fill: io_u 0x64fe6cbc4900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 prep: io_u 0x64fe6cbc4900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 fill random bytes len=4096
verify   615358 fill crc32c io_u 0x64fe6cbc4900, len 4096
verify   615358 log_inflight: numberio=1, inflight_idx=0
io       615358 queue: io_u 0x64fe6cbc4900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 complete: io_u 0x64fe6cbc4900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 invalidate_inflight: numberio=1, inflight_idx=0
io       615358 calling ->commit(), depth 1
io       615358 fill: io_u 0x64fe6cbc4900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 prep: io_u 0x64fe6cbc4900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 fill random bytes len=4096
verify   615358 fill crc32c io_u 0x64fe6cbc4900, len 4096
verify   615358 log_inflight: numberio=2, inflight_idx=0
io       615358 queue: io_u 0x64fe6cbc4900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 complete: io_u 0x64fe6cbc4900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 invalidate_inflight: numberio=2, inflight_idx=0
io       615358 calling ->commit(), depth 1
io       615358 declare unneeded cache /tmp/fio.tmp: 0/12288
io       615358 fill: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 prep: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 fill random bytes len=4096
verify   615358 fill crc32c io_u 0x64fe6cbc4900, len 4096
verify   615358 log_inflight: numberio=3, inflight_idx=0
io       615358 queue: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615358 complete: io_u 0x64fe6cbc4900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615358 invalidate_inflight: numberio=3, inflight_idx=0
io       615358 calling ->commit(), depth 1
io       615358 invalid direction received ddir = -2
io       615358 io_u 0x64fe6cbc4900, setting file failed
io       615358 get_io_u failed
io       615358 close ioengine psync
io       615358 free ioengine psync

Read phase

verify   615359 verify_assign_state numberio=4, inflight[0]=18446744073709551615
io       615393 declare unneeded cache /tmp/fio.tmp: 0/12288
io       615393 fill: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 prep: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 fill: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 prep: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 fill: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 prep: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=1,file=/tmp/fio.tmp
verify   615393 starting loop
io       615393 declare unneeded cache /tmp/fio.tmp: 0/12288
verify   615393 get_next_verify: ret io_u 0x574535ed0900
io       615393 prep: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 verify_state_should_stop numberio=0
io       615393 queue: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=0,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x0,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 crc32c verify io_u 0x574535ed0900, len 4096
verify   615393 get_next_verify: ret io_u 0x574535ed0900
io       615393 prep: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 verify_state_should_stop numberio=1
io       615393 queue: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=0,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x1000,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 crc32c verify io_u 0x574535ed0900, len 4096
verify   615393 get_next_verify: ret io_u 0x574535ed0900
io       615393 prep: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 verify_state_should_stop numberio=2
io       615393 queue: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=0,file=/tmp/fio.tmp
io       615393 complete: io_u 0x574535ed0900: off=0x2000,len=0x1000,ddir=0,file=/tmp/fio.tmp
verify   615393 crc32c verify io_u 0x574535ed0900, len 4096
verify   615393 get_next_verify: empty
verify   615393 exiting loop
io       615393 close ioengine psync
io       615393 free ioengine psync

Am I missing something here?

[updated] I see. The verify phase does not detect the overlap risk, so the flist is used and the old data gets compared.
To handle this, the verify state file should record that the write phase had an overlap risk, so that the verify phase knows to use the rb-tree.
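
A hypothetical sketch of that idea (the overlap_risk field is invented here for illustration; struct thread_io_list and its numberio field appear in the diff below):

struct thread_io_list {
	uint64_t numberio;	/* existing: last write sequence number */
	/* ... existing fields ... */
	uint8_t  overlap_risk;	/* hypothetical: set when the write phase had
				 * an overlap risk, telling a later verify-only
				 * job to log io pieces into the rb-tree */
};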

In a read-only verify phase, `numberio` in the verify header could not be
checked, since a read-only workload cannot easily know the exact sequence
of the write commands.  But due to commit 466a2e6 ("backend: fix
zero-numberio in dry-run"), the dry run, which is a pre-session before the
actual verify phase that builds up the read @io_u instances, now sets each
@io_u's numberio properly, so the verify phase can simply compare it
against the `numberio` in the read block's verify header.

Signed-off-by: Minwoo Im <[email protected]>
@minwooim force-pushed the check-numberio-read-only branch from cc5ceaf to 36428f2 on December 2, 2025 at 21:54
@minwooim (Contributor, Author) commented Dec 2, 2025

@sitsofe, the following patch does two things:

  1. Dry-runs the wrapped-around @io_u instances to log io pieces with the numberio recorded in the verify state file (write-only considered; no trim yet).
  2. Puts the io pieces into the rb-tree, not the flist, since the flist does not handle overlaps.

This is not a clean patch yet and definitely needs revisiting:

diff --git a/backend.c b/backend.c
index e09c210a..f205ea13 100644
--- a/backend.c
+++ b/backend.c
@@ -1686,10 +1686,14 @@ static int exec_string(struct thread_options *o, const char *string,
  */
 static uint64_t do_dry_run(struct thread_data *td)
 {
-       td_set_runstate(td, TD_RUNNING);
+       struct thread_io_list *s = td->vstate;
+       uint64_t target_numberio = s->numberio;
+
+       td_set_runstate(td, TD_DRY_RUNNING);

        while ((td->o.read_iolog_file && !flist_empty(&td->io_log_list)) ||
-               (!flist_empty(&td->trim_list)) || !io_complete_bytes_exceeded(td)) {
+               (!flist_empty(&td->trim_list)) || !io_complete_bytes_exceeded(td) ||
+               td->io_issues[DDIR_WRITE] < target_numberio) {
                struct io_u *io_u;
                int ret;

@@ -1723,6 +1727,8 @@ static uint64_t do_dry_run(struct thread_data *td)
                (void) ret;
        }

+       td_set_runstate(td, TD_RUNNING);
+
        return td->bytes_done[DDIR_WRITE] + td->bytes_done[DDIR_TRIM];
 }

diff --git a/fio.h b/fio.h
index 1fda07f8..efa57860 100644
--- a/fio.h
+++ b/fio.h
@@ -713,6 +713,7 @@ enum {
        TD_INITIALIZED,
        TD_RAMP,
        TD_SETTING_UP,
+       TD_DRY_RUNNING,
        TD_RUNNING,
        TD_PRE_READING,
        TD_VERIFYING,
diff --git a/io_u.c b/io_u.c
index 2d5986ec..91786ccb 100644
--- a/io_u.c
+++ b/io_u.c
@@ -390,6 +390,7 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,
                }
         }

+retry:
        if (f->last_pos[ddir] < f->real_file_size) {
                uint64_t pos;

@@ -430,6 +431,10 @@ static int get_next_seq_offset(struct thread_data *td, struct fio_file *f,

                *offset = pos;
                return 0;
+       } else if (td->runstate == TD_DRY_RUNNING) {
+               f->last_pos[ddir] = f->file_offset;
+               loop_cache_invalidate(td, f);
+               goto retry;
        }

        return 1;
diff --git a/iolog.c b/iolog.c
index dcf6083c..27e393a4 100644
--- a/iolog.c
+++ b/iolog.c
@@ -306,7 +306,7 @@ void log_io_piece(struct thread_data *td, struct io_u *io_u)
         * the rb insert/lookup for handling. Sort writes if we have offset
         * modifier which can also create duplicate blocks.
         */
-       if (!fio_offset_overlap_risk(td)) {
+       if (!fio_offset_overlap_risk(td) && td->runstate != TD_DRY_RUNNING) {
                INIT_FLIST_HEAD(&ipo->list);
                flist_add_tail(&ipo->list, &td->io_hist_list);
                ipo->flags |= IP_F_ONLIST;
diff --git a/libfio.c b/libfio.c
index 57f3f858..7570d695 100644
--- a/libfio.c
+++ b/libfio.c
@@ -209,7 +209,7 @@ static const char *td_runstates[] = {

 const char *runstate_to_name(int runstate)
 {
-       compiletime_assert(TD_LAST == 12, "td runstate list");
+       compiletime_assert(TD_LAST == 13, "td runstate list");
        if (runstate >= 0 && runstate < TD_LAST)
                return td_runstates[runstate];

@vincentkfu (Collaborator) left a comment

The first change to fio_offset_overlap_risk looks good to me.
The second change probably is ok as well. However, I'm not comfortable letting it in because get_next_seq_offset() has to handle many different situations and we have very little testing for multiple files, holed IO, backwards IO, etc.

@vincentkfu (Collaborator)

For now I think it's possible to accomplish the aim of this PR by explicitly setting verify_write_sequence=1. Do you agree?
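
For example, the sequential verify-only invocation from the PR description would become the following (only the last option added; verify_write_sequence is a regular boolean job option, so passing it on the command line should work):

fio --ioengine=io_uring_cmd --thread=1 --cmd_type=nvme --name=yeah --filename=/dev/ng0n1 --rw=write --bs=4k --size=32k --verify=crc32 --do_verify=1 --io_size=64k --verify_state_load=1 --verify_only=1 --verify_write_sequence=1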
