Verification failures writing to small range with bs_unaligned=1 #1092

Open
@barakp

There seems to be an issue when issuing I/Os to a relatively small range with header verification enabled.

Using fio version:

[ec2-user@ip-172-31-30-241 ~]$ /usr/local/bin/fio --version
fio-3.23

The following job file:

[test]
direct=0
ioengine=libaio
#do_verify=1
verify=crc32c
verify_fatal=1
verify_dump=1
bwavgtime=1000
iopsavgtime=1000
rw=randrw
iodepth=2
bs_unaligned=1
bsrange=512b-1k
time_based=1
size=10m
verify_backlog=10
runtime=6h
serialize_overlap=1
filename=/test/barak.test

Debug log:

verify   19888 fill crc32c io_u 0x2631780, len 854
io       19888 queue: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=1,file=/test/barak.test
io       19888 calling ->commit(), depth 2
io       19888 io_u_queued_complete: min=1
io       19888 getevents: 1
io       19888 complete: io_u 0x26325c0: off=0x31b200,len=0x200,ddir=0,file=/test/barak.test

io       19888 fill: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io       19888 prep: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io       19888 prep: io_u 0x26325c0: ret=0
verify   19888 fill random bytes len=579
verify   19888 fill crc32c io_u 0x26325c0, len 579
io       19888 queue: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test
io       19888 complete: io_u 0x26325c0: off=0x9bf600,len=0x243,ddir=1,file=/test/barak.test


io       19888 prep: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test
io       19888 prep: io_u 0x2631780: ret=0
io       19888 queue: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test
io       19888 complete: io_u 0x2631780: off=0x9bf600,len=0x356,ddir=0,file=/test/barak.test

verify: bad header length 579, wanted 854 at file /dev/nvme6n1 offset 10221056, length 854 (requested block: offset=10221056, length=854)
       hdr_fail data dumped as nvme6n1.10221056.hdr_fail

As the debug log shows, fio issues two unaligned writes to the same offset (0x9bf600): one with len 854 (0x356) and the other with len 579 (0x243). The verification read that follows is issued at the same offset with len 854.

This means the header length on disk is populated with the buflen of the most recent write, which is 579, because of the following code when td->o.bs_unaligned=1:

static unsigned int get_hdr_inc(struct thread_data *td, struct io_u *io_u)
{
	unsigned int hdr_inc;

	/*
	 * If we use bs_unaligned, buflen can be larger than the verify
	 * interval (which just defaults to the smallest blocksize possible).
	 */
	hdr_inc = io_u->buflen;
	if (td->o.verify_interval && td->o.verify_interval <= io_u->buflen &&
	    !td->o.bs_unaligned)
		hdr_inc = td->o.verify_interval;

	return hdr_inc;
}

When bs_unaligned=1, the header length is set to the submitted buflen. However, an overwrite of the same block may use a different buflen. This problem does not exist with bs_unaligned=0, because the header length is then fixed to the verify_interval the user sets in the config.

I believe the most straightforward fix is to skip header verification when bs_unaligned=1. Please let me know if I'm missing something (e.g. if my job file isn't valid). I've verified that the same workload passes when bs_unaligned isn't set.


Labels: triaged (issue cause is understood but a patch is needed to fix it)
