READ/WRITE LBA reset to 0 in SEQ mix workload when Drive Size IS NOT a multiple of the workload block size #580

Open
@akoundal

On a drive whose size IS a multiple of the workload block size, if we run a time-based 70R/30W mixed sequential workload, then when the
Read LBAs reach the end of the drive, ONLY the Read LBA is reset to 0. The Write LBA continues on to the end of the drive.
(The run time is longer than the time it takes to read (and write) all the LBAs in the drive at least once.)

Example 1 (the drive IS a multiple of the workload block size):

Drive size: 112,206,151,680 bytes (io_u->file->real_file_size)
The workload block size: 131072 bytes (128 KB)
The drive size IS a multiple of 128 KB

When ‘fio’ comes to the LBA 112,206,020,608, writing 128 KB writes up to 112,206,151,680 (the size of the drive).
The last 128 KB of data is written. After that, a check in get_next_seq_offset() tests whether ‘fio’
is trying to access beyond the end of the file:

    if (f->last_pos[ddir] >= f->io_size + get_start_offset(td, f) &&
        o->time_based) {
            f->last_pos[ddir] = f->file_offset;
            loop_cache_invalidate(td, f);
    }

In this example:
f->last_pos[ddir] == 112,206,151,680
f->io_size == 112,206,151,680
get_start_offset(td,f) == 0
o->time_based == 1

The result is that f->last_pos[ddir] is set to 0
(i.e. the start position is reset to zero for this ddir only; ddir == 0 for Reads, 1 for Writes).
do_io() does not exit when the Read LBAs reach the end of the drive.
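
To make the per-direction reset concrete, here is a minimal standalone sketch of the check quoted above. It is not fio code: `struct sk_file`, `sk_wrap_if_at_end()` and the `start_offset` field (standing in for `get_start_offset(td, f)`) are hypothetical simplifications, and the cache invalidation step is left out.

```c
#include <stdint.h>

enum { SK_READ = 0, SK_WRITE = 1 };   /* mirrors ddir: 0 = Reads, 1 = Writes */

/* Hypothetical, simplified stand-in for the fields the check reads. */
struct sk_file {
    uint64_t last_pos[2];   /* next sequential offset, one per direction */
    uint64_t io_size;       /* bytes of I/O to cover in this file */
    uint64_t file_offset;   /* where sequential I/O starts */
    uint64_t start_offset;  /* assumption: replaces get_start_offset(td, f) */
};

/*
 * Same condition as the quoted check: only the position for the given
 * direction is wrapped. Returns 1 if a wrap happened, 0 otherwise.
 */
int sk_wrap_if_at_end(struct sk_file *f, int ddir, int time_based)
{
    if (f->last_pos[ddir] >= f->io_size + f->start_offset && time_based) {
        f->last_pos[ddir] = f->file_offset;
        return 1;
    }
    return 0;
}
```

With the Example 1 values (`last_pos[SK_READ]` and `io_size` both 112,206,151,680, `file_offset` and `start_offset` 0, `time_based` 1), calling `sk_wrap_if_at_end(&f, SK_READ, 1)` resets only the read position. A write position that is still below `io_size` is left untouched, which matches the behaviour described for the aligned drive.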

============================

On a drive whose size IS NOT a multiple of the workload block size, if we run a time-based 70R/30W mixed sequential workload, the issue is that
when the Read LBAs reach the end of the drive, BOTH the Read and the Write LBAs are reset to 0.
(The run time is longer than the time it takes to read (and write) all the LBAs in the drive once.)

Example 2 (the drive IS NOT a multiple of the workload block size):

Drive size: 112,206,086,144 bytes (io_u->file->real_file_size)
The workload block size: 131072 bytes (128 KB)
The drive size IS NOT a multiple of 128 KB
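
The alignment claim can be checked with a little arithmetic. The helpers below are a hypothetical sketch (the names `sk_tail_bytes()` and `sk_last_block_start()` are not part of fio) and assume a non-zero drive size and block size.

```c
#include <stdint.h>

/* Bytes left over after the last whole block; 0 means the size is aligned. */
uint64_t sk_tail_bytes(uint64_t drive_size, uint64_t block_size)
{
    return drive_size % block_size;
}

/* Offset of the last block that starts inside the drive (drive_size > 0). */
uint64_t sk_last_block_start(uint64_t drive_size, uint64_t block_size)
{
    return ((drive_size - 1) / block_size) * block_size;
}
```

For this drive, `sk_tail_bytes(112206086144, 131072)` gives 65,536, so only 64 KB remain after the last whole 128 KB block, and `sk_last_block_start()` gives 112,206,020,608, the offset at which a full 128 KB request no longer fits.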

When ‘fio’ comes to the LBA 112,206,020,608, writing 128KB would write to: 112,206,151,680 (this is greater than the size of the drive).
The last 128KB of data cannot be written. There is a check in fill_io_u() to see if it is trying to write past the end of the file:

    if (io_u->offset + io_u->buflen > io_u->file->real_file_size) {
            dprint(FD_IO, "io_u %p, off=0x%llx + len=0x%lx exceeds file size=0x%llx\n",
                    io_u,
                    (unsigned long long) io_u->offset, io_u->buflen,
                    (unsigned long long) io_u->file->real_file_size);
            return 1;
    }

In this example:
io_u->offset == 112,206,020,608
io_u->buflen == 131,072
io_u->offset + io_u->buflen == 112,206,151,680
io_u->file->real_file_size == 112,206,086,144

The result is that the if() condition is true and fill_io_u() returns 1.
This eventually leads to do_io() exiting (when the Read LBAs reach the end of the drive) and restarting with both the Read and Write LBAs starting at 0.
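
The bounds test itself can be reproduced in isolation with the numbers from both examples. This is only a sketch of the comparison, not the fio function; `sk_exceeds_file()` is a hypothetical name.

```c
#include <stdint.h>

/*
 * Same comparison as the quoted check: returns 1 when an I/O of buflen
 * bytes starting at offset would run past real_file_size, 0 otherwise.
 */
int sk_exceeds_file(uint64_t offset, uint64_t buflen, uint64_t real_file_size)
{
    return offset + buflen > real_file_size;
}
```

With offset 112,206,020,608 and buflen 131,072, the call returns 1 for the unaligned drive (real_file_size 112,206,086,144) and 0 for the aligned drive (112,206,151,680). That is why only the unaligned case takes the early-return path.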

[image: LBA not aligned to 128K]

[image: LBA aligned to 128K]

Command line used:
    ./fio --name=global --time_based --direct=1 --norandommap --randrepeat=0 --buffered=0 --refill_buffers --name=job --ioengine=libaio --group_reporting --filename=/dev/nvme0n1 --numjobs=1 --iodepth=128 --bs=128k --rw=rw --rwmixread=70 --runtime=5400 --write_iops_log=128kqd128kmixs --write_lat_log=128qd128kmixs --log_offset=1 --log_avg_msec=0 --debug=io --output test.log

fio versions tested (both show the same results):
fio-3.5
fio-3.5-86-gcefd2
