Fix recv EINVAL/corruption when source has smaller recordsize setting #18663
tuxoko wants to merge 3 commits into
I'd think if the block size was increased on fs1 by the write, backward replication to fs1 should increase the block size there too. The complications here, I guess, come from workarounds for large block support being optional.
@amotin Given the comment saying we disallow --large-block to switch to off in recv_check_large_blocks(), however, if we blindly do that now, anyone with an already mismatched file will lose data, so I'm not sure what the best way forward is.
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Updated to fix the corruption issue.
Adopting the new recordsize is what I'd expect, but we'll have to think that through carefully. With the latest push I see the rsend tests are now hitting this:
@behlendorf I'm not sure if there's a good way around this. Or maybe we should go back to always truncating single-block files and just rewriting whatever we have...
Updated to fix the above non-power-of-2 block size issue. Also updated the test pool to include such a file.
If the source and target have different recordsize settings, the same file can end up with different block sizes on each side when reverse replication is allowed. This has two consequences in the current receive_handle_existing_object.

If the target side has more than one block, the receive fails with EINVAL.

```
$ sudo zfs create -o recordsize=64k pp/fs0
$ sudo zfs create -o recordsize=128k pp/fs1
$ sudo dd if=/dev/urandom of=/pp/fs0/test bs=1M count=1 # src created with 64k
$ sudo zfs snap pp/fs0@s00
$ sudo zfs send pp/fs0@s00 | sudo zfs recv -F pp/fs1 # tgt received with 64k
$ sudo dd if=/dev/urandom of=/pp/fs1/test bs=1M count=1 # tgt truncated and rewritten with 128k
$ sudo zfs snap pp/fs1@s01
$ sudo zfs send -i @s00 pp/fs1@s01 | sudo zfs recv pp/fs0 # reverse send, src remains 64k
$ sudo dd if=/dev/urandom of=/pp/fs0/test bs=1M count=1 # src modified, remains 64k
$ sudo zfs snap pp/fs0@s02
$ sudo zfs send -i @s01 pp/fs0@s02 | sudo zfs recv pp/fs1 # forward send failed
cannot receive incremental stream: invalid backup stream
```

If the target side has one block and the source modifies the file partially, the target truncates everything and is left with only the modified part, causing corruption.

```
$ sudo zfs create -o recordsize=64k pp/fs0
$ sudo zfs create -o recordsize=128k pp/fs1
$ sudo dd if=/dev/urandom of=/pp/fs0/test bs=128k count=1 # src created with 64k
$ sudo zfs snap pp/fs0@s00
$ sudo zfs send pp/fs0@s00 | sudo zfs recv -F pp/fs1 # tgt received with 64k
$ sudo dd if=/dev/urandom of=/pp/fs1/test bs=128k count=1 # tgt truncated and rewritten with 128k
$ sudo zfs snap pp/fs1@s01
$ sudo zfs send -i @s00 pp/fs1@s01 | sudo zfs recv pp/fs0 # reverse send, src remains 64k
$ sudo dd if=/dev/urandom of=/pp/fs0/test bs=64k count=1 conv=notrunc # src modified single 64k block
$ sudo zfs snap pp/fs0@s02
$ sudo zfs send -i @s01 pp/fs0@s02 | sudo zfs recv pp/fs1 # forward send, tgt will get corrupted
$ md5sum /pp/fs0/test /pp/fs1/test
7b8ab0f9b9bded4bb1cfe60b8d2ad925 /pp/fs0/test
229c5798dcf15440afafa734f7e42769 /pp/fs1/test
```

To fix this, when we encounter files with the same gen but a different blksz, we look ahead into the following records to see whether the WRITEs and FREEs will completely overwrite the existing content. If so, it is safe to truncate and take on the new blksz; otherwise we don't truncate and the blksz remains different.
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
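The look-ahead idea described above can be illustrated with a small toy model. This is only a sketch, not the code in this PR: the function name `fully_overwrites` is hypothetical, and WRITE/FREE records are modeled as plain `(offset, length)` byte ranges rather than real stream records. It checks whether the upcoming ranges cover the whole existing object, which is the condition under which truncating and adopting the new blksz would lose nothing.

```python
def fully_overwrites(existing_size, records):
    """Return True if the (offset, length) ranges cover [0, existing_size).

    Hypothetical helper for illustration; `records` stands in for the
    WRITE and FREE records that follow the object record in the stream.
    """
    covered_to = 0  # every byte in [0, covered_to) is known to be overwritten
    for offset, length in sorted(records):
        if offset > covered_to:
            return False  # gap: some existing bytes would survive untouched
        covered_to = max(covered_to, offset + length)
        if covered_to >= existing_size:
            return True
    return covered_to >= existing_size


KIB = 1024

# Like the first reproducer: the whole 1 MiB file is rewritten in 64 KiB blocks.
full_rewrite = [(i * 64 * KIB, 64 * KIB) for i in range(16)]
# Like the second reproducer: only the first 64 KiB of a 128 KiB file changes.
partial_write = [(0, 64 * KIB)]

safe_full = fully_overwrites(1024 * KIB, full_rewrite)
safe_partial = fully_overwrites(128 * KIB, partial_write)
print(safe_full, safe_partial)
```

In this model the full rewrite is reported as safe to truncate, while the partial write is not, so the existing content and block size would be left in place.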