
File corruption when disk reaches its capacity threshold in a distributed volume with sharding enabled #4686

Description

@duzehua-tech

Hello,

We are currently facing a problem.

When copying a large file A to a distributed volume with sharding enabled (target file named B), a disk on a single brick reaches its capacity threshold during the copy process. After the copy completes, the MD5 checksum of file B does not match that of file A.

This issue can be reproduced on a single node.

Cluster Architecture:

We have a 1-node GlusterFS cluster.

Disk Type: HDD (2 disks)
Individual disk capacity: 5 GB
Usable capacity per disk: ~3.4 GB
Gluster version: 11.2

Volume Configuration:

The storage is configured as a Distribute volume with the following options (gluster v info):

Volume Name: r2
Type: Distribute
Volume ID: 78322c39-0f17-4092-b672-164ec899e215
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: node1:/opt/b1/brick
Brick2: node1:/opt/b2/brick
Options Reconfigured:
cluster.min-free-disk: 20%
features.shard-block-size: 500Mb
features.shard: on

storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
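
For context on the `cluster.min-free-disk: 20%` option above: as I understand it, DHT avoids placing new files on a brick once its free space drops below this percentage and redirects them to another brick. The sketch below is a rough way to check each brick's filesystem against that threshold from the shell. `below_min_free` is a hypothetical helper, not a GlusterFS command, and it relies on the used-percentage column of `df -P`, which may differ slightly from the value GlusterFS computes internally.

```shell
#!/bin/sh
# Hypothetical helper (not part of GlusterFS): succeed if the filesystem
# holding <path> has less than <min_free_percent> percent free space.
# Usage: below_min_free <path> <min_free_percent>
below_min_free() {
    bmf_path=$1
    bmf_min_free=$2
    # df -P prints one line per filesystem; column 5 is the used percentage.
    bmf_used=$(df -P "$bmf_path" | awk 'NR == 2 { sub("%", "", $5); print $5 }')
    bmf_free=$((100 - bmf_used))
    [ "$bmf_free" -lt "$bmf_min_free" ]
}

# Example with the brick paths and the 20% threshold from this report.
# Bricks that do not exist on the current machine are skipped.
for brick in /opt/b1/brick /opt/b2/brick; do
    if [ -d "$brick" ]; then
        if below_min_free "$brick" 20; then
            echo "$brick: below 20% free"
        else
            echo "$brick: at least 20% free"
        fi
    fi
done
```

Running this before and during the copy should show at what point the first brick crosses the threshold.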

Steps to Reproduce
Single machine with 2 disks, each 5GB (smaller space allows faster reproduction and easier debugging; for machines with larger disks, partitioning can be used instead).

1 Install GlusterFS: glusterfs-11.2

2 Configure the volume:

gluster v create r2 node1:/opt/b1/brick node1:/opt/b2/brick force
gluster v start r2
gluster v set r2 features.shard on
gluster v set r2 features.shard-block-size 500MB
gluster v set r2 cluster.min-free-disk 20%

3 Fill about 3.4GB on the first brick's disk (5GB total) to approach the disk space threshold:

cd /opt/b1
dd if=/dev/zero of=/opt/b1/3.4g.img bs=1M count=3400
mount.glusterfs node1:r2 /mnt/r2
cd /mnt/r2/
mkdir test

4 Copy a 3.3GB file (xx.0602):

cp /var/log/glusterfs/xx.0602 /mnt/r2/test/
du -sh /var/log/glusterfs/xx.0602
3.3G /var/log/glusterfs/xx.0602

5 Check the result. The issue reproduces consistently, in about 1 of every 3 attempts. When it occurs, the same shard (e.g., 1019cabf-eedf-4845-a32b-3fc82600561d.2) appears on both bricks with actual data.
Expected behavior: each shard file with data should exist on one brick, not two.

Shard distribution across bricks:
root@node1:/mnt/r2/test# find /opt/b*/brick | grep ".1"
/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.1

root@node1:/mnt/r2/test# find /opt/b*/brick | grep 61d
/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.1
/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
/opt/b2/brick/.glusterfs/10/19/1019cabf-eedf-4845-a32b-3fc82600561d
/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.3
/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.5
/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6

Shard sizes (showing that the same shard .2 exists on both bricks with data):

root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
2.1M    /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2

root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
0       /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4

root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
512M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2

root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
512M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4

root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
0       /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6

root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
511M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
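
The manual `find` and `du` checks above can be combined into one step. This is only a sketch: `dup_shards` is a hypothetical helper, and it treats any shard file with a non-zero apparent size as "has data". That is an assumption and not quite the same as the disk usage reported by `du`. On a pure Distribute volume it should print nothing when the copy is healthy.

```shell
#!/bin/sh
# Hypothetical helper: print the names of shard files that are non-empty
# on more than one of the given brick directories.
# Usage: dup_shards <brick-dir> [<brick-dir> ...]
dup_shards() {
    for ds_brick in "$@"; do
        # Skip bricks that have no .shard directory (e.g. nothing sharded yet).
        [ -d "$ds_brick/.shard" ] || continue
        # Non-empty regular files only, reduced to their file names.
        find "$ds_brick/.shard" -type f -size +0c | sed 's|.*/||' | sort -u
    done | sort | uniq -d
}

# Example with the brick paths from this report.
dup_shards /opt/b1/brick /opt/b2/brick
```

In the failing run shown above, this would be expected to list the `.2` shard, since it holds data on both b1 and b2, while the zero-size `.4` and `.6` entries on b1 are ignored.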

MD5 checksum mismatch (corrupted file):

First run (checksums match):
root@node1:/mnt/r2/test# md5sum xx.0602 
0d6db58b9fb15f063e0893435587eb1c  xx.0602
root@node1:/mnt/r2/test# md5sum /var/log/glusterfs/xx.0602 
0d6db58b9fb15f063e0893435587eb1c  /var/log/glusterfs/xx.0602
root@node1:/mnt/r2/test# 

Second run (checksums differ):
# Corrupted file on mount point
root@node1:/mnt/r2/test# md5sum xx.0602 
af9b0c3dc7c6f28ebc915f0bdf15ac3c  xx.0602
# Original file
root@node1:/mnt/r2/test# md5sum /var/log/glusterfs/xx.0602 
0d6db58b9fb15f063e0893435587eb1c  /var/log/glusterfs/xx.0602
root@node1:/mnt/r2/test# 
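
Since the problem does not show up on every attempt, the checksum comparison is easier to repeat as a small script. A minimal sketch follows; `same_md5` is a hypothetical helper and the two paths are the ones used in this report.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both files are readable and have
# the same MD5 digest. Returns 2 if either file cannot be read.
# Usage: same_md5 <file-a> <file-b>
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    sm_a=$(md5sum < "$1" | awk '{ print $1 }')
    sm_b=$(md5sum < "$2" | awk '{ print $1 }')
    [ -n "$sm_a" ] && [ "$sm_a" = "$sm_b" ]
}

# Example with the source file and the copy on the mount point.
src=/var/log/glusterfs/xx.0602
dst=/mnt/r2/test/xx.0602
if [ -r "$src" ] && [ -r "$dst" ]; then
    if same_md5 "$src" "$dst"; then
        echo "checksums match"
    else
        echo "checksums differ"
    fi
fi
```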
