File corruption when disk reaches its capacity threshold in a distributed volume with sharding enabled

Hello,

We are currently facing a data corruption problem.

When copying a large file A to a distributed volume with sharding enabled (target file named B), a disk on a single brick reaches its capacity threshold during the copy process. After the copy completes, the MD5 checksum of file B does not match that of file A.

This issue can be reproduced on a single node.

**Cluster Architecture:**

We have a 1-node GlusterFS cluster.

Disks: 2 HDDs
Individual disk capacity: 5 GB
Usable capacity per disk: ~3.4 GB
Gluster version: 11.2

**Volume Configuration:**

The storage is configured as a Distribute volume with the following options (output of gluster v info):

Volume Name: r2
Type: Distribute
Volume ID: 78322c39-0f17-4092-b672-164ec899e215
Status: Started
Snapshot Count: 0
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: node1:/opt/b1/brick
Brick2: node1:/opt/b2/brick
Options Reconfigured:
**_cluster.min-free-disk: 20%
features.shard-block-size: 500Mb
features.shard: on_**
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on

**Steps to Reproduce**
Environment: a single machine with 2 disks of 5GB each. The small size makes the issue faster to reproduce and easier to debug; on machines with larger disks, small partitions can be used instead.

1 Install GlusterFS: glusterfs-11.2

2 Configure the volume:

gluster v create r2 node1:/opt/b1/brick node1:/opt/b2/brick force
gluster v start r2
gluster v set r2 features.shard on
gluster v set r2 features.shard-block-size 500MB
gluster v set r2 cluster.min-free-disk 20%

3 Write 3.4GB to the disk backing the first brick (5GB total) so that it approaches the min-free-disk threshold, then mount the volume:

cd /opt/b1
dd if=/dev/zero of=/opt/b1/3.4g.img bs=1M count=3400
mount.glusterfs node1:r2 /mnt/r2
cd /mnt/r2/
mkdir test
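
Before starting the copy, it can help to confirm how close each brick's filesystem is to the configured cluster.min-free-disk value (20% here). The following is a minimal sketch and not part of the original reproduction; `report_brick` is a hypothetical helper name, and it only reads the standard `df -P` output.

```shell
#!/bin/sh
# Hypothetical helper (not a GlusterFS command): print the free-space
# percentage of the filesystem holding DIR and flag it when it is below MIN_PCT.
# Usage: report_brick MIN_PCT DIR
report_brick() {
    # df -P prints one line per filesystem:
    # name, 1024-blocks, used, available, capacity, mount point
    df -Pk "$2" | awk -v min="$1" 'NR == 2 {
        pct = $4 * 100 / $2
        printf "%s: %.1f%% free%s\n", $6, pct, (pct < min ? " (below min-free-disk)" : "")
    }'
}
```

For the setup above this would be called as `report_brick 20 /opt/b1` and `report_brick 20 /opt/b2`.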

4 Copy a 3.3GB file (xx.0602):

cp /var/log/glusterfs/xx.0602 /mnt/r2/test/
du -sh /var/log/glusterfs/xx.0602
3.3G    /var/log/glusterfs/xx.0602

5 Check the result: the issue reproduces in roughly 1 out of 3 attempts. When it occurs, the same shard (e.g., 1019cabf-eedf-4845-a32b-3fc82600561d.2) appears on both bricks with actual data.
Expected behavior: each shard file that holds data should exist on exactly one brick, not two.

Shard distribution across bricks:
	root@node1:/mnt/r2/test# find /opt/b*/brick | grep "\.1"
	/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.1

	root@node1:/mnt/r2/test# find /opt/b*/brick | grep 61d
	/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.1
	/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
	/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
	/opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
	/opt/b2/brick/.glusterfs/10/19/1019cabf-eedf-4845-a32b-3fc82600561d
	/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
	/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.3
	/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
	/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.5
	/opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6

Shard sizes (showing same shard .2 exists on both bricks with data):

	root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
	2.1M    /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2

	root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
	0       /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4

	root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2
	512M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.2

	root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4
	512M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.4

	root@node1:/mnt/r2/test# du -sh /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
	0       /opt/b1/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6

	root@node1:/mnt/r2/test# du -sh /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
	511M    /opt/b2/brick/.shard/1019cabf-eedf-4845-a32b-3fc82600561d.6
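
The manual find/du inspection above can be scripted. This is a rough sketch under stated assumptions: `dup_shards` is a hypothetical helper name, and it treats any shard file with a non-zero apparent size as holding data, so a sparse file would need a block-count check (as `du` does) instead.

```shell
#!/bin/sh
# Hypothetical helper: list shard file names that are non-empty on more than
# one brick. Names are unique within a single brick's .shard directory, so a
# repeated name means the shard holds data on at least two bricks.
# Usage: dup_shards BRICK_DIR...
dup_shards() {
    for brick in "$@"; do
        # -size +0 skips the zero-length files; sed strips the directory part
        find "$brick/.shard" -type f -size +0 2>/dev/null | sed 's|.*/||'
    done | sort | uniq -d
}
```

With the layout in this report, `dup_shards /opt/b1/brick /opt/b2/brick` would be expected to list the .2 shard; an empty result means no shard holds data on two bricks.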

MD5 checksum mismatch (corrupted file):

	First attempt (checksums match):
	root@node1:/mnt/r2/test# md5sum xx.0602 
	0d6db58b9fb15f063e0893435587eb1c  xx.0602
	root@node1:/mnt/r2/test# md5sum /var/log/glusterfs/xx.0602 
	0d6db58b9fb15f063e0893435587eb1c  /var/log/glusterfs/xx.0602
	root@node1:/mnt/r2/test# 

	Second attempt (checksums differ):
	# Corrupted file on mount point
	root@node1:/mnt/r2/test# md5sum xx.0602 
	af9b0c3dc7c6f28ebc915f0bdf15ac3c  xx.0602
	# Original file
	root@node1:/mnt/r2/test# md5sum /var/log/glusterfs/xx.0602 
	0d6db58b9fb15f063e0893435587eb1c  /var/log/glusterfs/xx.0602
	root@node1:/mnt/r2/test# 
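
Because the corruption only shows up on some attempts, the checksum comparison is easier to repeat as a small function. This is a sketch rather than part of the original report; `same_md5` is a hypothetical name and it relies only on `md5sum` from coreutils.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when both files have the same MD5,
# exit 1 when they differ, exit 2 when either file cannot be read.
# Usage: same_md5 SOURCE COPY
same_md5() {
    [ -r "$1" ] && [ -r "$2" ] || return 2
    # Reading from stdin keeps the file name out of md5sum's output
    src_sum=$(md5sum < "$1" | cut -d' ' -f1)
    dst_sum=$(md5sum < "$2" | cut -d' ' -f1)
    [ "$src_sum" = "$dst_sum" ]
}
```

For example, `same_md5 /var/log/glusterfs/xx.0602 /mnt/r2/test/xx.0602 || echo "checksum mismatch"` after each copy.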


File corruption when disk reaches its capacity threshold in a distributed volume with sharding enabled #4686
