Description
Hello,
We are currently facing a problem where our GlusterFS volume is reporting an incorrect total size on the gateway node. Instead of reflecting the full cluster usable capacity of ~1.2 PB, the mounted volume is showing only ~41 TB, which corresponds to the usable size of a single data node.
Below is a detailed overview of our environment, Gluster version, and the problem timeline for your reference.
Cluster Architecture:
We have a 15-node GlusterFS cluster, with each node populated with 36 hard drives.
- Individual disk capacity: 4 TB
- Usable capacity per disk: ~3.64 TB
- Total brick count: 540 bricks
- Gluster version: 11.1
Volume Configuration:
The storage is configured as a Disperse Volume with the following layout:
- Disperse data: 11
- Redundancy: 4
- Bricks per disperse set: 15 (11+4)
- Transport: TCP

The overall usable capacity of the volume is approximately 1.2 PB.
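For reference, the layout can be confirmed from any data node with the commands below; the volume name `dispersevol` is a placeholder for our actual volume name.

```sh
# Confirm the disperse layout and total brick count
# (for this layout it should report: Number of Bricks: 36 x (11 + 4) = 540)
gluster volume info dispersevol

# Per-brick capacity and free space as seen by glusterd
gluster volume status dispersevol detail
```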
Problem Timeline:
- For over three months, the mounted volume on the gateway correctly showed ~1.2 PB of usable capacity.
- Last week, one data node went down.
- Following this, the Gluster volume was stopped.
- The volume was later mounted using only 14 nodes while the failed node was still down.
- From that point onward, the gateway started showing only ~41 TB, which matches the usable capacity of a single node.
- The failed data node has now been fully restored and is back in service; however, the issue persists, and the mounted volume continues to show only ~41 TB instead of the expected full cluster capacity.
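We can attach the output of the following server-side checks if that helps; again, `dispersevol` is a placeholder.

```sh
# Confirm the restored node has rejoined the trusted pool
# (every peer should show "Peer in Cluster (Connected)")
gluster peer status

# Confirm every brick process is running and listed as Online
gluster volume status dispersevol
```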
Solutions Tried:
- Stopped the volume, unmounted the storage, and restarted the glusterd service on all data nodes; however, the issue persisted.
- Stopped the volume, unmounted the storage, remounted all brick filesystems, and restarted the glusterd service on all nodes; even after this, the volume still reports only ~41 TB.
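Assuming the native FUSE client on the gateway (hostname, volume name, and mount point below are placeholders), the remount and the follow-up checks we run look roughly like this:

```sh
# On the gateway: cleanly unmount and remount the volume
umount /mnt/gluster
mount -t glusterfs node01:/dispersevol /mnt/gluster

# Capacity as reported by the client after the remount
df -h /mnt/gluster

# The FUSE client log (named after the mount point) records which
# bricks and subvolumes the client actually connected to
less /var/log/glusterfs/mnt-gluster.log
```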
We request your assistance in investigating why the volume is still limited to single-node capacity even after the failed node was restored. Specifically:
- Could the 14-node mount during the downtime have affected the disperse volume metadata?
- What is the correct procedure to safely restore full volume visibility without risking data integrity?
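We can also run the following read-only checks and attach their output if useful for the investigation; the volume name is again a placeholder.

```sh
# List entries pending heal per brick (read-only; does not start a heal)
gluster volume heal dispersevol info

# Condensed per-brick summary of pending heals (read-only)
gluster volume heal dispersevol info summary
```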
Regards
SRP