Hi,
We ran into some problems with a distributed lock mechanism built on flock on top of glusterfs. The basic lock strategy is the usual one:
for (;;) {
    fd = open(filename, O_CREAT | O_RDWR, 0644);
    flock(fd, LOCK_EX);
    /* Does filename still name the inode we just locked? */
    if (stat(filename, &st) == 0 && fstat(fd, &fst) == 0 &&
        st.st_dev == fst.st_dev && st.st_ino == fst.st_ino)
        break; /* success */
    close(fd);
}
And for unlocking:
unlink(filename);
close(fd);
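For anyone who wants to reproduce the strategy outside our benchmark, here is a minimal, self-contained C sketch of the lock/unlock pair above, assuming Linux/glibc. The helper names lock_acquire(), lock_release() and fail_close() are hypothetical (they are not taken from flock_bench.c), the 0644 mode is an arbitrary choice, and error handling is kept deliberately small.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/file.h>
#include <sys/stat.h>
#include <unistd.h>

/* Close fd without clobbering the errno we want to report. */
static int fail_close(int fd)
{
    int saved = errno;
    close(fd);
    errno = saved;
    return -1;
}

/* Acquire an exclusive lock on `path` with the open/flock/re-check loop.
 * Returns the locked fd, or -1 with errno set on a hard error. */
static int lock_acquire(const char *path)
{
    for (;;) {
        int fd = open(path, O_CREAT | O_RDWR, 0644);
        if (fd < 0)
            return -1;
        if (flock(fd, LOCK_EX) < 0)
            return fail_close(fd);

        struct stat by_fd, by_name;
        if (fstat(fd, &by_fd) < 0)
            return fail_close(fd);

        /* We own the lock only if `path` still names the inode we locked.
         * If the previous holder unlinked it (and perhaps someone recreated
         * it) while we waited in flock(), retry with a fresh open(). */
        if (stat(path, &by_name) == 0) {
            if (by_name.st_dev == by_fd.st_dev && by_name.st_ino == by_fd.st_ino)
                return fd;
        } else if (errno != ENOENT) {
            return fail_close(fd);
        }
        close(fd);
    }
}

/* Unlink while the lock is still held, then close, which drops the lock. */
static int lock_release(const char *path, int fd)
{
    int rc = unlink(path);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}
```

Unlinking before close() matters: any waiter that was blocked on the old inode wakes up, sees that the name no longer matches its fd, and retries against a fresh file, which is exactly the re-check that goes wrong across multiple glusterfs FUSE mounts.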
We've bumped into a few cases where this blatantly fails, so we wrote some code to help with troubleshooting the problem.
It seems everything works just fine as long as we don't call unlink(). I'm not sure where things go wrong. This is the very basic volume I'm using:
gluster volume create bench replica 2 192.168.255.255:/mnt/b/{a,b} force
gluster volume start bench
mount -t glusterfs localhost:bench /mnt/m
# gluster volume info bench
Volume Name: bench
Type: Replicate
Volume ID: 705eb5a1-19f9-46a1-aabd-bbde8c12d0b8
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: 192.168.255.255:/mnt/b/a
Brick2: 192.168.255.255:/mnt/b/b
Options Reconfigured:
cluster.granular-entry-heal: on
storage.fips-mode-rchecksum: on
transport.address-family: inet
nfs.disable: on
performance.client-io-threads: off
For reference, on a local ext4 filesystem:
$ ./flock_bench -d fl -T 10 -t 100 -u
Stats:
Attempts: 5675150 (100.00 %)
Attempts Crosscheck: 5675150 (100.00 %)
Obtained: 674103 ( 11.88 %)
Full Success: 674103 ( 11.88 %)
Open failed: 0 ( 0.00 %)
flock(LOCK_EX) failed: 0 ( 0.00 %)
flock(LOCK_EX) would block: 0 ( 0.00 %)
stat failed lock: 1713814 ( 30.20 %)
stat failed unlock: 0 ( 0.00 %)
fstat failed: 0 ( 0.00 %)
lock failed (wrong file): 3287233 ( 57.92 %)
file changed during lock: 0 ( 0.00 %)
unlink failed: 0 ( 0.00 %)
flock(LOCK_UN) failed: 0 ( 0.00 %)
Via a single FUSE mount:
$ ./flock_bench -d /mnt/m/ -T 10 -t 100 -u
Stats:
Attempts: 50381 (100.00 %)
Attempts Crosscheck: 50381 (100.00 %)
Obtained: 2468 ( 4.90 %)
Full Success: 2468 ( 4.90 %)
Open failed: 0 ( 0.00 %)
flock(LOCK_EX) failed: 0 ( 0.00 %)
flock(LOCK_EX) would block: 0 ( 0.00 %)
stat failed lock: 1369 ( 2.72 %)
stat failed unlock: 0 ( 0.00 %)
fstat failed: 0 ( 0.00 %)
lock failed (wrong file): 46544 ( 92.38 %)
file changed during lock: 0 ( 0.00 %)
unlink failed: 0 ( 0.00 %)
flock(LOCK_UN) failed: 0 ( 0.00 %)
So far everything looks sane (including the massive drop in performance, which is acceptable for our use case at least).
With multiple FUSE mounts, still without unlink(), every attempt succeeds:
# ./flock_bench -g localhost:bench -d /tmp/b -T 10 -t 100
-- trim moaning about umount ...
Attempts: 39651 (100.00 %)
Attempts Crosscheck: 39651 (100.00 %)
Obtained: 39651 (100.00 %)
Full Success: 39651 (100.00 %)
Open failed: 0 ( 0.00 %)
flock(LOCK_EX) failed: 0 ( 0.00 %)
flock(LOCK_EX) would block: 0 ( 0.00 %)
stat failed lock: 0 ( 0.00 %)
stat failed unlock: 0 ( 0.00 %)
fstat failed: 0 ( 0.00 %)
lock failed (wrong file): 0 ( 0.00 %)
file changed during lock: 0 ( 0.00 %)
unlink failed: 0 ( 0.00 %)
flock(LOCK_UN) failed: 0 ( 0.00 %)
However, with unlink:
# ./flock_bench -g localhost:bench -d /tmp/b -T 10 -t 100 -u 2>&1 | tee /tmp/multimount_fuse_with_unlink.txt
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
flock: No such file or directory
...
unlink(139902779909824/lockfile): No such file or directory
...
unlink(139899491575488/lockfile): No such file or directory
unlink(139901664204480/lockfile): No such file or directory
... stuff about umount again ...
Stats:
Attempts: 171100 (100.00 %)
Attempts Crosscheck: 172458 (100.79 %)
Obtained: 2167 ( 1.27 %)
Full Success: 2167 ( 1.27 %)
Open failed: 0 ( 0.00 %)
flock(LOCK_EX) failed: 168929 ( 98.73 %)
flock(LOCK_EX) would block: 0 ( 0.00 %)
stat failed lock: 3 ( 0.00 %)
stat failed unlock: 0 ( 0.00 %)
fstat failed: 0 ( 0.00 %)
lock failed (wrong file): 1 ( 0.00 %)
file changed during lock: 0 ( 0.00 %)
unlink failed: 1358 ( 0.79 %)
flock(LOCK_UN) failed: 0 ( 0.00 %)
So it seems that as long as the lock file is never unlinked, we're fine. We'll be updating our use cases to detect the containing filesystem and, if it's glusterfs, skip the unlink, preferring to leak the file over running into these problems. It would still be great if this could be tracked and fixed.
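A minimal sketch of that filesystem check, assuming Linux/glibc: it finds the longest mount point in /proc/self/mounts that contains the lock directory and tests whether its type is fuse.glusterfs. That type string is how native-client mounts made with `mount -t glusterfs` are normally listed, but treat it as an assumption worth verifying on your own systems. path_is_glusterfs() is a hypothetical helper name, and the path must already exist (realpath() is used to canonicalize it), so pass the lock directory rather than the lock file.

```c
#define _GNU_SOURCE
#include <limits.h>
#include <mntent.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return 1 if `path` lives on a glusterfs FUSE mount, 0 if not, -1 on error. */
static int path_is_glusterfs(const char *path)
{
    char real[PATH_MAX];
    if (realpath(path, real) == NULL)
        return -1;

    FILE *mounts = setmntent("/proc/self/mounts", "r");
    if (mounts == NULL)
        return -1;

    size_t best_len = 0;
    int best_is_gluster = 0;
    struct mntent *m;
    while ((m = getmntent(mounts)) != NULL) {
        size_t len = strlen(m->mnt_dir);
        if (strncmp(real, m->mnt_dir, len) != 0)
            continue;
        /* Match whole path components: "/mnt" must not match "/mnt2". */
        if (len > 1 && real[len] != '\0' && real[len] != '/')
            continue;
        /* Longest mount point wins; on ties the later (top-most) entry wins. */
        if (len >= best_len) {
            best_len = len;
            best_is_gluster = strcmp(m->mnt_type, "fuse.glusterfs") == 0;
        }
    }
    endmntent(mounts);
    return best_is_gluster;
}
```

The unlock path can then call unlink() only when path_is_glusterfs() returns 0 for the lock directory, so a lookup error (-1) also errs on the side of leaking the file.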
flock_bench.c
multimount_fuse_with_unlink.txt