-
-
Notifications
You must be signed in to change notification settings - Fork 731
Description
We are using a polling observer because our monitored directories are on a file share. We discovered a problem that is easily reproducible. Occasionally when we had a file deleted in one part of the file tree and very shortly thereafter a completely unrelated file was created in another branch of the tree, Watchdog was incorrectly reporting this as a move event from the deleted file to the created file. Digging into the source code it became clear that the DirectlrySnapShotDiff was assuming that if the inodes were the same the file was the same. This is not the case as inodes can and will be recycled. A simple change to check the file modification times (in addition to the inode) would fix this, if the file was truly moved the modification times would be the same, if delete/create they would be different.
Looking at the history it looks like Watchdog used to do this, but a code refactor 2/14 seems to have removed this file time check. There was also a test that looked to handle the case of a file being modified THEN moved in one cycle of the snapshot comparison. I believe this refactor tried to handle this latter case, breaking the case we saw. I propose that it is impossible to detect the modify/move use case, consider the following:
Scenario 1:
First snapshot:
dir1/a (inode 1234 mtime 01:02:03)
dir2/
sleep 1
touch dir1/a
mv dir1/a dir2/b
Second snapshot:
dir1/
dir2/b (inode 1234 mtime 01:02:04)
Scenario 2:
First snapshot:
dir1/a (inode 1234 mtime 01:02:03)
dir2/
sleep 1
rm dir1/a (inode 1234 released)
touch dir2/b (inode 1234 assigned)
Second snapshot:
dir1/
dir2/b (inode 1234 mtime 01:02:04)
Notice that the comparison of snapshots in both scenarios is IDENTICAL. I cannot see a way to differentiate between these two cases in the face of inode recycling. Given this I suggest that is better to report this case as a delete/create than as a move. It is a "lesser of two evils" to report a move as a delete/create than reporting a delete/create as a move.
Note that this also means it is impossible to report the modification of "a" in Scenario 1. There is not enough history available to determine that for certain, and if it truly was modified THEN immediately deleted simply reporting the delete and losing the modify event is better than making an assumption.
I was easily able to reproduce the inode recyling issue with a simple script like this:
rm foo.dat
touch foo.dat
This quite frequently resulted in a moved event when the inode was recycled. Only occasionally was it correct when the inodes were different. I was running on CentOS 5.10 (yes, I know this is ancient but it is a customer requirement that we deploy on this stone age OS).
In addition, I had to change the tests "wait" method. It assumed that all Linux platforms had nanosecond precision on m_time. My CentOS 5.10 was reporting only one second precision (a more recent Fedora box had nanosecond however). Not sure what's up with that, but I also changed the wait so the tests would work.
Maybe someone has a better way of handling this, but we cannot use Watchdog the way it is. We are going to build off my fork with these changes until this is fixed which can be found here:
https://github.com/jrichards99/watchdog.git
I will issue a pull request, but I realize my patch may not cover enough for general consumption.