-
Notifications
You must be signed in to change notification settings - Fork 669
Fix NativeWatcher
not ingesting hardlinked/cloned directories (Bun/pnpm installs) due to omitted events
#1559
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Fix NativeWatcher
not ingesting hardlinked/cloned directories (Bun/pnpm installs) due to omitted events
#1559
Conversation
97ac278
to
c82b33d
Compare
c82b33d
to
c843fa7
Compare
… are emitted We can do a bit better with preventing excessive events by batching all incoming change events in a macro tick. In each tick, we'll process the events, and filter them as usual, but also mark hardlink candidates, as before. However, we now omit hardlink candidates, if any sub-files/directories are in the batch, which indicates usually that `fs.watch` will deliver children and the candidate is not a clone/hardlink/copy-on-write directory. We then also batch the outputs and process all pending ones in the same tick. Due to the async/await logic, we usually won't process inputs and outputs in the same tick.
I've pushed a new implementation that attempts to avoid duplication by adding an input/output batch. This now skips hardlink directory candidates from being crawled, if the input batch contains any subpaths. This seems to behave much more predictably and the integration tests are passing with this change |
NativeWatcher
not ingesting hardlinked directories due to omitted eventsNativeWatcher
not ingesting hardlinked/cloned directories (Bun/pnpm installs) due to omitted events
This looks awesome @kitten - thank you!
Do you happen to know if this is - We could do with some integration tests using hard links with the different backends but you can leave that to me - thanks again for the PR and I'll make some time for a detailed review and testing this week. |
It's b). Specifically, for some syscalls it seems like |
// for manual crawling if there's no other event that has a child path in this directory. | ||
// If we have a child path, then we can rely on `fs.watch` reporting changes for | ||
// this directory's files | ||
if (type === 'd' && isMaybeHardlinked(stat)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you find this code path is hit with Bun package manager? I'm trying to write some integration tests for it at the moment and having a bit of trouble reproducing a directory with nlink > 1.
AFAICT, pnpm does not hard link directories as such, rather it recursively hard links file contents.
It looks like Bun uses the clonefile()
(copy on write) C sys call, but according to the man page for that, it also just clones each file:
If src names a directory, the directory hierarchy is cloned as if each item
was cloned individually.
It does seem like it's possible to hard link a directory but it's pretty buried, has specific constraints, and creates hard links that some shell commands don't understand: https://jameshunt.us/writings/linking-dirs-in-macos/ - is that was this needs to support?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've noticed the pnpm
difference, yes. bun
seems to hit some kind of special case sometimes where it's using a single syscall to "clone" that target directory, which triggers the issue pretty reliably. What made me a bit suspicious is this comparison of syscall counts: https://x.com/lydiahallie/status/1955671845384142963
In my testing (with Bun >=1.2.19), it reproduced reliably. Could be filesystem specific? Maybe it only does that special clone operation on APFS?
(Edit: Also probably worth mentioning that I tested this against an expo-*
package, so I knew it was copied to multiple projects that I've got checked out)
What my suspicion was that APFS (like some other Linux file-systems) supports a copy-on-write clone operation that can operate on an entire directory, so like a hardlink, but without having to recursively hardlink files-only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, after another look it seems to me like the nlink
for a directory on APFS has little to do with hard links - it's simply the number of entries inside the directory (+2 for .
and ..
).
So nlink > 1
should always hold on APFS, and the logic in this PR boils down to "if we see a directory event and no child events within a tick, crawl the directory" - which is probably fine in general. But if the directory isn't in fact a hard link, but a recursive creation of file hardlinks/reflinks, might the crawl race with the creation of the contents?
I'm curious about what it is fs.watch
is doing differently vs Watchman and/or fsevents
. And regardless of whether we work around this in Metro - is it a Node.js bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah haha, well that explains that. I must've gotten confused with nlink
on file vs folders. Yes, I did see race conditions. This generally leads to a double-report which wasn't that bad but did break tests here and there.
Adding the tick batching worked around this. Node tries to report changes in batches, so generally they're all expected to mostly arrive in one tick. That's not the case for Windows, which is why on that experimental branch for Windows that branches off of this one, the tick length is increased with a timeout.
I think whether it's a Node bug or not boils down whether they've implemented logic like this or not. I'm assuming that the events they're receiving have the same limitations we see here, and the question is whether they ever perform crawls or intend to, to guarantee all recursive files to always have individual events
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like a section in this blog post calls out the exact syscall for macOS explicitly: https://bun.com/blog/behind-the-scenes-of-bun-install#macos (clonefile
or clonefile_each_dir
)
Summary
Hardlinking/cloning is increasingly common in package managers, when a file system allows for efficient copy-on-write hardlinks/clones. Bun and pnpm use this for dependency installations.
When installing a new package, package managers may create a hardlink of a package in
node_modules
(or an isolated dependencies sub-folder) that is a clone of a Node module directory in their cache folders. When this happens, since a copy-on-write clone operation may have occurred, only a single file-system event is emitted for the entire directory. This reproduces consistently on APFS.The
NativeWatcher
does not recognise this and won't emit events for the cloned directory's contents, which means that the change events aren't successfully updating theFileMap
.Instead, we can:
Changelog: [Fix] Fix added directories not being detected properly by the
NativeWatcher
when they're cloned using a hardlink/copy-on-write operation (typically, as Bun or pnpm install new packages)Test plan
expo-image
)DEBUG=Metro:NativeWatcher
to observe watcher eventsbun add expo-image
, or pnpm,pnpm add expo-image
I've tested that removing the package again or modifying it works as expected. I didn't observe duplicate events, but I believe the parent classes filter out duplicates, if any occur.