-
Notifications
You must be signed in to change notification settings - Fork 4.2k
Add --experimental_check_external_other_files option #25957
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@lberki Can you take a look? |
The change itself looks reasonable, although it's lacking a test case. Mind adding one? But before that: In what scenario is this problematic? As far as I can tell, this can only make a difference when the files underneath actually changed, and even then, Bazel would eventually checksum them and realize that they haven't. I think it's very reasonable not to check the files under the install base (after all, they are already checked for changes by the Bazel launcher as you point it out), but if e.g. someone symlinks a file from |
Also adding @Wyverald to the review thread since it's been a long time I looked at external repositories. |
not at all; i'll look into it.
If, as you suggested some months ago, one uses CRIU for freezing bazel, and then revives the server on a new machine, the files have new identities. In particular the ctime and inode number are not under control of Docker/CRI-O/Podman, so they always change. This has no effect on correctness (the files are unchanged and hash to the same checksum), but it causes some invalidations to occur. In our test repo (30k source files, 20k actions), this makes a null build go from 0.5s to 30s. You can try this out for yourself, by hacking the launcher to ignore mtime changes, and then touching embedded_tools on a live server. Maybe these bogus invalidations should be handled more cleverly (are they hashed with sha256 as well, or handled differently?), but diagnosing this goes beyond my Bazel-fu. I tried reading the commits that introduced this and couldn't make sense of how this works, and why the check happens in an
if you can give me hint where these checks are scheduled, I can send a separate PR to disable it. Then we can change the "non-output-external-file" name to a more descriptive "host-file".
I agree. |
If possible, I'd like @haxorz to take a look too, as he's reviewed some previous changes in this area. I admit I understand all the external/external-repo file checking logic very poorly, despite having worked on external dependencies for a few years. Just regarding the PR itself, it seems okay to me to get it submitted since it's guarded behind a flag and all. I'd love to understand this part of the code better. |
Sorry but I probably won't have time to do a deep review for a few weeks. |
@hanwen-flow any ideas as to how Bazel should tell the "CRIU snapshot got revived, now everything is different" case apart from the " If there is no way to do so, the command line flag would only turn one kind of badness (slow startup) into another kind (incorrectness) and Bazel's ethos is that we'd rather be correct than fast if both at the same time is completely impossible.. |
Bazel can't; that is exactly why we need the flag, because we can tell the cases apart and tell bazel what is going on. We want to use CRIU to provide 'warm bazel' CI infrastructure. In that case, we know that we are reviving a CRIU snapshot, so we know that all the file changes are bogus. We also know that the file system is not changed, because in a hermetic CI run, we only change the source under test and leave everything else constant. I would not recommend using this flag for interactive use (ie. where a user is able to change
How does this reasoning work for the other flags? ( |
Wouldn't this change just postpone the cost to the second Bazel invocation? (presumably you'd only pass this flag to the first invocation after the CRIU restore?)
It doesn't, but surely you understand why I'm trying to minimize the number of such holes. Now that you mention it, wouldn't |
In case of CI, there is no second invocation. Once we get the test results out to github/buildkite/etc., the container is terminated, and Bazel passes away, blissfully unaware of the cruel, chaotic world around it. :-)
No, it's a separate code path. See bazel/src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java Line 3735 in 25d6609
if that goes with the else .
|
Let me see if I understand this correctly:
From this, it looks like what you practically want is |
I want to disable all file dirtiness checks based on (ctime + inode number); your suggestion here also works for me. Let me rephrase how I understand your explanation to make sure we are on the same page: This piece of code does a two-step check, where the first (I suppose the checks for embedded_tools paths are just generated as bazel derefs symlinks in the toolchain setup, which for Java defaults uses the embedded tools by default, and for C++ uses /usr/bin/gcc and friends. There is probably no separate call to schedule dirtiness checks for embedded_tools as a whole) The name "non-output external file" remains a mystery to me. |
At least I know the answer to the mystery of "non-output external file": it's files that are not under a So you want to disable all inode/ctime-based checks everywhere, including in the source tree(s), external repositories and what you call "host files"? But then what would Bazel do? Are you planning to change the top-level targets or the configuration flags? (if not, nothing changes and then it would be a null build) Or did I misunderstand you and you still want the checks in the source tree to be done? (but then how come those would retain their inode number / ctime?) I stared at that code a bit, including with debugger and it looks like my initial assessment was wrong: the first |
IIUC, inotify ( |
Yeah, (Have you seen my above question as to what you expect Bazel to do after you disable all ctime/inode based invalidation? My understanding is that as long as you don't change the command line, the build will be a no-op) |
In case of CI, fetching and checking out the git commit under test will generate a write under the source tree, which would trigger the inotify based diff awareness, leading to a partial invalidation. If there is no write in the source tree, then obviously the build should be a null build. |
Okay, so your game plan is:
Did I get this right? |
Almost: if watchfs generates a reliable stream of diffs, I don't need to disable ctime/ino invalidation for source files. |
If you can't rely on watchfs, you are in trouble anyway, because then you can't tell source files that really changed and those whose ctime/inode number changed due to CRIU and then you only have the choice between "all source files changed" and "no-op build". Given this and the semantics of the existing flags, it looks like I propose that you add a test case then I review it? (cc @meteorcloudy and @Wyverald in case they disagree with this plan) |
not all hope is lost. Containers can control mtime, so the updated mtime timestamps are an indication that something changed, but I agree we'd need further hacks to make bazel work with a container-appropriate idea of file identity. |
dc0b468
to
1e2a8ea
Compare
PTAL. Added a test. I've added a separate commit that more clearly describes what these files are. Hope it's OK to change the ordering of the enum. I think it would be good to improve the naming, though, because by adding a command-line option, we are setting the name in stone. Current name: "non output external files". Some ideas:
WDYT ? |
@Wyverald is looking at the relevant code recently, please wait for his review. |
src/main/java/com/google/devtools/build/lib/skyframe/SkyframeExecutor.java
Outdated
Show resolved
Hide resolved
This part seems incorrect. The difference between the two branches is simply whether we try to check dirtiness of all known values ( I'm still reading through this code (and environs). Note to any interested parties -- the previous PR that introduced (That's not to say that this specific PR is gated on me figuring everything out -- I'm still happy to get it submitted, provided that we recognize it's an experimental flag and could go away with little notice.) |
that is interesting; that means that my comment is misleading. In the case of a write to the repo contents cache (eg. dependency is upgraded through a normal bzlmod update), would the update be noticed correctly? Or do the mechanics depend on a write to the cache triggering an update to EXTERNAL files, which causes further invalidation? That would be concerning. |
Which comment are you referring to?
The repo contents cache as currently designed/implemented is "append-only", so there shouldn't be any need to worry here (IIUC, this PR only changes the behavior of detecting file changes, not additions). For a little bit more context: when you upgrade a |
arguably, the repo cache is bazel managed, so the above is a bit imprecise.
I hadn't realized that addition are detected through a different path, but now that you say it, it makes sense. |
I was about to say "looks good to me, merge this", but then I read up on @Wyverald 's comments about the external repository cache. I wouldn't feel too bad about renaming an experimental option (that liberty is exactly why it's experimental), but we should nevertheless make an effort to come up with a good nomenclature. How about calling these "foreign" files? "external" is way too overloaded. That word still conflicts with @Wyverald what do you mean by "not detecting additions"? My understanding is that Bazel still lists directories and dirties Skyframe if it detects an additional file. Since this whole flag is about telling Bazel "trust me, you don't need to look there", the usual "correctness first" ethos doesn't apply here. And from my understanding of @hanwen-flow 's use case, it looks like all mtimes and inode numbers can change, including those of the true repository cache, so he presumably would want to turn this check off for the true repository cache, too. So I propose that we rename @Wyverald I agree with your "tangled web of historical accidents" assessment :( |
"foreign" is excellent. yes, we disable ctime/inode checks for the repository cache too.
I meant: in our scenario, Bazel cannot get confused for additions, because there is no previous ino/ctime to compare to. |
I actually have a slight preference for keeping the name "external", not in the least because everything around it is called "external" (eg. That's a long way of saying I don't have a great idea :P but I'm still happy to get this merged, pending some code comment fixes as you noted. Just waiting for Lukács's approval now. |
works for me. |
Clarify what these files are, and rename nonOutputExternal to ExternalOther throughout. RELNOTES: n/a
I opened #26121 for the rename , which should not be squashed with the functional change. |
This option disables the code path that checks EXTERNAL_OTHER files. These are files outside of roots that Bazel tracks; they occur when outputs/sources/repos symlink files from outside directories. This is useful for process migration scenarios using containers. Here ctime and inode numbers change from under the bazel server without file content changes. Addresses bazelbuild#25954. RELNOTES: File change checks for non-output, non-repo external files can now be disabled with the `--experimental_check_external_other_files` flag.
I'm fine either way. I'll approve this change, but please either wait until @meteorcloudy has a chance to look at it (or else signal that you're impatient and then I'll ping him :) ) |
...I mean until @Wyverald has a chance to look at it. |
I'll handle the import. |
This option disables the code path that checks EXTERNAL_OTHER files.
These are files outside of roots that Bazel tracks; they occur when
outputs/sources/repos symlink files from outside directories.
This is useful for process migration scenarios using containers. Here
ctime and inode numbers change from under the bazel server without
file content changes.
Addresses #25954.
RELNOTES: File change checks for non-output, non-repo external files
can now be disabled with the
--experimental_check_external_other_files
flag.