Binary cache: async push_success #908
How or when should "upload messages" (like
Doesn't this have the same problem as #694 that the working thread might exit due to calls to
Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning.
The problem is that we almost never can be sure that there isn't some nested API call that exits on failure. But it seems like #909 at least partially addresses this issue.
Yeah, but there are nearly no hard exits in the binary cache. It currently also only prints warnings.
# Conflicts: # src/vcpkg.cpp
I like this direction; unblocking I/O work has great potential for making vcpkg much faster.
However we need to be very careful about the impacts of concurrency -- deadlocks suck :(
src/vcpkg.cpp
Outdated
@@ -156,6 +157,7 @@ namespace vcpkg::Checks
// Implements link seam from basic_checks.h
void on_final_cleanup_and_exit()
{
    BinaryCache::wait_for_async_complete();
I do not think we can do this here. This is on the critical path for ctrl-c handling and should only be used for extremely fast, emergency tear-down behavior (like restoring the console).
If there happens to be an exit anywhere in any BinaryCache implementation, this would deadlock. Importantly, this includes any sort of assertion we might want to do, like checking pointers for null.
Unfortunately, the only path forward I see is to call this (or appropriately scope the BinaryCache itself) at the relevant callers. The consequence of possibly not uploading some set of binary caches in the case of some unhandled program error (such as a permissions issue on a directory expected to be writable) is vastly preferable to deadlocks.
I have changed the BinaryCache::wait_for_async_complete() implementation so it does not deadlock anymore.
I also moved the call to Checks::exit_with_code, which is not called when ctrl+c is handled. (I personally would like to have a way to terminate vcpkg but wait until the binary cache is done so that I don't lose progress.)
And I prefer it when built packages are uploaded to the binary caches before vcpkg exits because of an error; otherwise I have to rebuild the already built packages at a later point when there is no cache entry.
Agreed that it is desirable to finish uploading on "understood" errors, for example if a package failed to build or failed to be installed.
I was also wrong about my original assessment of a deadlock. My concern was the call path of the binary upload thread calling Checks::unreachable() or .value_or_exit(), but it seems that std::thread::join() does have a carve-out to handle this specific case: it will throw a std::system_error with the error condition resource_deadlock_would_occur if you try to join yourself.
I've put some other concerns below, but I don't want those to distract from my main point: We must make it as trivial / correct-by-construction as possible to guarantee that the binary cache thread NEVER attempts to wait on itself. I think the best approach for vcpkg right now is to add calls from Install::perform() etc. to BinaryCache::wait_for_async_complete() before any "user-facing" error, such as the exit guarded by result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO. This is motivated by the perspective that it's always safer to terminate than to join and possibly deadlock, hit a race condition, etc.
There's still a UB data race if the main thread and binary upload thread attempt to exit at the same time:
Concurrently calling join() on the same thread object from multiple threads constitutes a data race that results in undefined behavior.
-- https://en.cppreference.com/w/cpp/thread/thread/join
There's also a serious "scalability" problem if we ever want a second background thread for whatever reason, because BGThread A would join on BGThread B, while BGThread B tries to join on BGThread A. This might be solvable with ever more complex structures, such as a thread ownership DAG that gets threads to join only on their direct children, but I don't think the benefit is worth the cost.
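A minimal sketch of the explicit-wait approach suggested in this comment, assuming a simplified stand-in for the binary cache. `AsyncUploader`, `PlannedBuild`, and this `perform` are hypothetical names for illustration, not vcpkg's real types; only `wait_for_async_complete`, `BuildResult::SUCCEEDED`, and `KeepGoing::NO` come from the discussion. The function returns an exit code instead of exiting so the idea can be exercised in isolation:

```cpp
#include <cassert>
#include <mutex>
#include <string>
#include <thread>
#include <utility>
#include <vector>

enum class BuildResult { SUCCEEDED, BUILD_FAILED };
enum class KeepGoing { NO, YES };

// Hypothetical stand-in for the binary cache: every push runs on its own
// thread and is joined by wait_for_async_complete(). Not vcpkg's real class.
class AsyncUploader
{
public:
    ~AsyncUploader() { wait_for_async_complete(); }

    void push(std::string spec)
    {
        m_threads.emplace_back([this, spec]() {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_uploaded.push_back(spec);
        });
    }

    // Assumption: only ever called from the thread that owns the uploader,
    // so an upload thread can never end up waiting on itself.
    void wait_for_async_complete()
    {
        for (auto& t : m_threads) t.join();
        m_threads.clear();
    }

    std::vector<std::string> uploaded()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_uploaded;
    }

private:
    std::mutex m_mutex;
    std::vector<std::string> m_uploaded;
    std::vector<std::thread> m_threads;
};

// Hypothetical record of one finished build in the install plan.
struct PlannedBuild
{
    std::string spec;
    BuildResult code;
};

// Returns the process exit code instead of exiting, so the caller decides
// when to terminate (real code would exit at the marked line).
int perform(const std::vector<PlannedBuild>& plan, KeepGoing keep_going, AsyncUploader& cache)
{
    for (const auto& result : plan)
    {
        if (result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO)
        {
            cache.wait_for_async_complete(); // drain uploads before the user-facing error
            return 1;                        // real code would exit here
        }
        if (result.code == BuildResult::SUCCEEDED) cache.push(result.spec);
    }
    cache.wait_for_async_complete();
    return 0;
}
```

With this shape, packages built before the failure are still uploaded, and no code running on an upload thread ever calls the wait function.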
The UB and the joining itself could simply be prevented by checking if (std::this_thread::get_id() == instance->push_thread.get_id()). My concern with the explicit approach is that it is easy to forget to call the BinaryCache waiting function: every time you want to exit, you have to remember to call it. This seems very prone to human error.
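A rough sketch of the thread-id guard described here, under some assumptions: `PushWorker`, `submit`, `m_worker_id`, and `m_join_mutex` are hypothetical names, and the worker's id is stored once at construction so the comparison does not read the `std::thread` object while another thread may be joining it. The extra mutex serializes concurrent `join()` callers, which is one possible way to avoid the data race quoted above; it is not necessarily what the PR does.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for the binary cache push thread; not vcpkg's real class.
class PushWorker
{
public:
    PushWorker() : m_thread([this] { run(); }), m_worker_id(m_thread.get_id()) {}
    ~PushWorker() { wait_for_async_complete(); }

    void submit(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push_back(std::move(job));
        }
        m_cv.notify_one();
    }

    // Returns true if this call joined the worker; false if the caller is the
    // worker itself or the worker was already joined.
    bool wait_for_async_complete()
    {
        if (std::this_thread::get_id() == m_worker_id) return false; // never self-join
        std::lock_guard<std::mutex> join_lock(m_join_mutex);         // one joiner at a time
        if (!m_thread.joinable()) return false;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_one();
        m_thread.join();
        return true;
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_cv.wait(lock, [this] { return m_done || !m_jobs.empty(); });
            if (m_jobs.empty()) return; // shutdown requested and queue drained
            auto job = std::move(m_jobs.front());
            m_jobs.pop_front();
            lock.unlock();
            job();
            lock.lock();
        }
    }

    // Declaration order matters: the queue state must exist before the thread starts.
    std::mutex m_mutex;
    std::mutex m_join_mutex;
    std::condition_variable m_cv;
    std::deque<std::function<void()>> m_jobs;
    bool m_done = false;
    std::thread m_thread;
    const std::thread::id m_worker_id;
};
```

The guard makes a call from the worker a no-op instead of a self-join, but it does not cover the case of a second background thread raised in the review, where two workers could still wait on each other.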
I have now implemented your request
@ras0219-msft Is there anything left that is preventing this PR from being merged?
Co-authored-by: Robert Schumacher <[email protected]>
…ages between package installs Co-authored-by: Robert Schumacher <[email protected]>
# Conflicts: # src/vcpkg/build.cpp
# Conflicts: # src/vcpkg/base/messages.cpp
# Conflicts: # include/vcpkg/base/messages.h # src/vcpkg/base/messages.cpp
…utput (#1565)
Extensive overhaul of our downloads handling and console output; @JavierMatosD and I have gone back and forth several times and yet kept introducing unintended bugs in other places, which led me to believe targeted fixes would no longer cut it.
Fixes many longstanding bugs and hopefully makes our console output for this more understandable:
* We no longer print 'error' when an asset cache misses but the authoritative download succeeds. This partially undoes #1541. It is good to print errors immediately when they happen, but if a subsequent authoritative download succeeds we need to not print those errors.
* We now always and consistently print output from x-scripts at the time that actually happens. Resolves https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2300063
* We don't tell the user that proxy settings might fix a hash mismatch problem.
* We do tell the user that proxy settings might fix a download from asset cache problem.
* We now always tell the user the full command line we tried when invoking an x-script that fails.
* We don't crash if an x-script doesn't create the file we expect, or creates a file with the wrong hash.
* We now always print what we are doing *before* touching the network, so if we hang the user knows which server is being problematic. Note that this includes storing back to asset caches which we were previously entirely silent about except in case of failure.
Other changes:
* Removed debug output about asset cache configuration. The output was misleading / wrong depending on readwrite settings, and echoing to the user exactly what they said before we've interpreted it is not useful debug output. (Contrast with other `VcpkgPaths` debug output which tend to be paths we have likely changed from something a user said)
Other notes:
* This makes all dependencies of #908 speak `DiagnosticContext` so it will be easy to audit that the foreground/background thread behavior is correct after this.
* I did test the curl status parsing on old Ubuntu again.
Special thanks to @JavierMatosD for his help in review of the first console output attempts and for blowing the dust off this area in the first place.
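One way to picture the first bullet (no 'error' output when an asset cache misses but the authoritative download succeeds) is to hold cache-miss diagnostics back until the final outcome is known. This is only a sketch under that assumption; `Source`, `DownloadResult`, and `download_with_fallback` are hypothetical names, not the functions touched by #1565:

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical download source: returns an empty string on success,
// or an error description on failure.
using Source = std::function<std::string()>;

struct DownloadResult
{
    bool success;
    std::vector<std::string> printed; // diagnostics that would reach the console
};

// Tries each asset cache first, then the authoritative source. Cache-miss
// messages are deferred and only surfaced if the authoritative download fails too.
DownloadResult download_with_fallback(const std::vector<Source>& asset_caches, const Source& authoritative)
{
    DownloadResult result{false, {}};
    std::vector<std::string> deferred;
    for (const auto& cache : asset_caches)
    {
        std::string err = cache();
        if (err.empty())
        {
            result.success = true;
            return result;
        }
        deferred.push_back("asset cache miss: " + err);
    }

    std::string err = authoritative();
    if (err.empty())
    {
        result.success = true; // the deferred cache misses are dropped silently
        return result;
    }

    result.printed = std::move(deferred);
    result.printed.push_back("error: " + err);
    return result;
}
```

Collecting messages this way is also what lets a caller decide which thread ultimately prints them.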
…cache-push-success # Conflicts: # include/vcpkg/base/fwd/message_sinks.h # include/vcpkg/base/message_sinks.h # src/vcpkg/base/message_sinks.cpp
…cache-push-success # Conflicts: # src/vcpkg/commands.install.cpp # src/vcpkg/commands.set-installed.cpp
… background thread.
…ture/async-binary-cache-push-success # Conflicts: # include/vcpkg/binarycaching.h # src/vcpkg/binarycaching.cpp
…tion from the background thread.
… the work queue is drained before returning that no work is left.
* Restore autoantwort's only printing counts when done. * Note which specs we are submitting in messages from the background.
@autoantwort I pushed some changes here, can you let me know if you are happy with them? Thanks!
static ExpectedL<BinaryProviders> make_binary_providers(const VcpkgCmdArguments& args, const VcpkgPaths& paths)
void ReadOnlyBinaryCache::fetch(View<InstallPlanAction> actions)
{
    std::vector<const InstallPlanAction*> action_ptrs;
This block is just moved up from 2325 as these things became members of ReadOnlyBinaryCache or BinaryCache rather than being local to this file now.
});
}

void BinaryCacheSynchronizer::add_submitted() noexcept
This starts meaningfully new code.
LGTM
using backing_uint_t = std::conditional_t<sizeof(size_t) == 4, uint32_t, uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(size_t) == 4, uint16_t, uint32_t>;
Why does this depend on size_t?
There are a lot of 64 bit machines without 32 bit atomics, and a lot of 32 bit machines without 64 bit atomics, and I wanted to choose something least likely to put us into lockful atomics world.
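To illustrate why a pointer-sized backing type helps, here is a sketch that packs two half-width counters into one atomic word, so both can be read or updated in a single atomic operation. The two type aliases are from the diff; `PackedCounters`, `Counts`, `add_completed`, and the packing layout are assumptions for illustration and may not match how `BinaryCacheSynchronizer` actually uses these types:

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <type_traits>

using backing_uint_t = std::conditional_t<sizeof(std::size_t) == 4, std::uint32_t, std::uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(std::size_t) == 4, std::uint16_t, std::uint32_t>;

constexpr unsigned counter_bits = sizeof(counter_uint_t) * 8;
constexpr backing_uint_t low_mask = (backing_uint_t{1} << counter_bits) - 1;

struct Counts
{
    counter_uint_t submitted;
    counter_uint_t completed;
};

// Hypothetical layout: low half counts submitted jobs, high half counts completed jobs.
class PackedCounters
{
public:
    // Assumes the low counter never overflows; an overflow would carry into the high half.
    void add_submitted() noexcept { m_state.fetch_add(1, std::memory_order_acq_rel); }

    void add_completed() noexcept
    {
        m_state.fetch_add(backing_uint_t{1} << counter_bits, std::memory_order_acq_rel);
    }

    // One atomic load yields a consistent snapshot of both counters.
    Counts load() const noexcept
    {
        const backing_uint_t s = m_state.load(std::memory_order_acquire);
        return Counts{static_cast<counter_uint_t>(s & low_mask),
                      static_cast<counter_uint_t>(s >> counter_bits)};
    }

    // Reports whether this platform implements the backing word without locks.
    bool is_lock_free() const noexcept { return m_state.is_lock_free(); }

private:
    std::atomic<backing_uint_t> m_state{0};
};
```

Whether the word is actually lock-free remains platform-dependent, which is why the sketch exposes `is_lock_free()` rather than asserting it.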
Thanks for the contribution!
Why is this so inconsistent? I would have expected less variance in the result, especially for stuff taking 1d and longer.
Does the artifact size with static linkage explain most of the inconsistency? In particular when ports install executables.
Hmm, maybe. The android triplets are more or less consistent, and the -static and -static-md triplets are also more or less consistent. @BillyONeal do you have storage data for the different triplets?
My supposition is also that the difference is mostly a function of how big the binary cache is. I don't have those stats though. For instance, the triplets with an LLVM have more improvement. The improvement for macOS seems bigger, which might be explained by not being in the same data center as the caches.
Probably stealth merge conflict in microsoft#908 and/or probably my fault.
This results in ~10-20% faster build times on my machine.
For example building boost on my M1 mac went down from 2.948 min to 2.375 min