Binary cache: async push_success #908

Merged

Contributor

@autoantwort autoantwort commented Feb 15, 2023

This results in ~10-20% faster build times on my machine.

For example, building boost on my M1 Mac went down from 2.948 min to 2.375 min.

@autoantwort autoantwort marked this pull request as draft February 15, 2023 20:04
@autoantwort
Contributor Author

How or when should "upload messages" (like Uploaded binaries to {count} {vendor}.) be printed?

@Thomas1664
Contributor

Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

@autoantwort
Contributor Author

> Doesn't this have the same problem as #694 that the working thread might exit due to calls to check_exit or value_or_exit?

Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

@Thomas1664
Contributor

> Kind of. In general we need an option to decide if a binary cache failure should be a hard error or only a warning

The problem is that we can almost never be sure that there isn't some nested API call that exits on failure. But it seems like #909 at least partially addresses this issue.

@autoantwort
Contributor Author

Yeah, but there are nearly no hard exits in the binary cache. It currently also only prints warnings.

Contributor

@ras0219-msft ras0219-msft left a comment

I like this direction; unblocking I/O work has great potential for making vcpkg much faster.

However we need to be very careful about the impacts of concurrency -- deadlocks suck :(

src/vcpkg.cpp Outdated
@@ -156,6 +157,7 @@ namespace vcpkg::Checks
// Implements link seam from basic_checks.h
void on_final_cleanup_and_exit()
{
BinaryCache::wait_for_async_complete();
Contributor

I do not think we can do this here. This is on the critical path for ctrl-c handling and should only be used for extremely fast, emergency tear-down behavior (like restoring the console).

If there happens to be an exit anywhere in any BinaryCache implementation, this would deadlock. Importantly, this includes any sort of assertion we might want to do, like checking pointers for null.

Unfortunately, the only path forward I see is to call this (or appropriately scope the BinaryCache itself) at the relevant callers. The consequence of possibly not uploading some set of binary caches in the case of some unhandled program error (such as a permissions issue on a directory expected to be writable) is vastly preferable to deadlocks.
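
To illustrate the "appropriately scope the BinaryCache itself" option, here is a minimal sketch, not vcpkg's implementation. `ScopedPusher` and `run_jobs_in_scope` are hypothetical names, and the sketch assumes uploads are queued as jobs and drained by a single background thread that the destructor joins when the owning scope ends.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Hypothetical stand-in for a binary cache with one background upload thread.
class ScopedPusher
{
public:
    ScopedPusher() : m_worker([this] { run(); }) { }

    // Leaving the owning scope drains the queue and joins the worker.
    ~ScopedPusher()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_done = true;
        }
        m_cv.notify_one();
        m_worker.join();
    }

    ScopedPusher(const ScopedPusher&) = delete;
    ScopedPusher& operator=(const ScopedPusher&) = delete;

    void push(std::function<void()> job)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_jobs.push(std::move(job));
        }
        m_cv.notify_one();
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        for (;;)
        {
            m_cv.wait(lock, [this] { return m_done || !m_jobs.empty(); });
            if (m_jobs.empty())
            {
                assert(m_done); // an empty queue only ends the loop after shutdown was requested
                return;
            }
            auto job = std::move(m_jobs.front());
            m_jobs.pop();
            lock.unlock();
            job(); // run the "upload" without holding the lock
            lock.lock();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::queue<std::function<void()>> m_jobs;
    bool m_done = false;
    std::thread m_worker; // declared last so the other members exist before run() starts
};

// Returns how many jobs had run by the time the pusher's scope ended.
int run_jobs_in_scope(int count)
{
    std::atomic<int> finished{0};
    {
        ScopedPusher pusher;
        for (int i = 0; i < count; ++i)
        {
            pusher.push([&finished] { ++finished; });
        }
    } // the destructor blocks here until every queued job has run
    return finished.load();
}
```

Because `std::exit` does not run destructors of automatic objects, an exit elsewhere in the program would skip this drain rather than block on it. That matches the trade-off described above: some uploads may be lost, but no exit path waits on the worker.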

Contributor Author

I have changed the BinaryCache::wait_for_async_complete() implementation so it does not deadlock anymore.

I also moved the call to Checks::exit_with_code, which is not called when ctrl+c is handled. (I personally would like to have a way to terminate vcpkg but wait until the binary cache is done so that I don't lose progress.)

And I prefer it when built packages are uploaded to the binary caches before vcpkg exits because of an error; otherwise I have to build the already built packages again at a later point when there is no cache entry.

Contributor

@ras0219-msft ras0219-msft Mar 3, 2023

Agreed that it is desirable to finish uploading on "understood" errors. For example, if a package failed to build or failed to be installed.

I was also wrong about my original assessment of a deadlock. My concern was the call path of the binary upload thread calling Checks::unreachable() or .value_or_exit(), but it seems that std::thread::join() does have a carve-out to handle this specific case: it will throw a resource_deadlock_would_occur if you try to join yourself.

I've put some other concerns below, but I don't want those to distract from my main point: We must make it as trivial / correct-by-construction as possible to guarantee that the binary cache thread NEVER attempts to wait on itself. I think the best approach for vcpkg right now is to add calls from Install::perform() etc to BinaryCache::wait_for_async_complete() before any "user-facing" error, such as the exit guarded by result.code != BuildResult::SUCCEEDED && keep_going == KeepGoing::NO. This is motivated by the perspective that it's always safer to terminate than to join and possibly deadlock / race condition / etc.


There's still a UB data race if the main thread and binary upload thread attempt to exit at the same time:

> Concurrently calling join() on the same thread object from multiple threads constitutes a data race that results in undefined behavior.
> -- https://en.cppreference.com/w/cpp/thread/thread/join

There's also a serious "scalability" problem if we ever want a second background thread for whatever reason, because BGThread A would join on BGThread B, while BGThread B tries to join on BGThread A. This might be solvable with ever more complex structures, such as a thread ownership DAG that gets threads to join only on their direct children, but I don't think the benefit is worth the cost.

Contributor Author

The UB and the self-join could simply be prevented with a check like if (std::this_thread::get_id() == instance->push_thread.get_id()) before joining. My concern with the explicit approach is that it is easy to forget to call the waiting function of the BinaryCache, and every time you want to exit you have to remember to call it. This seems to me to be very prone to human error.
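
A minimal sketch of that guard, under stated assumptions: `PushWorker`, `GuardDemoResult` and `run_guard_demo` are hypothetical names, and only the `get_id()` comparison comes from the comment above. The demo lets the push thread call the wait function on itself first, then lets the main thread call it.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical holder for one background push thread (not vcpkg's BinaryCache).
struct PushWorker
{
    std::thread push_thread;

    // Returns true if this call joined the thread, false if it was skipped.
    bool wait_for_async_complete()
    {
        if (!push_thread.joinable()) return false;
        // The guard from the comment above: the push thread must never join itself.
        if (std::this_thread::get_id() == push_thread.get_id()) return false;
        push_thread.join();
        assert(!push_thread.joinable());
        return true;
    }
};

struct GuardDemoResult
{
    bool from_worker;
    bool from_main;
};

// The push thread calls the wait function on itself first, then the main thread calls it.
GuardDemoResult run_guard_demo()
{
    PushWorker worker;
    std::atomic<bool> assigned{false};
    std::atomic<bool> checked{false};
    bool from_worker = true;

    worker.push_thread = std::thread([&] {
        while (!assigned.load()) std::this_thread::yield(); // wait until push_thread is assigned
        from_worker = worker.wait_for_async_complete();     // skipped: same thread id
        checked.store(true);
    });
    assigned.store(true);

    while (!checked.load()) std::this_thread::yield();       // never touch push_thread concurrently
    const bool from_main = worker.wait_for_async_complete(); // joins normally
    return {from_worker, from_main};
}
```

The guard only covers the self-join case. Two other threads calling the wait function at the same time would still race on `join()`, and the ordering problem with a second background thread is not addressed by it.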

Contributor Author

@autoantwort autoantwort Mar 5, 2023

I have now implemented your request

Contributor Author

@ras0219-msft Is there anything left that is preventing this PR from being merged?

@autoantwort autoantwort marked this pull request as ready for review March 5, 2023 20:28
BillyONeal added a commit that referenced this pull request Jan 15, 2025
…utput (#1565)

Extensive overhaul of our downloads handling and console output; @JavierMatosD and I have gone back and forth several times and yet kept introducing unintended bugs in other places, which led me to believe targeted fixes would no longer cut it.

Fixes many longstanding bugs and hopefully makes our console output for this more understandable:
* We no longer print 'error' when an asset cache misses but the authoritative download succeeds. This partially undoes #1541. It is good to print errors immediately when they happen, but if a subsequent authoritative download succeeds we need to not print those errors.
* We now always and consistently print output from x-scripts at the time that actually happens. Resolves https://devdiv.visualstudio.com/DevDiv/_workitems/edit/2300063
* We don't tell the user that proxy settings might fix a hash mismatch problem.
* We do tell the user that proxy settings might fix a download from asset cache problem.
* We now always tell the user the full command line we tried when invoking an x-script that fails.
* We don't crash if an x-script doesn't create the file we expect, or creates a file with the wrong hash.
* We now always print what we are doing *before* touching the network, so if we hang the user knows which server is being problematic. Note that this includes storing back to asset caches which we were previously entirely silent about except in case of failure.

Other changes:
* Removed debug output about asset cache configuration. The output was misleading / wrong depending on readwrite settings, and echoing to the user exactly what they said before we've interpreted it is not useful debug output. (Contrast with other `VcpkgPaths` debug output which tend to be paths we have likely changed from something a user said) 

Other notes:
* This makes all dependencies of #908 speak `DiagnosticContext` so it will be easy to audit that the foreground/background thread behavior is correct after this.
* I did test the curl status parsing on old Ubuntu again.

Special thanks to @JavierMatosD for his help in review of the first console output attempts and for blowing the dust off this area in the first place.
…cache-push-success

# Conflicts:
#	include/vcpkg/base/fwd/message_sinks.h
#	include/vcpkg/base/message_sinks.h
#	src/vcpkg/base/message_sinks.cpp
…cache-push-success

# Conflicts:
#	src/vcpkg/commands.install.cpp
#	src/vcpkg/commands.set-installed.cpp
…ture/async-binary-cache-push-success

# Conflicts:
#	include/vcpkg/binarycaching.h
#	src/vcpkg/binarycaching.cpp
… the work queue is drained before returning that no work is left.
* Restore autoantwort's only printing counts when done.
* Note which specs we are submitting in messages from the background.
@BillyONeal BillyONeal marked this pull request as ready for review February 3, 2025 18:34
@BillyONeal
Member

@autoantwort I pushed some changes here, can you let me know if you are happy with them? Thanks!

static ExpectedL<BinaryProviders> make_binary_providers(const VcpkgCmdArguments& args, const VcpkgPaths& paths)
void ReadOnlyBinaryCache::fetch(View<InstallPlanAction> actions)
{
std::vector<const InstallPlanAction*> action_ptrs;
Member

@BillyONeal BillyONeal Feb 3, 2025

This block is just moved up from 2325 as these things became members of ReadOnlyBinaryCache or BinaryCache rather than being local to this file now.

});
}

void BinaryCacheSynchronizer::add_submitted() noexcept
Member

This starts meaningfully new code.

Contributor Author

@autoantwort autoantwort left a comment

LGTM

Comment on lines +217 to +218
using backing_uint_t = std::conditional_t<sizeof(size_t) == 4, uint32_t, uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(size_t) == 4, uint16_t, uint32_t>;
Contributor Author

Why does this depend on size_t?

Member

There are a lot of 64 bit machines without 32 bit atomics, and a lot of 32 bit machines without 64 bit atomics, and I wanted to choose something least likely to put us into lockful atomics world.
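
As a rough illustration of why the widths might be tied to `size_t`, here is a sketch that packs two half-width counters into one pointer-sized atomic word. Only the two aliases and the name `add_submitted` appear in the diff; the `PackedCounters` class, its other members and the packing layout are assumptions made for this example and may not match the merged code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <type_traits>

// Same selection as in the diff: one pointer-sized word, split into two half-width counters.
using backing_uint_t = std::conditional_t<sizeof(std::size_t) == 4, std::uint32_t, std::uint64_t>;
using counter_uint_t = std::conditional_t<sizeof(std::size_t) == 4, std::uint16_t, std::uint32_t>;

static_assert(sizeof(backing_uint_t) == 2 * sizeof(counter_uint_t), "two counters fill one word");

// Hypothetical layout: low half counts submitted jobs, high half counts completed jobs.
class PackedCounters
{
public:
    static constexpr unsigned shift = sizeof(counter_uint_t) * 8;

    void add_submitted() noexcept
    {
        const backing_uint_t old = m_state.fetch_add(1, std::memory_order_acq_rel);
        // An overflow of the low half would carry into the completed counter.
        assert(static_cast<counter_uint_t>(old) != std::numeric_limits<counter_uint_t>::max());
        (void)old;
    }

    void add_completed() noexcept
    {
        m_state.fetch_add(backing_uint_t{1} << shift, std::memory_order_acq_rel);
    }

    counter_uint_t submitted() const noexcept
    {
        return static_cast<counter_uint_t>(m_state.load(std::memory_order_acquire));
    }

    counter_uint_t completed() const noexcept
    {
        return static_cast<counter_uint_t>(m_state.load(std::memory_order_acquire) >> shift);
    }

    // Reports whether this platform implements the word-sized atomic without a lock.
    bool is_lock_free() const noexcept { return m_state.is_lock_free(); }

private:
    std::atomic<backing_uint_t> m_state{0};
};
```

Keeping both counters in one word lets a single atomic operation observe them together, and choosing the word to match `size_t` keeps the atomic at the platform's natural width, which is the case most likely to be lock-free.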

@BillyONeal BillyONeal merged commit a6289e8 into microsoft:main Feb 5, 2025
6 checks passed
@BillyONeal
Member

Thanks for the contribution!

@BillyONeal
Member

[Image: time results]

@Neumann-A
Contributor

Why is this so inconsistent? I would have expected less variance in the result. Especially for stuff taking 1d and longer.

@dg0yt
Contributor

dg0yt commented Feb 14, 2025

Does the artifact size with static linkage explain most of the inconsistency? In particular when ports install executables.

@Neumann-A
Contributor

> Does the artifact size with static linkage explain most of the inconsistency? In particular when ports install executables.

Hmm, maybe. The android triplets are more or less consistent, and the -static and -static-md are also more or less consistent. @BillyONeal do you have storage data for the different triplets?

@BillyONeal
Member

My supposition as well is that the difference is mostly a function of how big the binary cache is. I don't have those stats though. For instance, the triplets with an LLVM have more improvement. The improvement for macOS seems bigger, which might be explained by not being in the same data center as the caches.

@autoantwort autoantwort deleted the feature/async-binary-cache-push-success branch February 15, 2025 15:14
BillyONeal added a commit to BillyONeal/vcpkg-tool that referenced this pull request Mar 6, 2025
Probably stealth merge conflict in microsoft#908 and/or probably my fault.