Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Install files in parallel #1256

Open
wants to merge 40 commits into
base: main
Choose a base branch
from

Conversation

Thomas1664
Copy link
Contributor

@Thomas1664 Thomas1664 commented Oct 29, 2023

Depending on how many files to install, this change results in a >= 4x speed up. I only tested relatively small ports but I expect even higher speed ups for bigger ports.

  1. Get file status in parallel
  2. Write listfile (async)
  3. Create all directories
  4. Copy files in parallel

Comment on lines 205 to 209
#ifdef _WIN32
std::sort(std::execution::par_unseq, output.begin(), output.end());
#else
std::sort(output.begin(), output.end());
#endif
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm open for suggestions on how to implement parallel sorting for all platforms

Comment on lines 120 to 161
fs.remove_all(target, IgnoreErrors{});
fs.remove(target, IgnoreErrors{});
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no difference for regular files between remove_all and remove (provided that vcpkg's implementation is identical to the STL): https://en.cppreference.com/w/cpp/filesystem/remove

@Thomas1664 Thomas1664 marked this pull request as draft October 30, 2023 11:05
@Thomas1664
Copy link
Contributor Author

Thomas1664 commented Oct 30, 2023

@BillyONeal Unit tests on OSX failed here. It seems to work again but I can consistently reproduce the failure on my local machine: The test case captures-output fails with error: calling CreateProcessW failed with 2 (unknown error). I'm not sure if this is the same test that failed in CI. I can confirm that this is most likely not caused by changes in this PR because the changed code inside commands.install.cpp is never hit.

@Thomas1664 Thomas1664 marked this pull request as ready for review October 30, 2023 15:22
@autoantwort
Copy link
Contributor

Btw you can build the format target locally to format all code.

@Thomas1664 Thomas1664 mentioned this pull request Oct 30, 2023
@BillyONeal BillyONeal reopened this Oct 30, 2023
@BillyONeal
Copy link
Member

Sorry!

Copy link
Member

@BillyONeal BillyONeal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm in general happy with attempts to parallelize things but overall I think you need to be much more explicit in proving that everything you call in a parallel context is actually safe to do. Given that the first run of e2e tests experienced a failure and I found races in a cursory review makes me fear that unknown races remain without such a proof.

This means if you want to proceed here I think you need to show strong thread safety requirements of every API you're calling rather than taking large blocks of existing code and wrapping them in parallel constructs.

}
}

fs.create_directories(listfile.parent_path(), VCPKG_LINE_INFO);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This races with other file operations you're doing.

src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
if (ec)
{
msg::println_error(msgInstallFailed, msg::path = target, msg::error_msg = ec.message());
Debug::println("Install from packages to installed: Fallback to copy "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Debug::println is not thread safe.

Copy link
Contributor Author

@Thomas1664 Thomas1664 Nov 28, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is other parallelized code where this also happens: in cmd_execute_and_capture_output_parallel we call cmd_execute_and_stream_data which calls Debug::print().

src/vcpkg/commands.install.cpp Outdated Show resolved Hide resolved
@Thomas1664
Copy link
Contributor Author

Thomas1664 commented Nov 28, 2023

I'm in general happy with attempts to parallelize things but overall I think you need to be much more explicit in proving that everything you call in a parallel context is actually safe to do. Given that the first run of e2e tests experienced a failure and I found races in a cursory review makes me fear that unknown races remain without such a proof.

I think the main issue with parallelizing here is printing. Basically, whenever the program can print anything, needs to be guarded by a mutex which is IMO not possible from the outside. Therefore, we need to make printing inherently thread safe.

@Thomas1664
Copy link
Contributor Author

@BillyONeal now that #1323 landed, what needs to be done for this to land?

I checked each function for thread safety. The following functions are not thread safe because they print to the console:

  • fs.create_directories
  • fs.write_lines
  • Obvious calls to print functions

I'm not entirely sure how to use DiagnosticContext here...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants