Limit concurrent HTTP requests #922
Conversation
@facebook-github-bot has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. (Because this pull request was imported automatically, there will not be any future comments.)

cc @cormacrelf
Reworked this to reduce the boilerplate and ensure the semaphore permit is held until the response is fully consumed. Hopefully that doesn't confuse the import?
```rust
.inspect(move |_| {
    // Ensure we keep a concurrent request permit alive until the stream is consumed
    let _guard = &semaphore_guard;
})
```
This is a little cheeky and might be clearer as a stream transformer struct, but that would probably take about 4x as much code. Let me know if that'd be preferred.
Why do we need this in the stream itself? Could we instead have `let _guard = &semaphore_guard;` at line 142? As far as I can tell, the request has already occurred by then.
These requests are typically downloading potentially large files. We don't want to release semaphore credit until the response has been completed, which is after the stream has been consumed in full and dropped. Without stream ownership of the guard, it would be released as soon as the response begins, and we could still end up with arbitrarily large numbers of concurrent [attempted] requests, which is what this PR seeks to avoid.
Another thought: the motivating case for this PR is downloading from crates.io. In theory, all such requests could be pipelined through a single HTTP/2 connection, which would avoid the resource exhaustion hazard and probably be more efficient for everyone involved. Getting that to happen seems like a significantly more subtle task, however, so maybe best left for follow-up work.
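The permit-lifetime point above can be illustrated with a small std-only sketch (the real code uses a tokio semaphore and an async byte stream; `Permit` and `download_holding_permit` here are hypothetical stand-ins). Moving the guard into the `inspect` closure ties its lifetime to the stream, so the slot is only released once the body has been fully consumed and dropped:

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical guard: decrements an in-flight counter when dropped,
/// standing in for a real semaphore permit.
struct Permit {
    in_flight: Arc<Mutex<usize>>,
}

impl Drop for Permit {
    fn drop(&mut self) {
        *self.in_flight.lock().unwrap() -= 1;
    }
}

/// Simulates downloading `chunks` body chunks while holding a permit.
/// Returns (in-flight count observed mid-stream, count after the stream is dropped).
fn download_holding_permit(chunks: usize) -> (usize, usize) {
    let in_flight = Arc::new(Mutex::new(0usize));

    *in_flight.lock().unwrap() += 1; // "acquire" a permit
    let permit = Permit { in_flight: Arc::clone(&in_flight) };

    // Moving the permit into the closure ties its lifetime to the
    // iterator: the slot is not released until the whole "body" has
    // been consumed and the iterator is dropped.
    let body = (0..chunks).inspect(move |_chunk| {
        let _guard = &permit; // keep the permit alive
    });

    let while_streaming = *in_flight.lock().unwrap(); // permit still held
    let _consumed: Vec<usize> = body.collect(); // consume and drop the stream
    let after_drop = *in_flight.lock().unwrap(); // permit released

    (while_streaming, after_drop)
}

fn main() {
    assert_eq!(download_holding_permit(3), (1, 0));
    println!("permit held during stream, released after drop");
}
```

Without this, the guard would be dropped as soon as the response headers arrive, and an unbounded number of bodies could be in flight at once.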
As described in #316 (comment) I think there's a more direct solution available by tweaking hyper's HTTP client to reduce the initial default quantity of multiplexing. I'll pursue that in a separate PR. Leaving this open for now in case there's independent interest in limiting the number of connections, though as currently written this limits the number of requests, which isn't quite right.
A few issues:
All things considered, we should have that http2 max concurrent streams change, but for file descriptor limits, the semaphore here is the only thing that will work.
Thanks for the feedback!
- I'd be surprised, but I'll verify.
- Yes, this PR is still necessary for that case.
- This seems like a pretty easy fix, though? I'll draft a separate PR.
Opened #991 to delay file creation. I think we need that to get the most benefit from this PR anyway, since otherwise the semaphore acquire happens when the HTTP request is attempted, after the file has already been opened.
Hyper maintainers confirm that this is not the case.
Summary: Helps reduce the number of file descriptors required for a build, especially in combination with concurrent efforts to limit the number of concurrent HTTP responses being processed. See also #922.

Pull Request resolved: #991
Reviewed By: IanChilds
Differential Revision: D75920121
fbshipit-source-id: ca17981c2305054c8fc596b165201479d731ee15
When building a non-trivial project with third-party dependencies fetched via `http_archive` or similar, e.g. from reindeer in non-vendored mode, buck2 can generate extremely large numbers of outgoing HTTP requests. Hyper does not enforce any limits itself, happily expanding its connection pool with every additional concurrent request. This can cause the remote HTTP server to reject requests, and even lead to buck2 itself failing with "too many open files" errors.

Larger numbers of concurrent requests have rapidly diminishing returns, so while there's no obviously correct limit, any smallish number should improve behavior in most cases. It may even improve total throughput by reducing contention for network bandwidth.
See also discussion in facebookincubator/reindeer#46.
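The limiting idea itself can be sketched with a std-only counting semaphore (the actual PR uses tokio's async `Semaphore`; the `Semaphore` type, the limit of 4, and the `run` helper below are all illustrative assumptions, not the real implementation):

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore: a std-only stand-in for tokio::sync::Semaphore.
struct Semaphore {
    permits: Mutex<usize>,
    cv: Condvar,
}

impl Semaphore {
    fn new(n: usize) -> Self {
        Semaphore { permits: Mutex::new(n), cv: Condvar::new() }
    }
    fn acquire(&self) {
        let mut p = self.permits.lock().unwrap();
        while *p == 0 {
            p = self.cv.wait(p).unwrap(); // block until a permit frees up
        }
        *p -= 1;
    }
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.cv.notify_one();
    }
}

/// Spawns `jobs` simulated downloads capped at `limit` in flight;
/// returns the peak concurrency actually observed.
fn run(limit: usize, jobs: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let gauge = Arc::new(Mutex::new((0usize, 0usize))); // (current, peak)

    let handles: Vec<_> = (0..jobs)
        .map(|_| {
            let sem = Arc::clone(&sem);
            let gauge = Arc::clone(&gauge);
            thread::spawn(move || {
                sem.acquire(); // blocks once `limit` requests are in flight
                {
                    let mut g = gauge.lock().unwrap();
                    g.0 += 1;
                    g.1 = g.1.max(g.0);
                }
                thread::sleep(Duration::from_millis(5)); // the "download"
                gauge.lock().unwrap().0 -= 1;
                sem.release();
            })
        })
        .collect();

    for h in handles {
        h.join().unwrap();
    }
    let peak = gauge.lock().unwrap().1;
    peak
}

fn main() {
    let peak = run(4, 16);
    println!("peak concurrency: {peak} (limit 4)");
    assert!(peak <= 4);
}
```

However many downloads are queued, at most `limit` are ever in flight, which is what bounds the connection pool and the number of open file descriptors.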