-
Notifications
You must be signed in to change notification settings - Fork 1.7k
httpd: add timeout on requests #3148
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
This patch adds to http_server a set_request_timeout() method which enables a time limit on reading a request from a connection. When this timeout is reached before a request is read, the connection is closed. The primary goal for this feature is to clean up persistent connections (HTTP keep-alive) from clients that have died, or were "forgotten" by a proxy or load balancer which created them. Such dead connections can accumulate over time and consume server resources, and the new request timeout feature will remove them. The request timeout can drop not only idle connections, but also connections where a client died in the middle of sending a request, and will never continue. This situation isn't common, but again if we don't clean up these connections, they can accumulate over time. Finally, the request timeout can also help defend against slowloris DoS attacks - where an attacker deliberately sends a request extremely slowly. The request timeout can drop these connections even though they are not completely idle. The set_keepalive_parameters() feature added recently can also solve some of these problems, but not all of them, and needs to work harder to verify a connection is really dead - whereas set_request_timeout() just assumes that a connection can be closed immediately if a request wasn't received for a long time. The request timeout continues to default to 0, meaning it is disabled, so this patch doesn't change the http server's behavior for applications unless they choose to use the new feature. Fixes scylladb#3130 Signed-off-by: Nadav Har'El <nyh@scylladb.com>
|
Pushed a new version with a fourth test with two more verifications (as usual, explained in comments). Dear reviewers, if you're curious why I decided to put the timeout on receiving whole requests and not, for example, on a single read making progress, well - it started with an implementation problem: The problem was that the code uses consume() which takes care of the whole read loop for a single request and doesn't let me do things like reset the timer between each read. But once I made it into a timeout on the whole request I realized this is actually "good enough" for most "modern" applications, where you don't expect extremely slow 100 baud connections or extremely long request, so a (for example) 10 second timeout on receiving a request is not a problem. But beyond being good enough, it also has a benefit: It protects against attackers deliberately trickling long requests very slowly to consume a lot of server memory with little client effort ("slowloris" attack). |
|
@StephanDollberg @travisdowns this is a feature I propose can be used instead of TCP keepalive to clean up dead httpd connections. |
|
Yeah I think this is another layer which is good to have.
Getting a complete timeout solution can actually be quite tricky. Having one total timeout can be good for slowloris but if you have an http/file server which serves bigger files you need to account for slower clients as well which might only be downloading with XXX KB/s. With only one total timeout they'll in all likely hood always run into the timeout. Inter-read timeouts can also be advantagous as they allow to keep idle connections open for longer which is good for keep-alive (which is always at odds with timeouts). For full slowloris prevention you'd also need send timeouts as people could just receive very slowly on their and hence keep you busy for longer. But anyway none of that makes the simple approach good to have in combination with TCP keepalives. |
|
I think there's possibly two different timeouts here. One is for idle connections for which not a single byte was received. Another is for a connection which started receiving data, but is receiving it slowly. Should we have two different timeouts here? Should we reset the timeout any time a byte is received? |
This is why I clearly called the new feature "request_timeout" and applications where the total request time can be high - like when you have 100-baud clients or when requests can include large uploads - will not be able to use this new feature. I also wrote such a warning in the comment. If you want a timeout on individual reads (not the complete request), it's actually not very trickyl - you can do exactly what I did but just restart the timer before every single read from the socket - not in the beginning of the request like I did. The only tricky part is that our current implementation uses a utility function
What is true that if the timeout is 30 seconds and you just spent 29.9 seconds with the connection being idle (keep-alive), now you only have 0.1 seconds to completely receive the request, and may close the connection in the middle. Of course, the chance that this will happen is unlikely, unless you badly tune the timeout to your workload. For example, imagine that you know that your workload often creates a connection and has a loop which sends a request every exactly 10 seconds - you wouldn't want to use query_timeout for exactly 10 seconds, this would be bad.
This is, unfortunately, true. With my patch, the attacker will not be able to force you to hold a very large (and very slowly growing) request in memory, but it might be able (depending on what the server is actually doing and whether it needs to generate large responses in memory) be able to force you to generate a very large response in memory and then read it very slowly.
It makes the "simple approach" (I guess you mean my patch?) good to have, or not good to have? I couldn't get to the bottom of this sentence :-) |
| } | ||
| _resp.reset(); | ||
| if (_server._request_timeout.count() > 0) { | ||
| _request_timeout_timer.cancel(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should you not cancel unconditionally? In case the timeout was changed.
I guess it doesn't matter - setting the request timeout doesn't have to synchronize with running requests.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I plan for changing the timeout to change already existing timers (the same connection will pick up the new timeout after the next request, but it doesn't matter what happens to the already-running read, I thought).
But perhaps the if() isn't helpful in the sense that cancel() is already very cheap to call if the timer wasn't set - it also just do a if(), so why bother with another if?
Sorry! Yes I missed a few words there. I was trying to say that none of these points speak against having your patch :). |
Initially I wanted to be able to do that, but then I discovered we're using a function By the way, with the two different timeouts you suggested you won't be able to implement the timeout on the full query as I implemented now. For example, if you add just a 1-second timeout between bytes, you can still send a large request over a very long time (the "slowloris" attack). Moreover, you can't really use some very low number for this between-bytes timeout, because packet loss and network glitches - you don't really want to drop a request in the middle just because of a half-second network glitch. So I'm not sure what making the two timeouts have different values can buy us. |
|
What if someone uploads large amounts of data? |
As I explained several times, including in the doxygen comment: So the request_timeout feature is not useful when the server application expects to serve good requests that really take an hour to send - e.g., a "upload of large amounts of data" - because you won't be able to set the request_timeout to less than hour. But I expect most server applications - including the ones that everyone in this discussion are writing - simply do not have this problem, and never expect any single request to be send over one second - so that timing out a request after, say, one minute, is good for them. |
Well in Redpanda we do have prometheus endpoint requests which take 10-60 seconds for a single request (but that's no problem for this approach really as we'd probably set this timeout to minutes). |
This patch adds to http_server a set_request_timeout() method which enables a time limit on reading a request from a connection. When this timeout is reached before a request is read, the connection is closed.
The primary goal for this feature is to clean up persistent connections (HTTP keep-alive) from clients that have died, or were "forgotten" by a proxy or load balancer which created them. Such dead connections can accumulate over time and consume server resources, and the new request timeout feature will remove them.
The request timeout can drop not only idle connections, but also connections where a client died in the middle of sending a request, and will never continue. This situation isn't common, but again if we don't clean up these connections, they can accumulate over time.
Finally, the request timeout can also help defend against slowloris DoS attacks - where an attacker deliberately sends a request extremely slowly. The request timeout can drop these connections even though they are not completely idle.
The set_keepalive_parameters() feature added recently can also solve some of these problems, but not all of them, and needs to work harder to verify a connection is really dead - whereas set_request_timeout() just assumes that a connection can be closed immediately if a request wasn't received for a long time.
The request timeout continues to default to 0, meaning it is disabled, so this patch doesn't change the http server's behavior for applications unless they choose to use the new feature.
Fixes #3130