
Conversation

@nyh (Contributor) commented Dec 10, 2025:

This patch adds to http_server a set_request_timeout() method which enables a time limit on reading a request from a connection. When this timeout is reached before a request is read, the connection is closed.

The primary goal for this feature is to clean up persistent connections (HTTP keep-alive) from clients that have died, or were "forgotten" by a proxy or load balancer which created them. Such dead connections can accumulate over time and consume server resources, and the new request timeout feature will remove them.

The request timeout can drop not only idle connections, but also connections where a client died in the middle of sending a request, and will never continue. This situation isn't common, but again if we don't clean up these connections, they can accumulate over time.

Finally, the request timeout can also help defend against slowloris DoS attacks - where an attacker deliberately sends a request extremely slowly. The request timeout can drop these connections even though they are not completely idle.

The set_keepalive_parameters() feature added recently can also solve some of these problems, but not all of them, and needs to work harder to verify a connection is really dead - whereas set_request_timeout() just assumes that a connection can be closed immediately if a request wasn't received for a long time.

The request timeout continues to default to 0, meaning it is disabled, so this patch doesn't change the http server's behavior for applications unless they choose to use the new feature.
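To illustrate, here is a minimal usage sketch (not taken from the patch itself; it assumes set_request_timeout() accepts a std::chrono duration, with zero meaning "disabled"):

    #include <seastar/http/httpd.hh>
    #include <chrono>

    // Hypothetical helper showing how an application might opt in to the
    // new timeout on an existing seastar::httpd::http_server instance.
    void enable_request_timeout(seastar::httpd::http_server& server) {
        using namespace std::chrono_literals;
        // Close a keep-alive connection if a complete request has not been
        // read within 10 seconds.
        server.set_request_timeout(10s);
        // server.set_request_timeout(0s);  // zero keeps the old behavior
    }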

Fixes #3130

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
@nyh (Contributor, Author) commented Dec 11, 2025:

Pushed a new version with a fourth test with two more verifications (as usual, explained in comments).

Dear reviewers, if you're curious why I decided to put the timeout on receiving whole requests and not, for example, on a single read making progress: it started with an implementation problem. The code uses consume(), which takes care of the whole read loop for a single request and doesn't let me do things like reset the timer between reads. But once I made it a timeout on the whole request, I realized this is actually "good enough" for most "modern" applications, where you don't expect extremely slow 100-baud connections or extremely long requests, so a (for example) 10-second timeout on receiving a request is not a problem. But beyond being good enough, it also has a benefit: it protects against attackers deliberately trickling long requests very slowly to consume a lot of server memory with little client effort (a "slowloris" attack).
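To make that concrete, here is a rough sketch of the "one timer per request" shape (my reconstruction under assumptions, not the patch itself; read_one_request() is a hypothetical stand-in for the consume()-based read loop):

    #include <seastar/core/future.hh>
    #include <seastar/core/timer.hh>
    #include <seastar/core/lowres_clock.hh>
    #include <seastar/net/api.hh>
    #include <chrono>

    // Hypothetical stand-in for the existing consume()-driven request read.
    seastar::future<> read_one_request(seastar::connected_socket& socket);

    seastar::future<> read_request_with_timeout(seastar::connected_socket& socket,
                                                seastar::timer<seastar::lowres_clock>& timer,
                                                std::chrono::milliseconds request_timeout) {
        timer.set_callback([&socket] {
            // The request took too long: abort the pending read so the
            // connection gets closed by its owner.
            socket.shutdown_input();
        });
        if (request_timeout.count() > 0) {      // zero means the feature is off
            timer.arm(request_timeout);
        }
        return read_one_request(socket).finally([&timer] {
            timer.cancel();                     // the whole request arrived in time
        });
    }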

@nyh (Contributor, Author) commented Dec 11, 2025:

@StephanDollberg @travisdowns this is a feature I propose can be used instead of TCP keepalive to clean up dead httpd connections.

@StephanDollberg (Contributor) commented:

Yeah I think this is another layer which is good to have.

"good enough" for most "modern" applications, where you don't expect extremely slow 100 baud connections or extremely long request, so a (for example) 10 second timeout on receiving a request is not a problem. But beyond being good enough, it also has a benefit: It protects against attackers deliberately trickling long requests very slowly to consume a lot of server memory with little client effort ("slowloris" attack).

Getting a complete timeout solution can actually be quite tricky. Having one total timeout can be good for slowloris, but if you have an http/file server which serves bigger files, you need to account for slower clients as well, which might only be downloading at XXX KB/s. With only one total timeout they'll in all likelihood always run into the timeout.

Inter-read timeouts can also be advantageous, as they allow keeping idle connections open for longer, which is good for keep-alive (which is always at odds with timeouts).

For full slowloris prevention you'd also need send timeouts, as people could just receive very slowly on their end and hence keep you busy for longer.

But anyway none of that makes the simple approach good to have in combination with TCP keepalives.

@avikivity (Member) commented:

I think there's possibly two different timeouts here. One is for idle connections for which not a single byte was received. Another is for a connection which started receiving data, but is receiving it slowly.

Should we have two different timeouts here? Should we reset the timeout any time a byte is received?

@nyh (Contributor, Author) commented Dec 11, 2025:

Getting a complete timeout solution can actually be quite tricky. Having one total timeout can be good for slowloris, but if you have an http/file server which serves bigger files, you need to account for slower clients as well, which might only be downloading at XXX KB/s. With only one total timeout they'll in all likelihood always run into the timeout.

This is why I explicitly called the new feature "request_timeout": applications where the total request time can be high - like when you have 100-baud clients or when requests can include large uploads - will not be able to use this new feature. I also wrote such a warning in the comment.

If you want a timeout on individual reads (not the complete request), it's actually not very tricky - you can do exactly what I did but just restart the timer before every single read from the socket, not at the beginning of the request like I did. The only tricky part is that our current implementation uses a utility function consume() which itself calls read() in a loop and doesn't let us add anything to this loop. If we want to add such a feature, and call it, say, set_read_timeout() (as an alternative to set_request_timeout()), we can actually do that. But since I didn't want to change consume() and don't care too much about the very-slow-request workloads, I didn't bother. Maybe I should have bothered.
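A sketch of what that per-read variant could look like (again, an assumption-laden sketch rather than real code - it re-arms a timer before each individual read instead of once per request):

    #include <seastar/core/future.hh>
    #include <seastar/core/iostream.hh>
    #include <seastar/core/temporary_buffer.hh>
    #include <seastar/core/timer.hh>
    #include <seastar/core/lowres_clock.hh>
    #include <chrono>

    seastar::future<seastar::temporary_buffer<char>>
    read_with_per_read_timeout(seastar::input_stream<char>& in,
                               seastar::timer<seastar::lowres_clock>& timer,
                               std::chrono::milliseconds read_timeout) {
        // Restart the clock for this read only: a slow-but-progressing client
        // keeps pushing the deadline forward, while a stalled one eventually
        // fires the timer's callback (assumed to be set up elsewhere, e.g. to
        // shut the connection down as in the earlier sketch).
        timer.rearm(seastar::lowres_clock::now() + read_timeout);
        return in.read().finally([&timer] { timer.cancel(); });
    }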

Inter-read timeouts can also be advantageous, as they allow keeping idle connections open for longer, which is good for keep-alive (which is always at odds with timeouts).

What is true is that if the timeout is 30 seconds and you just spent 29.9 seconds with the connection being idle (keep-alive), now you only have 0.1 seconds to completely receive the request, and may close the connection in the middle. Of course, this is unlikely to happen unless you badly tune the timeout to your workload. For example, imagine that you know that your workload often creates a connection and has a loop which sends a request every exactly 10 seconds - you wouldn't want to set request_timeout to exactly 10 seconds; this would be bad.

For full slowloris prevention you'd also need send timeouts, as people could just receive very slowly on their end and hence keep you busy for longer.

This is, unfortunately, true.

With my patch, the attacker will not be able to force you to hold a very large (and very slowly growing) request in memory, but it might (depending on what the server is actually doing and whether it needs to generate large responses in memory) be able to force you to generate a very large response in memory and then read it very slowly.

But anyway none of that makes the simple approach good to have in combination with TCP keepalives.

It makes the "simple approach" (I guess you mean my patch?) good to have, or not good to have? I couldn't get to the bottom of this sentence :-)

        }
        _resp.reset();
        if (_server._request_timeout.count() > 0) {
            _request_timeout_timer.cancel();
Review comment (Member):


Should you not cancel unconditionally? In case the timeout was changed.

I guess it doesn't matter - setting the request timeout doesn't have to synchronize with running requests.

@nyh (Contributor, Author) replied:


Yes, I planned for a change of the timeout to affect already-existing connections: the same connection will pick up the new timeout after the next request, but it doesn't matter what happens to the already-running read, I thought.

But perhaps the if() isn't helpful, in the sense that cancel() is already very cheap to call if the timer wasn't set - it too just does an if() - so why bother with another one?

@StephanDollberg (Contributor) commented:

It makes the "simple approach" (I guess you mean my patch?) good to have, or not good to have? I couldn't get to the bottom of this sentence :-)

Sorry! Yes I missed a few words there. I was trying to say that none of these points speak against having your patch :).

@nyh (Contributor, Author) commented Dec 11, 2025:

I think there's possibly two different timeouts here. One is for idle connections for which not a single byte was received. Another is for a connection which started receiving data, but is receiving it slowly.

Should we have two different timeouts here? Should we reset the timeout any time a byte is received?

Initially I wanted to be able to do that, but then I discovered we're using a function consume() which takes care of the read loop for us and doesn't let me do things between each read. So instead of hacking on consume() I went the "easy" way and added a timeout just for the entire request. Maybe I should indeed put more effort into this.

By the way, with the two different timeouts you suggested, you won't be able to implement the timeout on the full request as I implemented now. For example, if you add just a 1-second timeout between bytes, you can still send a large request over a very long time (the "slowloris" attack). Moreover, you can't really use some very low number for this between-bytes timeout, because of packet loss and network glitches - you don't really want to drop a request in the middle just because of a half-second network glitch. So I'm not sure what making the two timeouts have different values can buy us.
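To put a number on that (an invented back-of-envelope illustration, not a measurement):

    #include <cstdio>

    int main() {
        // With only a 1-second inter-byte timeout, an attacker trickling one
        // byte every 0.9 seconds never trips it, yet holds the connection
        // (and the partially-read request buffer) open for hours.
        constexpr double seconds_per_byte = 0.9;   // just under the per-byte timeout
        constexpr int request_bytes = 8 * 1024;    // e.g. a maximally-sized header
        std::printf("connection held open for ~%.1f hours\n",
                    request_bytes * seconds_per_byte / 3600.0);   // ~2.0 hours
        return 0;
    }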

@avikivity (Member) commented:

What if someone uploads large amounts of data?

@nyh (Contributor, Author) commented Dec 11, 2025:

What if someone uploads large amounts of data?

As I explained several times, including in the doxygen comment:

     * A server which expects to handle low-bandwidth clients or very long
     * requests should be careful not to set the request timeout too low.

So the request_timeout feature is not useful when the server application expects to serve good requests that really take an hour to send - e.g., an "upload of large amounts of data" - because you won't be able to set the request_timeout to less than an hour. But I expect most server applications - including the ones that everyone in this discussion is writing - simply do not have this problem, and never expect any single request to take more than a second to send, so that timing out a request after, say, one minute, is good for them.
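For tuning, a hedged rule of thumb (my numbers, purely for illustration): the timeout has to exceed the largest legitimate request divided by the slowest legitimate client's bandwidth.

    #include <cstdio>

    int main() {
        constexpr double max_request_bytes = 100.0 * 1024 * 1024;     // e.g. 100 MiB uploads
        constexpr double min_client_bytes_per_s = 1.0 * 1024 * 1024;  // e.g. 1 MiB/s clients
        // Any request timeout below this would cut off legitimate slow uploads.
        double lower_bound_s = max_request_bytes / min_client_bytes_per_s;
        std::printf("request timeout must be well above %.0f seconds\n",
                    lower_bound_s);                                    // prints 100
        return 0;
    }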

@travisdowns (Contributor) commented:

and never expect any single request to take more than a second to send

Well, in Redpanda we do have Prometheus endpoint requests which take 10-60 seconds for a single request (but that's not really a problem for this approach, as we'd probably set this timeout to minutes).

