fix: low latency client streaming waiting for data to close connection after client closes #6669
Botelho31 wants to merge 11 commits into caddyserver:master
Conversation
Very interesting! Thanks for the contribution. I'm reviewing it now. So, uh, forgive me for asking, but does this work as intended? I don't think I would have come up with this, and if it works, that's great. :) It would take me some time to set up a test harness for this (or perhaps we should write an automated test). Curious if you have seen this working as you developed it. 👍
I actually have already created some automated tests (as I was not too sure myself whether this would work 100%), but didn't add them because I'm not sure where they should go in the file structure. Should I just create a
@Botelho31 Typically, you'll put the Go code that runs the test in a file of the same name but ending with
Tests for this could go in
I added the unit tests, but I'm not too sure how to create an integration test for this behaviour; I'm going to take a shot at it.
So I have written the integration test and have seen it fail, but I'm finding that the timing required for Caddy to not send the EOF to the backend is very precise. Without the new variable, it already triggers the context-failure path every time.
Thank you for this! I've given this another look-over, and now I wonder whether we should close the connection proactively after uploading the request, OR whether we should wait for the backend to respond / close the connection like we're doing here. In either case the client has already disconnected, so that doesn't matter, but I'm not sure whether the intention is for the proxy to also disconnect before a response. I think this is worth a shot either way. Do you have a moment to test with the original client reproducer presented in #4922? Maybe @kuhnchris could offer some insight.
I actually implemented my integration testing based on his. I will give the ffmpeg script a try to see if my issue is due to Caddy not being fast enough. About proactively closing: I guess we could close on the next error catch, but isn't there any important processing that happens afterwards? Or is it all client-related logic? In that case we should definitely close it.
According to the official documentation, Done should return a reference to the same channel. Does this have any impact? Some requests might keep running until Caddy is reloaded, because they hold onto the channel from before the request finished, while subsequent calls return a channel that comes from the client. I know HTTP/3 does this.
I just read this part of the documentation. Actually, since we are only doing this on a per-roundtrip basis (after the roundtrip we independently return to the old context.Done), I don't think it will affect other requests. As an update, I have tested this using the Python ffmpeg reproducer from the original client and again, I can
Hmm, well thank you for digging into it. I am about to tag 2.9 beta 3 today, and I am trying to decide whether to include this or wait until it's a little more ready / understood. I'm thinking of giving this PR more time until we have a grasp as to why the tests aren't failing in the first place, and so we can be sure this really works. What do you think?
It's better to find a way to reproduce the error every time before merging this PR, I think. I'm sorry for the overall slow pace this is taking. I'm going to give it another try later today.
No worries. We appreciate you contributing it! It's totally okay if it doesn't go in today, I just didn't want to disappoint. Looking forward to figuring things out and seeing it merged :)
Context is per request, so naturally changing the context of a request won't affect other requests. What do you mean by this? As I said, I am curious whether this will impact response handling, since it's clearly different from the stdlib definition. Some parts of the code might hold on to the old reference, which leads to context cancellation not firing for those code paths. I'm not asking that we stick to stdlib behaviour; if the test checks out, write a comment about it for future reference.
// the roundtrip. We delay the context.Done() channel until
// the round trip is done, or until the parent context is
// done, whichever comes first.
type delayClientDoneContext struct {
I think instead of introducing a new type, we can just use context.WithCancel with h.ctx, and context.AfterFunc with the request context to cancel the newly introduced context, to achieve the same effect. Not only does this save many lines of code, it also avoids the inconsistency with the stdlib of Done returning a different channel. What do you think?
ctx, cancel := context.WithCancel(h.ctx)
defer cancel()
stop := context.AfterFunc(req.Context(), cancel)
defer stop()
I have tested this in the following ways, trying to catch an error: using the old POC from the original issue. It seems to me that SOMETHING has changed in the Go stdlib RoundTrip in the last two years that has fixed this on their end. I could not in any way, shape, or form recreate the old error. Maybe this whole pull request isn't needed anymore, and for now I give up on trying to reproduce this, if someone wants to give it a shot. Reading the official package GitHub, I believe the commit that could have solved this is this one
@Botelho31 so is the conclusion that this PR isn't needed after all? Or do you just mean one specific branch/usecase is one you're not able to replicate? Sorry I'm not super caught up here, just wanting to get a summary 😅
I cannot replicate the original error #4922 (the one fixed by flush_interval -1) in today's version of Caddy/Go in order to test whether this pull request fixes it. Maybe it was fixed in the stdlib, or maybe it's just a skill issue on my part (I'm quite new to Go); either way I won't be pursuing this any further. You can close this pull request if you like, and if nobody can recreate the issue, maybe we should even remove the original dirty fix.
Ah wow -- well, thank you so much for working on this @Botelho31! We really appreciate your contribution, regardless of whether we merge it or not. You've been a pleasure to work with! We super appreciate the deep-dive into narrowing things down as well. I wonder if the recent implementation of "full duplex" has anything to do with this? @francislavoie what do you think?
No, the full-duplex thing is opt-in only (off by default)
Okay. So, it sounds like with the current version of Caddy (and/or Go), the issue reported by #4922 may no longer be a problem and we don't need our patch from back then. In that case, we can even try reverting the original patch and seeing if anyone experiences the original behavior anymore. Sounds like they won't. That's great -- we might not need a separate setting with a complex implementation, or a buggy patch at all! @Botelho31 Thank you again so much for working on this. I will respectfully close this PR, but we will refer to it in the future if it comes up again. What I'll probably do then is revert the original patch and credit you in my commit :)
…eam mode i.e. Revert commit f5dce84 Two years ago, the patch in #4952 was a seemingly necessary way to fix an issue (sort of an edge case), but it broke other more common use cases (see #6666). Now, as of #6669, it seems like the original issue can no longer be replicated, so we are reverting that patch, because it was incorrect anyway. If it turns out the original issue returns, a more proper patch may be in #6669 (even if used as a baseline for a future fix). A potential future fix could be an opt-in setting.
My suggestion on how to properly fix #6666