BaseStreamSocketChannel half-close allows outstanding writes to complete #3148

rnro · 2025-03-18T15:40:50Z

Motivation:

At the moment, half-closes are actioned immediately and fail all outstanding writes. We should refuse new writes but allow the outstanding writes to complete before completing the close.

Modifications:

Modify the PendingWritesManager internal buffer to hold an enum of either writes or close events. We use this to store the close and only action it when the preceding writes have been handled.
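To make the idea concrete, here is a minimal, NIO-free sketch of a buffer whose entries are either writes or close requests. ToyPromise, PendingEntry, PendingBuffer and drain(limit:) are hypothetical names, not the types in PendingWritesManager.swift. The point is only that draining in FIFO order means a queued close is actioned after every write that preceded it, while writes issued after the close are refused.

```swift
// Minimal model of a pending-writes buffer whose entries are either writes or
// outbound-close requests. ToyPromise, PendingEntry and PendingBuffer are
// hypothetical names; they are not NIOPosix's actual types.
struct OutputClosedError: Error {}

/// Hypothetical stand-in for EventLoopPromise<Void>.
final class ToyPromise {
    private(set) var result: Result<Void, Error>?
    var succeeded: Bool {
        if case .some(.success(_)) = result { return true }
        return false
    }
    var failed: Bool {
        if case .some(.failure(_)) = result { return true }
        return false
    }
    func succeed() {
        precondition(result == nil, "promise completed twice")
        result = .success(())
    }
    func fail(_ error: Error) {
        precondition(result == nil, "promise completed twice")
        result = .failure(error)
    }
}

enum PendingEntry {
    case write([UInt8], ToyPromise?)
    case closeOutbound(ToyPromise?)
}

struct PendingBuffer {
    private var entries: [PendingEntry] = []
    private(set) var closeQueued = false   // set as soon as a close is enqueued
    private(set) var isShutDown = false    // set once a queued close is actioned
    private(set) var written: [UInt8] = [] // bytes that reached the "socket"

    /// Refuses new writes once a close is queued; otherwise buffers the write.
    mutating func write(_ bytes: [UInt8], promise: ToyPromise?) {
        if closeQueued {
            promise?.fail(OutputClosedError())
        } else {
            entries.append(.write(bytes, promise))
        }
    }

    /// Queues the close behind any outstanding writes instead of actioning it now.
    mutating func closeOutbound(promise: ToyPromise?) {
        closeQueued = true
        entries.append(.closeOutbound(promise))
    }

    /// Processes up to `limit` entries in FIFO order, so a close only takes
    /// effect after every write queued before it has completed.
    mutating func drain(limit: Int = .max) {
        var processed = 0
        while processed < limit, !entries.isEmpty {
            let entry = entries.removeFirst()  // update state before completing promises
            processed += 1
            switch entry {
            case .write(let bytes, let promise):
                written += bytes
                promise?.succeed()
            case .closeOutbound(let promise):
                isShutDown = true  // stand-in for shutdown(SHUT_WR); idempotent here
                promise?.succeed()
            }
        }
    }
}

var buffer = PendingBuffer()
let writeA = ToyPromise(), writeB = ToyPromise()
let closePromise = ToyPromise(), lateWrite = ToyPromise()
buffer.write([1, 2], promise: writeA)
buffer.write([3], promise: writeB)
buffer.closeOutbound(promise: closePromise)
buffer.write([4], promise: lateWrite)   // refused: a close is already queued
buffer.drain(limit: 1)                  // partial progress: only writeA completes
let shutDownEarly = buffer.isShutDown   // false: writeB is still outstanding
buffer.drain()                          // writeB completes, then the close is actioned
```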

Result:

Outbound close should no longer fail outstanding writes.

Should resolve #3139

Sources/NIOPosix/BaseStreamSocketChannel.swift

Lukasa · 2025-03-21T16:54:17Z

Sources/NIOPosix/BaseStreamSocketChannel.swift

+                let outboundCloseState = self.pendingWrites.close(promise)
+                switch outboundCloseState {
+                case .open:
+                    preconditionFailure("Close resulted in an open state, this should never happen")


Let's avoid this precondition failure. Instead, we should produce an error just before line 219, and go ahead and close outbound anyway. We can then drop this to assertionFailure so that it crashes in testing.
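A minimal sketch of the suggested pattern, assuming a simplified three-case CloseState (hypothetical, not the PR's real enum). The impossible state is reported through assertionFailure so that debug builds and tests crash, but in release builds the outbound side is still closed and an error is produced instead of trapping. The bug reporter is injectable only so the example can exercise that path without crashing.

```swift
// Sketch of "assert in debug builds, degrade gracefully in release builds".
// CloseState, InvariantViolation and finishOutboundClose are hypothetical names.
enum CloseState {
    case open
    case readyForClose
    case closed
}

struct InvariantViolation: Error {
    let message: String
}

/// Actions the outbound close for the state reported by the writes manager.
/// The "impossible" `.open` state is reported through `reportBug`, which
/// defaults to `assertionFailure` (a crash in -Onone builds and tests, no
/// effect with -O). The output is then shut down anyway and an error returned.
func finishOutboundClose(
    after state: CloseState,
    shutdownOutput: () -> Void,
    reportBug: (String) -> Void = { assertionFailure($0) }
) -> Result<Void, InvariantViolation> {
    switch state {
    case .open:
        let message = "Close resulted in an open state, this should never happen"
        reportBug(message)
        shutdownOutput()  // still close outbound so the channel cannot stay half-open
        return .failure(InvariantViolation(message: message))
    case .readyForClose:
        shutdownOutput()
        return .success(())
    case .closed:
        return .success(())  // already shut down: nothing to do
    }
}

var shutdowns = 0
var reportedBugs: [String] = []
let normal = finishOutboundClose(after: .readyForClose, shutdownOutput: { shutdowns += 1 })
// The bug reporter is replaced here so the demo also runs in debug builds.
let unexpected = finishOutboundClose(
    after: .open,
    shutdownOutput: { shutdowns += 1 },
    reportBug: { reportedBugs.append($0) }
)
```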

Lukasa · 2025-03-21T16:55:28Z

Sources/NIOPosix/BaseStreamSocketChannel.swift

+                    self.pendingWrites.outboundCloseState = .closed
+                case .closed:
+                    ()  // nothing to do
+                }

                self.pipeline.fireUserInboundEventTriggered(ChannelEvent.outputClosed)


This user event now triggers unconditionally, and at the wrong time. It should trigger only after the shutdown. Also, don't forget to unregister for writable.

Lukasa · 2025-03-21T16:57:32Z

Sources/NIOPosix/PendingWritesManager.swift

+            case .readyForClose:
+                ()
+            case .pending, .closed:
+                preconditionFailure("close called on channel in unexpected state: \(self.outboundCloseState)")


This is a touch nerve-wracking. I can't see any reason this is actually impossible: I think if the user called close(mode: .output) twice in a row without the close having happened, they'd trap here. The same is true for the closed state.

More broadly, we have to tolerate the user doing this more than once. It's an error if they do, but that's ok.

I've attempted to account for being called more than once by cascading promises.

Lukasa · 2025-03-21T16:58:13Z

Sources/NIOPosix/PendingWritesManager.swift

+            case .open:
+                self.outboundCloseState = .readyForClose(promise)
+            case .readyForClose:
+                ()


This code path leaks the promise.

Generally speaking, at this point we've taken ownership of the promise. It's our responsibility to do something with it, which we must do on all code paths.

I've changed the API of this object so that the promise is never surfaced after it has been passed in. The promise must be completed by a call to `closeComplete`.
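A sketch of what taking ownership on every code path can look like, assuming a toy promise with a cascade operation similar in spirit to the setOrCascade(to:) call that appears in a later revision of this diff. ToyPromise, CloseState and OutboundCloser are hypothetical. A repeated close while one is pending cascades onto the stored promise, a close after shutdown succeeds immediately, and no branch drops a promise or traps.

```swift
// Sketch: every call hands its promise to the closer, and every path completes it.
// ToyPromise, CloseState and OutboundCloser are hypothetical names.
final class ToyPromise {
    private(set) var result: Result<Void, Error>?
    private var cascaded: [ToyPromise] = []

    var succeeded: Bool {
        if case .some(.success(_)) = result { return true }
        return false
    }

    func succeed() { complete(.success(())) }
    func fail(_ error: Error) { complete(.failure(error)) }

    /// Completes `other` with this promise's result, now or when it arrives.
    func cascade(to other: ToyPromise?) {
        guard let other = other else { return }
        if let result = result {
            other.complete(result)
        } else {
            cascaded.append(other)
        }
    }

    private func complete(_ newResult: Result<Void, Error>) {
        precondition(result == nil, "promise completed twice")
        result = newResult
        let waiting = cascaded
        cascaded = []  // update our own state before calling out
        for promise in waiting { promise.complete(newResult) }
    }
}

enum CloseState {
    case open
    case readyForClose(ToyPromise?)
    case closed
}

struct OutboundCloser {
    private(set) var state: CloseState = .open

    /// Takes ownership of `promise` on every path instead of trapping or leaking it.
    mutating func closeOutbound(_ promise: ToyPromise?) {
        switch state {
        case .open:
            state = .readyForClose(promise)
        case .readyForClose(let existing):
            if let existing = existing {
                existing.cascade(to: promise)      // repeated close: piggyback on the first
            } else {
                state = .readyForClose(promise)    // first close had no promise: keep this one
            }
        case .closed:
            promise?.succeed()                     // already closed: nothing left to wait for
        }
    }

    /// Called once the queued writes have drained and the output is shut down.
    mutating func didShutdownOutput() {
        guard case .readyForClose(let promise) = state else { return }
        state = .closed  // square away the state before completing the promise
        promise?.succeed()
    }
}

var closer = OutboundCloser()
let first = ToyPromise(), second = ToyPromise(), afterClose = ToyPromise()
closer.closeOutbound(first)
closer.closeOutbound(second)   // this path previously trapped or dropped the promise
closer.closeOutbound(nil)      // a nil promise is tolerated as well
closer.didShutdownOutput()
closer.closeOutbound(afterClose)
```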

Sources/NIOPosix/PendingWritesManager.swift

Lukasa · 2025-03-21T16:58:50Z

Sources/NIOPosix/PendingWritesManager.swift

@@ -310,6 +321,8 @@ final class PendingStreamWritesManager: PendingWritesManager {

    private(set) var isOpen = true

+    internal var outboundCloseState: CloseState = .open


I think we need a very good reason to have this be anything but private(set).

Lukasa · 2025-03-21T16:59:27Z

Sources/NIOPosix/BaseSocketChannel.swift

+                case .open, .pending:
+                    ()
+                case .readyForClose(let eventLoopPromise):
+                    // TODO: it doesn't seem right that I have to pass an error in here)


Sources/NIOPosix/BaseStreamSocketChannel.swift

Lukasa · 2025-03-21T17:08:52Z

Sources/NIOPosix/PendingWritesManager.swift

+                }
+
+            case .couldNotWriteEverything:
+                ()


This one also feels like it is worth an assertionFailure.

glbrntt · 2025-04-01T14:00:31Z

Sources/NIOPosix/PendingWritesManager.swift

+        static func == (lhs: Self, rhs: Self) -> Bool {
+            switch (lhs, rhs) {
+            case (.writtenCompletely, .writtenCompletely):
+                return true
+            case (.couldNotWriteEverything, .couldNotWriteEverything):
+                return true
+            default:
+                return false
+            }
+        }


I get a bit nervous about ignoring state when comparing. Do we actually need the equality check? Also, it might make more sense to do this on the CloseState and compare the identity of its underlying futures where applicable.

It turns out we don't need it anymore so I've just deleted it.

🤦‍♂️ it's needed in tests. Looking again.

I think this has exposed that I was surfacing the promise unnecessarily: we just passed it straight into pendingWrites.close, which already held it. I've changed the method to return the CloseResult instead.

glbrntt · 2025-04-01T14:09:40Z

Sources/NIOPosix/PendingWritesManager.swift

    func failAll(error: Error, close: Bool) {
        if close {
            assert(self.isOpen)
            self.isOpen = false
+            self.state.removeAll()?.fail(error)


General rule of thumb: update your state before making outcalls. Failing the promise here might lead to calling back into this type again and we haven't fully reconciled our state yet. We should grab the promise and fail it afterwards.

Thanks for the general guidance 🙂
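The rule of thumb can be sketched as follows. WritesManager and ToyPromise are hypothetical stand-ins, and the failure callback deliberately re-enters the manager, which is exactly the situation that makes the ordering matter. Because isOpen and the pending list are updated before any promise is failed, the re-entrant write is refused instead of being silently lost.

```swift
// Sketch of "update state before making outcalls". WritesManager and
// ToyPromise are hypothetical; a failure callback re-enters the manager.
struct ChannelClosedError: Error {}

final class ToyPromise {
    private var onFailure: [(Error) -> Void] = []
    private(set) var error: Error?

    func whenFailure(_ body: @escaping (Error) -> Void) { onFailure.append(body) }

    func fail(_ error: Error) {
        self.error = error
        let callbacks = onFailure
        onFailure = []
        for callback in callbacks { callback(error) }  // arbitrary user code runs here
    }
}

final class WritesManager {
    private(set) var isOpen = true
    private(set) var pending: [ToyPromise] = []
    private(set) var refusedWrites = 0

    func write(_ promise: ToyPromise) {
        guard isOpen else {
            refusedWrites += 1
            promise.fail(ChannelClosedError())
            return
        }
        pending.append(promise)
    }

    /// Reconciles every piece of state first, then fails the promises. If the
    /// promises were failed before `isOpen` and `pending` were updated, a
    /// callback calling `write` would be accepted into a closing manager and
    /// its promise would then be dropped or left pending forever.
    func failAll(error: Error) {
        isOpen = false
        let toFail = pending
        pending = []
        for promise in toFail { promise.fail(error) }
    }
}

let manager = WritesManager()
let original = ToyPromise()
var reentrantWrite: ToyPromise?
original.whenFailure { _ in
    let retry = ToyPromise()
    reentrantWrite = retry
    manager.write(retry)  // re-enters the manager from inside failAll
}
manager.write(original)
manager.failAll(error: ChannelClosedError())
```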

glbrntt · 2025-04-01T14:11:31Z

Sources/NIOPosix/PendingWritesManager.swift

+                closePromise?.fail(error)
+                self.outboundCloseState = .closed


Same here: update the state, then complete the promise.

glbrntt · 2025-04-01T14:15:12Z

Sources/NIOPosix/PendingWritesManager.swift

    func failAll(error: Error, close: Bool) {
        if close {
            assert(self.isOpen)
            self.isOpen = false
+            self.state.removeAll()?.fail(error)


Not sure if it's just the diff, but it looks like this function also failed all the promises when !close, and that doesn't seem to happen anymore.

glbrntt · 2025-04-01T14:22:12Z

Sources/NIOPosix/PendingWritesManager.swift

+                closePromise?.succeed(())
+            }
+
+            self.outboundCloseState = .closed


The state should be updated before completing the promise

glbrntt

This LGTM now but please wait for Cory's review as well.

Lukasa

Cool, this is looking closer. I think I just have some suggestions around promise hygiene and naming.

Lukasa · 2025-04-10T10:06:08Z

Sources/NIOPosix/PendingWritesManager.swift

+    ///
+    /// - Parameters:
+    ///   - promise: Optionally an `EventLoopPromise` that will be succeeded once all outstanding writes have been dealt with
+    func close(_ promise: EventLoopPromise<Void>?) -> CloseResult {


This should probably be called closeOutbound.

Sources/NIOPosix/BaseStreamSocketChannel.swift

Lukasa · 2025-04-10T11:59:19Z

Sources/NIOPosix/PendingWritesManager.swift

+                }
+
+            case .couldNotWriteEverything:
+                assertionFailure("Write result is .couldNotWriteEverything but we have no more writes to perform.")


If this is going to be an assertion failure we should have some fallback logic. Probably that should be throwing an error.

Lukasa · 2025-04-10T12:00:41Z

Sources/NIOPosix/PendingWritesManager.swift

+                self.outboundCloseState = .closed
+            case .pending(let closePromise), .readyForClose(let closePromise):
+                self.outboundCloseState = .closed
+                closePromise?.fail(error)


Hrm, should we return this instead of closing it at this stage?

Can you explain the thinking here? My understanding from our previous discussions was that, once the higher-level caller handed off the promise, it was the responsibility of the PendingWritesManager to complete it, so passing it in was a one-way street. This is why I separated out CloseResult as its own type, so we have clarity on where the promise is and isn't exposed.

Are you proposing that this method returns EventLoopPromise<Void>? where the promise is non-nil if the caller needs to do something with it?

Ah, I've just scrolled up and seen #3148 (comment) - that makes sense.

Lukasa · 2025-04-10T13:50:29Z

Sources/NIOPosix/PendingWritesManager.swift

+                closePromise.setOrCascade(to: promise)
+                self.outboundCloseState = .readyForClose(closePromise)
+            case .closed:
+                promise?.succeed(())


As noted above, it's probably better to let the caller complete the promise.

In this case are you proposing that this method returns (CloseResult, EventLoopPromise<Void>?) where the promise is non-nil if the caller needs to do something with it?

Or put the promise in the close result as associated data.
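A minimal sketch of the associated-data option, assuming hypothetical ToyPromise, CloseResult and ToyChannel types (the real CloseResult cases may differ). The caller switches over the result, updates its own state first, and only then completes the promise, so a callback attached to the promise already observes the output as shut down.

```swift
// Sketch: CloseResult hands the promise back as associated data, and the
// channel completes it only after updating its own state. All names are hypothetical.
final class ToyPromise {
    private(set) var succeeded = false
    var onSuccess: (() -> Void)?

    func succeed() {
        precondition(!succeeded, "promise completed twice")
        succeeded = true
        onSuccess?()
    }
}

enum CloseResult {
    case pending                      // writes outstanding: the writes manager keeps the promise
    case readyForClose(ToyPromise?)   // caller shuts the output down, then completes the promise
    case closed(ToyPromise?)          // output already closed: caller completes it right away
}

final class ToyChannel {
    private(set) var outputShutDown = false

    func handle(_ result: CloseResult) {
        switch result {
        case .pending:
            break  // nothing to complete yet
        case .readyForClose(let promise):
            outputShutDown = true  // stand-in for shutdown(SHUT_WR): state first...
            promise?.succeed()     // ...then the outcall
        case .closed(let promise):
            promise?.succeed()
        }
    }
}

let channel = ToyChannel()
let closePromise = ToyPromise()
var sawShutdownInCallback = false
closePromise.onSuccess = { sawShutdownInCallback = channel.outputShutDown }
channel.handle(.pending)
let stillWaiting = !closePromise.succeeded
channel.handle(.readyForClose(closePromise))
let repeated = ToyPromise()
channel.handle(.closed(repeated))
```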

Lukasa · 2025-04-14T09:59:17Z

Sources/NIOPosix/BaseStreamSocketChannel.swift


-                self.pipeline.fireUserInboundEventTriggered(ChannelEvent.outputClosed)
+                let writesCloseResult = self.pendingWrites.closeOutbound(promise)


Ah, ok: there's a test missing here. The unregisterForWritable above isn't appropriate. I believe the following situation is possible:

Many writes are done, such that we are no longer writable at the socket layer.

The user issues a final flush.

The user calls close(mode: .output)

In this case, we'll have flushed, unwritten data, but we will no longer be registered for writable. As long as no other flush or close comes along, we'll stay unregistered for writable, and so we will make no attempt to empty the buffer. That will cause us to be wedged open.

I recommend that, before fixing the bug, you write a test that correctly reproduces this wedge, and use it to validate that the bug is actually fixed. This is going to be a challenging test to write. It can be written using regular sockets, but it can also be written using the SAL, which may be easier. Let me know if you'd like to pair on a test for this.

I've created another test for this case which wedged until I resolved this issue.
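The wedge can be reproduced deterministically with a toy model instead of real sockets. This is a hypothetical sketch, not the test added in the PR: ToyStreamChannel, socketCapacity and eventLoopTick() are invented, and writability is only delivered on a tick if the channel is registered for it. With the buggy unregister-on-close behaviour the flushed bytes never drain and outputClosed never fires. Keeping the writable registration until the buffer is empty lets the shutdown, and then the outputClosed event, happen after the last byte.

```swift
// Toy model of the wedge; ToyStreamChannel and its members are hypothetical
// and only mimic the relevant NIOPosix behaviour.
final class ToyStreamChannel {
    let socketCapacity: Int        // bytes the "kernel" accepts per write attempt
    let unregisterOnClose: Bool    // true reproduces the buggy behaviour
    private(set) var pendingBytes = 0
    private(set) var registeredForWritable = false
    private(set) var closeQueued = false
    private(set) var outputShutDown = false
    private(set) var events: [String] = []

    init(socketCapacity: Int, unregisterOnClose: Bool) {
        self.socketCapacity = socketCapacity
        self.unregisterOnClose = unregisterOnClose
    }

    func writeAndFlush(_ byteCount: Int) {
        pendingBytes += byteCount
        attemptWrite()
    }

    /// close(mode: .output): queue the close behind the flushed data.
    func closeOutput() {
        closeQueued = true
        if unregisterOnClose {
            registeredForWritable = false  // BUG: nothing will drain the buffer now
        }
        if pendingBytes == 0 { finishClose() }
    }

    /// One event-loop iteration: writability is only delivered if registered.
    func eventLoopTick() {
        if registeredForWritable { attemptWrite() }
    }

    private func attemptWrite() {
        pendingBytes -= min(pendingBytes, socketCapacity)
        if pendingBytes > 0 {
            registeredForWritable = true   // socket is full: wait to become writable
        } else {
            registeredForWritable = false
            if closeQueued { finishClose() }
        }
    }

    private func finishClose() {
        guard !outputShutDown else { return }
        outputShutDown = true              // stand-in for shutdown(SHUT_WR)
        events.append("outputClosed")      // fired only after the shutdown
    }
}

func run(unregisterOnClose: Bool) -> ToyStreamChannel {
    let channel = ToyStreamChannel(socketCapacity: 4, unregisterOnClose: unregisterOnClose)
    channel.writeAndFlush(10)  // 4 bytes go out; 6 stay queued and we await writability
    channel.closeOutput()      // the final close(mode: .output)
    for _ in 0..<10 { channel.eventLoopTick() }
    return channel
}

let wedged = run(unregisterOnClose: true)
let fixed = run(unregisterOnClose: false)
```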

Lukasa · 2025-04-14T10:00:37Z

Sources/NIOPosix/BaseStreamSocketChannel.swift

+                case .closed(let closePromise):
+                    closePromise?.succeed(())
+                case .open:
+                    promise?.fail(ChannelError.inappropriateOperationForState)


This path appears to violate the "pending writes manager takes ownership of the promise" construction.

Lukasa · 2025-04-14T10:02:42Z

Sources/NIOPosix/PendingWritesManager.swift

-    /// Fail all the outstanding writes. This is useful if for example the `Channel` is closed.
-    func failAll(error: Error, close: Bool) {
+    /// Fail all the outstanding writes.
+    func failAll(error: Error, close: Bool) -> CloseResult? {


This "boolean parameter with optional return contingent on the value of the bool" smells like there are two methods here. It may be worth factoring them apart. A safe way to do this, if we want to keep this patch understandable, is to make a separate refactoring PR that only refactors this method into two.

(Sidebar: the two methods are actually implemented as one calling the other: the one with close: true does some stuff, then it calls the one with close: false.)

Disregard the above PR. With my changes we will always close on a call to this method so I've removed the boolean.

Lukasa · 2025-04-14T10:06:24Z

Sources/NIOPosix/PendingWritesManager.swift

+                    "We are in .readyForClose state but we still have pending writes. This should never happen."
+                )
+            case .closed:
+                preconditionFailure(


I'm a touch nervous about these precondition failures. Can we come up with a logical behaviour here? Probably it's "fail the promise" and "assertionFailure to get crashes in debug".

I've done some broader restructuring to try and make this type of thing easier to reason through

Lukasa · 2025-04-14T10:06:56Z

Sources/NIOPosix/PendingWritesManager.swift

+        case .closed:
+            ()  // nothing to do
+        case .open, .pending:
+            preconditionFailure("close complete called on channel in unexpected state: \(self.outboundCloseState)")


Can we make this a thrown error instead of a crash?

Lukasa · 2025-04-14T10:07:37Z

Sources/NIOPosix/PendingWritesManager.swift

+
+        switch self.outboundCloseState {
+        case .readyForClose(let closePromise):
+            self.outboundCloseState = .closed


Ok, let's return the promise here rather than have this type complete it. It's a touch easier for the channel to manage things by completing things itself, rather than having this type do it, because the channel can ensure the state is squared away before making the outcall.

Lukasa · 2025-04-14T10:07:57Z

Sources/NIOPosix/PendingWritesManager.swift

+                        case .pending: .pending
+                        case .readyForClose(let closePromise): .readyForClose(closePromise)
+                        case .closed: .closed(nil)
+                        }


Let's outline this. Looks like an initializer to me.
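Outlined as an initializer, the mapping might look like this. The case names follow the diff above, but CloseState's payloads and ToyPromise (standing in for EventLoopPromise<Void>) are assumptions for illustration.

```swift
// Sketch of outlining an inline switch into an initializer. Case names follow
// the diff; ToyPromise is a hypothetical stand-in for EventLoopPromise<Void>.
final class ToyPromise {}

enum CloseState {
    case open
    case pending(ToyPromise?)
    case readyForClose(ToyPromise?)
    case closed
}

enum WrittenCompletelyResult {
    case open
    case pending
    case readyForClose(ToyPromise?)
    case closed(ToyPromise?)
}

extension WrittenCompletelyResult {
    /// Maps the writes manager's close state onto the result reported to the
    /// channel, so the mapping lives in one named place instead of inline.
    init(_ state: CloseState) {
        switch state {
        case .open: self = .open
        case .pending: self = .pending  // the promise stays with the writes manager
        case .readyForClose(let closePromise): self = .readyForClose(closePromise)
        case .closed: self = .closed(nil)
        }
    }
}

let promise = ToyPromise()
let ready = WrittenCompletelyResult(.readyForClose(promise))
let done = WrittenCompletelyResult(.closed)
let waiting = WrittenCompletelyResult(.pending(promise))
```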

Lukasa

Ok, the product code here is looking a lot better. A small nit.

Lukasa · 2025-04-29T14:52:21Z

Sources/NIOPosix/PendingWritesManager.swift

+        /// is expected to fulfill it
+        internal enum WrittenCompletelyResult: Equatable {
+            case open
+            case pending


This pending state seems unreachable.

Good spot, removed

Lukasa · 2025-04-29T14:58:01Z

Tests/NIOPosixTests/StreamChannelsTest.swift

+        final class BytesReadCountingHandler: ChannelInboundHandler, Sendable {
+            typealias InboundIn = ByteBuffer
+
+            private let numBytes: NIOLoopBoundBox<Int>


No need to overcomplicate this, you can just use a NIOLockedValueBox.

Lukasa · 2025-04-29T14:58:16Z

Tests/NIOPosixTests/StreamChannelsTest.swift

+            public typealias OutboundIn = ByteBuffer
+            public typealias OutboundOut = ByteBuffer
+
+            private let numBytes: NIOLoopBoundBox<Int>


Same here, just use NIOLockedValueBox.

Lukasa · 2025-04-29T14:59:47Z

Tests/NIOPosixTests/StreamChannelsTest.swift

+            try chan2.eventLoop.submit {
+                XCTAssertNotEqual(bytesWritten.value, 0)
+                XCTAssertEqual(bytesRead.value, bytesWritten.value)
+            }.wait()


Let's extend these tests to confirm we do receive inputClosed as well, and at the right time.

It wasn't clear to me what the expectations would be for inputClosed, but I saw an opportunity to add checks for outputClosed on the writer side, so I did that for now.

At the point of receiving inputClosed we should record how many bytes we saw in an optional. We can then add an assertion at the end of the test that this value is equal to the total bytes sent. This asserts that both a) we saw the message (value isn't nil) and b) that we saw it at the right time.

Added, thank you.
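The assertion described above boils down to snapshotting the byte count when inputClosed arrives and comparing it with the total sent. The NIO-free sketch below uses an NSLock-based box as a hypothetical stand-in for NIOLockedValueBox; BytesReadRecorder is also hypothetical, and in the real test its two methods would be driven from channelRead and userInboundEventTriggered on a ChannelInboundHandler.

```swift
import Foundation

// Hypothetical, NIO-free stand-in for NIOLockedValueBox: a value guarded by a lock.
final class LockedBox<Value> {
    private let lock = NSLock()
    private var value: Value

    init(_ value: Value) { self.value = value }

    func withLockedValue<T>(_ body: (inout Value) -> T) -> T {
        lock.lock()
        defer { lock.unlock() }
        return body(&value)
    }
}

// Hypothetical reader-side recorder: counts bytes and snapshots the running
// total at the moment inputClosed is observed.
final class BytesReadRecorder {
    private let state = LockedBox((total: 0, atInputClosed: Int?.none))

    func channelRead(byteCount: Int) {
        state.withLockedValue { $0.total += byteCount }
    }

    func inputClosed() {
        state.withLockedValue { $0.atInputClosed = $0.total }
    }

    var total: Int { state.withLockedValue { $0.total } }
    var bytesAtInputClosed: Int? { state.withLockedValue { $0.atInputClosed } }
}

let totalSent = 12
let recorder = BytesReadRecorder()
for chunk in [5, 4, 3] { recorder.channelRead(byteCount: chunk) }
recorder.inputClosed()

// Same recorder type, but the event arrives before the last chunk.
let early = BytesReadRecorder()
early.channelRead(byteCount: 5)
early.inputClosed()
early.channelRead(byteCount: 7)
```

A nil snapshot means inputClosed was never seen; a snapshot smaller than the total, as in the `early` case, means it arrived too soon. A single equality check against the bytes sent covers both.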

Lukasa

Very nice, let's ship it.

rnro added the 🔨 semver/patch No public API change. label Mar 18, 2025

rnro force-pushed the queue_outbound_close branch from ef380ee to a9d9404 March 18, 2025 16:32

Lukasa reviewed Mar 18, 2025

Sources/NIOPosix/BaseStreamSocketChannel.swift

rnro requested a review from Lukasa March 21, 2025 15:29

Lukasa reviewed Mar 21, 2025

rnro requested a review from glbrntt March 31, 2025 14:30

glbrntt reviewed Apr 1, 2025

rnro requested a review from glbrntt April 2, 2025 09:02

rnro force-pushed the queue_outbound_close branch from c7f73ff to e439537 April 2, 2025 09:23

glbrntt approved these changes Apr 7, 2025

rnro requested a review from Lukasa April 10, 2025 05:18

Lukasa reviewed Apr 10, 2025

rnro requested a review from Lukasa April 11, 2025 13:40

Lukasa reviewed Apr 14, 2025

rnro added 12 commits April 28, 2025 14:41

use state machine instead of promise

fca6c0e

reverting unrelated changes

3c7bff0

DatagramWritesManager only supports full closure

b8cc3a4

formatting

ba170e0

review comments

9f8919d

formatting

25d1eb1

review comments

bba49da

review comments

bada789

PendingWritesManager returns close promise

d2bbc6f

formatting

3e14181

remove closeComplete, clarify ownership of promise

ffdb030

* remove closeComplete * clarify ownership of promise * add new test to cover the case where the write notification is delivered at the wrong time (a regression in an earlier commit)

rnro force-pushed the queue_outbound_close branch from 4c4bdd9 to ffdb030 April 28, 2025 13:52

rnro requested a review from Lukasa April 29, 2025 05:40

Lukasa reviewed Apr 29, 2025

review comments

9735793

rnro requested a review from Lukasa April 29, 2025 16:12

test checks for inputClosed event

340fd89

rnro force-pushed the queue_outbound_close branch from f63cea6 to 340fd89 April 30, 2025 12:16

Lukasa approved these changes Apr 30, 2025

Lukasa merged commit ea87036 into apple:main Apr 30, 2025
44 of 45 checks passed

rnro deleted the queue_outbound_close branch April 30, 2025 14:04

adam-fowler mentioned this pull request Apr 30, 2025

Close on outbound finish hummingbird-project/hummingbird#708

Merged
