-
Notifications
You must be signed in to change notification settings - Fork 745
Handle vsock spurious wakeups gracefully instead of asserting #3501
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
c03edb3
2480484
909e5ea
2f963a5
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -1201,9 +1201,20 @@ class BaseSocketChannel<SocketType: BaseSocketProtocol>: SelectableChannel, Chan | |
| // The only way to avoid that race, would be to use heavy handed synchronisation primitives like IOSQE_IO_DRAIN (basically | ||
| // flushing all pending requests and wait for a fake event result to sync up) which would be awful for performance, | ||
| // so better skip the assert() for io_uring instead. | ||
| #if !SWIFTNIO_USE_IO_URING | ||
| assert(readResult == .some) | ||
| #endif | ||
|
|
||
| // NOTE: The original assert assumed kqueue/epoll always have implicit sync, | ||
| // but vsock (virtual sockets) may not provide the same guarantees as regular | ||
| // TCP/Unix sockets. Handle .none gracefully instead of crashing. | ||
| // See: https://github.com/apple/swift-nio/issues/3500 | ||
| // https://github.com/apple/containerization/issues/503 | ||
| if readResult == .none { | ||
| // Spurious wakeup - no data available despite POLLIN notification. | ||
| // This can happen with vsock or other non-standard socket types that | ||
| // don't have the same kqueue synchronization guarantees. | ||
| self.readIfNeeded0() | ||
| return .normal(.none) | ||
| } | ||
|
Comment on lines
+1210
to
+1216
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. This branch and early-return has actually changed the semantics for io_uring -- when |
||
|
|
||
| if self.lifecycleManager.isActive { | ||
| self.pipeline.syncOperations.fireChannelReadComplete() | ||
| } | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -196,7 +196,14 @@ func isUnacceptableErrno(_ code: CInt) -> Bool { | |
| // is valid but the accepted one is not. The right solution here is to perform a check for | ||
| // SO_ISDEFUNCT when we see this happen, but we haven't yet invested the time to do that. | ||
| // In the meantime, we just tolerate EBADF on iOS. | ||
| #if canImport(Darwin) && !os(macOS) | ||
| // | ||
| // NOTE: We now also tolerate EBADF on macOS because vsock (virtual sockets) used by | ||
| // Apple's Containerization framework can experience similar file descriptor invalidation | ||
| // issues. The vsock hypervisor layer may close descriptors in ways that cause EBADF | ||
| // during normal operation. | ||
|
Comment on lines
+202
to
+203
Contributor
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Can we confirm this statement is true? Specifically "during normal operation" -- should we really expect the hypervisor to be closing the FD and have we ruled out this being a bug in Container? IIUC a file descriptor that is closed out-of-band, under NIO's feat, violates NIOs model, and it would be problematic to just gloss over it for everything. The commentary above about tolerating for iOS is very specific, but this patch is opening it up to everything on macOS, not just VSOCK, even if that is something we wanted to do.
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fwiw, I (containerization maintainer) cannot repro the issue Adrian is seeing so far. I can dig in on the hypervisor end, but I'm mostly in Simon's camp here in terms of his comment left. I don't think we should ignore EBADF either, or at the very least offer some escape hatch for folks who don't want the library to assert on EBADFs. On the hypervisor/VMM end, Virtualization.framework does seem to close all vsock FDs it has vended when the VM is stopped, but I imagine this is intended and I view this as a good thing. We had quite a few bugs originally for our container stop logic that would trigger the ebadf assert in NIO, but never on startup that I've seen yet apple/containerization#503. |
||
| // See: https://github.com/apple/swift-nio/issues/3500 | ||
| // https://github.com/apple/containerization/issues/503 | ||
| #if canImport(Darwin) | ||
| switch code { | ||
| case EFAULT: | ||
| return true | ||
|
|
@@ -216,18 +223,34 @@ func isUnacceptableErrno(_ code: CInt) -> Bool { | |
| @inlinable | ||
| public func isUnacceptableErrnoOnClose(_ code: CInt) -> Bool { | ||
| // We treat close() differently to all other FDs: we still want to catch EBADF here. | ||
| // NOTE: On Darwin, we tolerate EBADF because vsock (virtual sockets) used by | ||
| // Apple's Containerization framework can have file descriptors invalidated by | ||
| // the hypervisor layer during normal operation, causing EBADF on close. | ||
| // See: https://github.com/apple/swift-nio/issues/3500 | ||
| // https://github.com/apple/containerization/issues/503 | ||
| #if canImport(Darwin) | ||
| switch code { | ||
| case EFAULT: | ||
| return true | ||
| default: | ||
| return false | ||
| } | ||
| #else | ||
| switch code { | ||
| case EFAULT, EBADF: | ||
| return true | ||
| default: | ||
| return false | ||
| } | ||
| #endif | ||
| } | ||
|
|
||
| @inlinable | ||
| internal func isUnacceptableErrnoForbiddingEINVAL(_ code: CInt) -> Bool { | ||
| // We treat read() and pread() differently since we also want to catch EINVAL. | ||
| #if canImport(Darwin) && !os(macOS) | ||
| // NOTE: We tolerate EBADF on all Darwin platforms (not just iOS) because vsock | ||
| // can cause file descriptor invalidation on macOS as well. | ||
| #if canImport(Darwin) | ||
| switch code { | ||
| case EFAULT, EINVAL: | ||
| return true | ||
|
|
||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment is referring to an assert that will not exist once this patch is merged, so will be confusing to maintainers. I appreciate you added a link to some GitHub issues, but it would be better if the comments in the code explain the code that's actually here.
Relatedly, the lengthy commentary above, about skipping the assertion for io_uring, no longer makes sense.