[AI] LiveSession WebSocket leak fixes#16137
Conversation
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here. |
|
Although the Live API tests still failed, the logs were much more readable with most of the sockets being cleaned up. xcodebuild-test-without-building.log |
Summary of ChangesThis pull request addresses critical socket leaks and connection instability identified in the LiveSession implementation. By resolving strong reference cycles and improving cleanup mechanisms, the changes aim to reduce test flakiness and resource exhaustion on CI environments. Additionally, the update introduces improved error handling and connection monitoring to ensure more robust network communication. Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Activity
|
There was a problem hiding this comment.
Code Review
This pull request introduces several improvements to the LiveSession and AsyncWebSocket components, focusing on connection stability and resource management. Key changes include the addition of a 30-second ping mechanism in AsyncWebSocket, improved task lifecycle management for receiving and pinging, and a thread-safe streamState in LiveSessionService to better manage the responses stream. Additionally, the integration tests were updated to ensure sessions are properly closed during failures, and the iOS deployment target for the test app was bumped to 16.0. Review feedback highlights potential issues with error propagation in LiveSessionService where the stream might be finished normally before an error is thrown, and suggests adding explicit cancellation checks in the message queue loop.
There was a problem hiding this comment.
Code Review
This pull request introduces a ping mechanism to AsyncWebSocket and refactors LiveSessionService to use UnfairLock for thread-safe stream state management. It also bumps the iOS deployment target to 16.0 and improves error handling in integration tests. Feedback includes suggestions to mark message processing methods as nonisolated to reduce context switching, refactor repeated transcript collection logic in tests into a helper method, and adopt the modern Task.sleep(for:) syntax available in iOS 16.
There was a problem hiding this comment.
Code Review
This pull request introduces a heartbeat mechanism to AsyncWebSocket via a new ping task and improves lifecycle management for websocket tasks and response streams. In LiveSessionService, stream state is now synchronized using UnfairLock, allowing for stream recreation upon reconnection. The integration tests were updated with do-catch blocks to ensure proper session closure on failure, though it is suggested to refactor this repetitive boilerplate into a helper method for better maintainability. Additionally, the iOS deployment target for the test app was bumped to 16.0.
| do { | ||
| text = try await session.collectNextAudioOutputTranscript() | ||
| } catch { | ||
| await session.close() | ||
| throw error | ||
| } |
There was a problem hiding this comment.
The do-catch pattern used here to ensure session.close() is called when an error occurs is repeated across almost every test in this file. While this correctly addresses the socket leak issue, it adds significant boilerplate. To improve maintainability, consider refactoring this into a helper method that wraps the session logic and handles the cleanup in a single place.
LiveSession Socket Leak Analysis
Executive Summary
Investigation into the flakiness of
LiveSessionTestsand the recurringsetsockopt SO_CONNECTION_IDLE failedlogs revealed multiple strong reference cycles in theLiveSessionimplementation. These cycles prevent the deinitialization of WebSocket components when a session is dropped (especially during test failures), leading to leaked connections and unstable network behavior on CI.Identified Reference Cycles
1.
AsyncWebSocketReceiving Task CycleIn
AsyncWebSocket.swift, thestartReceiving()method launches aTaskthat capturesselfstrongly:Impact:
AsyncWebSocketis held alive by this background task as long as the socket is not explicitly closed. Sincedeinitis responsible for callingdisconnect()(which would stop the task), the object becomes un-releasable if the user forgets to callclose().2.
LiveSessionServiceActor Task CyclesThe
LiveSessionServiceactor manages two long-running tasks that capture the actor reference strongly:Impact: The actor cannot be deinitialized as long as
responsesTaskormessageQueueTaskare active. These tasks only terminate when the stream finishes or they are cancelled. However, cancellation typically happens indeinit, which is blocked by the tasks themselves.3. Missing
LiveSessionCleanupLiveSession.swiftacts as the public interface but lacks adeinitmechanism to ensure the underlying service is closed if the session object is dropped.Impact on Integration Tests
In
LiveSessionTests.swift, many tests follow this pattern:If any operation before
session.close()throws an error (which is common in flaky CI environments), theclose()call is skipped. Due to the reference cycles identified above, the droppedsessionobject fails to clean up itsLiveSessionServiceandAsyncWebSocket, resulting in a leaked WebSocket connection.Accumulated leaks on CI likely lead to:
URLSession.setsockopt SO_CONNECTION_IDLE failed [42: Protocol not available]as the OS attempts to manage orphaned connections.Technical Cause of Hangs
URLSessionWebSocketTaskdoes not send pings automatically. Without pings, neither the client nor the server can quickly detect if the other side has silently dropped the connection.LiveSessionService, if a WebSocket closes normally (or is mapped to a "normal" closure like.goingAway), the internalresponseContinuationwas not being finished, causing any code awaitingsession.responsesto hang indefinitely.AsyncWebSocketwas checkingtask.errorinstead of the error caught fromreceive(), which sometimes led to losing the specific error code (like POSIX 57).Applied Fixes
selfwith[weak self]in all backgroundTaskclosures withinAsyncWebSocketandLiveSessionService.deinittoLiveSessionthat spawns a non-blocking task to callservice.close().LiveSessionTests.swiftto consistently usedo-catchblocks around session operations.AsyncWebSocketto detect dead connections.LiveSessionService.close()to allow closing the WebSocket connection without terminating the publicresponsesstream. This allows the stream to survive connection resets (like duringresumeSession()) and seamlessly receive messages from the new connection, fixing a regression where resumed sessions would return empty responses.#no-changelog