Version
2.46.0
Platform
iOS
Swift Version
6.2
What happened?
channel.subscribeWithError() stalls for tens of seconds (often minutes) on the first JOIN of a fresh socket, since v2.44.1. A clean bisect points to PR #855 (fix(realtime): extract ConnectionManager actor and fix connection lifecycle races, commit dcbc63d5) as the introducing change. Only one realtime commit lands between v2.44.0 and v2.44.1 — PR #855.
| Version | Subscribe time on cold connect |
| --- | --- |
| v2.43.1 | ~2.3 s ✅ |
| v2.44.0 | ~3.1 s ✅ |
| v2.44.1 | ~60 s ❌ |
| v2.45.0 | minutes ❌ |
| v2.46.0 | ~4–7 min ❌ |
The clearest signal in the trace: heartbeats never ack until at least one channel reaches .subscribed. Before subscribe wins, every heartbeat goes sent → timeout. After subscribe wins, every heartbeat goes sent → ok instantly. This holds across multiple sessions.
Hypothesis (not verified beyond the bisect)
PR #855's stated goal includes "Lost phx_join reply after connect: WebSocket.events lazily installed onEvent only when the stream was first iterated. Reading conn.events synchronously before spawning the message task ensures replies aren't dropped."
Empirically the change appears to drop early server-to-client frames (both heartbeat phx_replys and the phx_join reply) on the first socket — channel stays in .subscribing, pendingHeartbeatRef is never cleared, the heartbeat-timeout → connectionManager.handleError path recycles the socket every ~50 s, and the in-flight JOIN is killed each cycle. Eventually one cycle's JOIN slips through before its 50 s window expires.
This is a hypothesis from reading the v2.44.0 → v2.44.1 diff — the bisect is the load-bearing evidence.
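To make the suspected failure mode concrete, here is a toy model of the heartbeat bookkeeping described above. It is a sketch under assumptions, not the SDK's code: `HeartbeatModel`, `HeartbeatOutcome`, `tick()` and `ack(ref:)` are hypothetical names, and only the idea of a pending heartbeat ref that must be cleared by a reply comes from this report.

```swift
/// Outcome reported for each heartbeat tick in this toy model.
enum HeartbeatOutcome: Equatable {
    case sent     // a new heartbeat was written to the socket
    case timeout  // the previous heartbeat was never acknowledged
}

/// Hypothetical model of the heartbeat bookkeeping (not the SDK's implementation).
struct HeartbeatModel {
    /// Ref of the heartbeat that is still waiting for its reply, if any.
    private(set) var pendingRef: Int?
    private var nextRef = 0

    init() {}

    /// Called once per heartbeat interval (25 s with the defaults quoted in this report).
    mutating func tick() -> HeartbeatOutcome {
        if pendingRef != nil {
            // The previous heartbeat is still unacknowledged, so this tick
            // reports a timeout. Assumption: the caller recycles the socket here.
            pendingRef = nil
            return .timeout
        }
        nextRef += 1
        pendingRef = nextRef
        return .sent
    }

    /// Called when the reply for heartbeat `ref` arrives.
    mutating func ack(ref: Int) {
        if pendingRef == ref { pendingRef = nil }
    }
}

// If every reply is dropped, the second tick (about 2 x 25 s after connect) times out.
var dropped = HeartbeatModel()
let first = dropped.tick()   // .sent
let second = dropped.tick()  // .timeout
print(first, second)
```

In this model a single dropped reply is enough to produce the `sent` → `timeout` pattern at roughly twice the heartbeat interval, which is consistent with the ~50 s recycle period in the trace. It says nothing about why the reply would be dropped.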
Server-side context
Project's realtime logs show no tenant init/terminate events overlapping the failing subscribe windows; tenant is healthy and serves heartbeat acks instantly once subscribe wins.
REST queries on the same network are fast and reliable throughout.
Workaround
Pin to 2.44.0. Subscribes in ~3 s; no further changes needed.
Possibly using the SDK wrong?
I might be missing a usage-pattern change that landed alongside PR #855 or in subsequent versions. Specifically:
- I'm calling subscribeWithError() directly (in a retry-forever wrapper). Is the recommendation now to use subscribe() (no WithError) and react to channel.statusChange instead, given the lifecycle is now managed by ConnectionManager / ChannelStateManager?
- I start the postgresChange AsyncStream consumers before calling subscribeWithError() (so they're ready when the channel attaches). Is the recommendation now to do this in a different order?
- I use SDK defaults for handleAppLifecycle (true on iOS), connectOnSubscribe (true), disconnectOnEmptyChannelsAfter (default), timeoutInterval (10 s), heartbeatInterval (25 s), maxRetryAttempts (5). Is there a combination that's expected to be configured explicitly for cold-start reliability now?
- Are postgresChange bindings on a non-public schema (library in our case) treated differently in the new state machine?
Happy to be told this is a usage problem rather than a regression — but the bisect is the part I can't explain away on the usage hypothesis (v2.44.0 with the same call pattern works in ~3 s).
Steps to Reproduce
- Create a SupabaseClient with default RealtimeClientOptions.
- Open one RealtimeChannelV2: let channel = client.channel("library-realtime").
- Bind two postgresChange streams on a custom (non-public) schema, e.g. schema: "library", table: "libros" and table: "libro_files".
- Start two for await consumers on those streams in detached Tasks.
- Call try await channel.subscribeWithError() (optionally wrapped in a retry-forever loop to avoid the SDK's "Maximum retry attempts reached" giving up after ~85 s).
- Observe client.realtimeV2.heartbeat (AsyncStream) and channel.statusChange from app code.
Expected: channel reaches .subscribed within seconds (as it does in v2.44.0).
Actual on v2.44.1 → v2.46.0: socket reaches .connected in ~2 s, but heartbeats time out every ~50 s (sent → timeout), triggering reconnects that kill in-flight JOINs. The channel stays in .subscribing until — by luck after multiple cycles — a JOIN slips through before its next heartbeat-timeout reconnect. Subscribe time observed between ~60 s and ~7 minutes depending on version.
Code Sample
let channel = client.channel("library-realtime")
let libroStream = channel.postgresChange(AnyAction.self, schema: "library", table: "libros")
let fileStream = channel.postgresChange(AnyAction.self, schema: "library", table: "libro_files")

// Consumers started before subscribe so they're ready when the channel attaches.
Task { for await c in libroStream { handle(c) } }
Task { for await c in fileStream { handle(c) } }

// Retry-forever wrapper because subscribeWithError's internal max-retries
// (5 × ~10 s timeoutInterval) gives up before the first JOIN can land
// on broken versions.
Task {
    while !Task.isCancelled {
        do {
            try await channel.subscribeWithError()
            return
        } catch {
            try? await Task.sleep(for: .seconds(5))
        }
    }
}
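For anyone reproducing this, the retry wrapper can also be written as a small generic helper that reports how many outer attempts were needed, which makes runs on different SDK versions easier to compare. This is only a sketch: `retryUntilSuccess` is a hypothetical helper, not part of the SDK, and the 5 s delay simply mirrors the loop above.

```swift
import Foundation

/// Hypothetical helper (not part of the SDK): retries `operation` until it
/// succeeds or the surrounding task is cancelled, and returns the number of
/// attempts that were needed.
func retryUntilSuccess(
    delay: Duration = .seconds(5),
    operation: @Sendable () async throws -> Void
) async throws -> Int {
    var attempts = 0
    while true {
        // Stop promptly if the surrounding task was cancelled.
        try Task.checkCancellation()
        attempts += 1
        do {
            try await operation()
            return attempts
        } catch is CancellationError {
            // Never retry on cancellation; propagate it to the caller.
            throw CancellationError()
        } catch {
            // Any other failure: wait, then try again.
            try await Task.sleep(for: delay)
        }
    }
}
```

Assuming `channel` can be captured by a `@Sendable` closure, the loop above would become `let attempts = try await retryUntilSuccess { try await channel.subscribeWithError() }`.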
Relevant log output
T=0.0s    subscribe begin; channel = .subscribing; socket = .disconnected
T=2.1s    socket = .connected
T=26.6s   heartbeat sent         ← first heartbeat
T=51.1s   heartbeat timeout      ← second heartbeat fires, finds previous
                                   pendingHeartbeatRef set → timeout signalled
T=51.9s   socket = .connecting   ← SDK reconnects in response to heartbeat timeout
T=60.2s   socket = .connected
T=70.1s   subscribeWithError throws "Maximum retry attempts reached"
          (5 internal retries × ~10 s timeoutInterval exhausted inside the
          50 s window between socket-connect and heartbeat-timeout-reconnect)
...       outer retry loop fires; pattern repeats ...
T=410.3s  channel = .subscribed  ← eventually succeeds
T=410.3s+ heartbeats now ack instantly (`sent` → `ok`, same millisecond)
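A trace in roughly this shape can be collected with a small generic logger over any async sequence. The sketch below is illustrative only: `traceLine` and `trace` are hypothetical helpers written for this report, and the printed value is whatever Swift's default string interpolation produces for the event.

```swift
import Foundation

/// Formats one trace line, e.g. "T=2.1s socket = .connected".
func traceLine(elapsed: TimeInterval, label: String, value: String) -> String {
    "T=\(String(format: "%.1f", elapsed))s \(label) = \(value)"
}

/// Hypothetical helper: prints one timestamped line per element of `events`
/// until the sequence finishes, fails, or the surrounding task is cancelled.
func trace<S: AsyncSequence>(_ label: String, from events: S, since start: Date) async {
    do {
        for try await event in events {
            let elapsed = Date().timeIntervalSince(start)
            print(traceLine(elapsed: elapsed, label: label, value: "\(event)"))
        }
    } catch {
        // Only reached for throwing sequences; AsyncStream never throws.
        let elapsed = Date().timeIntervalSince(start)
        print(traceLine(elapsed: elapsed, label: label, value: "failed: \(error)"))
    }
}
```

In app code this would be started once per stream before subscribing, for example `Task { await trace("heartbeat", from: client.realtimeV2.heartbeat, since: start) }` and likewise for `channel.statusChange`, using the stream names given in the reproduction steps.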