[Bug]: Realtime: first channel subscribe stalls for 50s–7min since v2.44.1 (PR #855) #999

### Version

2.46.0

### Platform

iOS

### Swift Version

6.2

### What happened?

`channel.subscribeWithError()` stalls for tens of seconds (often minutes) on the first JOIN of a fresh socket, starting with v2.44.1. A clean bisect points to PR #855 (`fix(realtime): extract ConnectionManager actor and fix connection lifecycle races`, commit `dcbc63d5`) as the introducing change; it is the only realtime commit that lands between v2.44.0 and v2.44.1.

| Version | Subscribe time on cold connect |
|---|---|
| v2.43.1 | ~2.3 s ✅ |
| v2.44.0 | ~3.1 s ✅ |
| v2.44.1 | ~60 s ❌ |
| v2.45.0 | minutes ❌ |
| v2.46.0 | ~4–7 min ❌ |
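
For anyone who wants to reproduce the numbers in this table, here is a minimal sketch of one way to time a cold subscribe. `measureElapsed` is a hypothetical helper name (not part of the SDK), and it assumes iOS 16 / macOS 13 or later for `ContinuousClock` and `Duration`.

```swift
import Foundation

/// Hypothetical helper (not an SDK API): runs `operation` and returns the
/// wall-clock time it took, measured with a monotonic clock.
func measureElapsed(
    _ operation: @Sendable () async throws -> Void
) async rethrows -> Duration {
    let start = ContinuousClock.now
    try await operation()
    return start.duration(to: ContinuousClock.now)
}
```

Wrapping the `try await channel.subscribeWithError()` call in `measureElapsed { ... }` right after creating the client should give a figure comparable to the "Subscribe time on cold connect" column.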

The clearest signal in the trace: **heartbeats never ack until at least one channel reaches `.subscribed`.** Before subscribe wins, every heartbeat goes `sent` → `timeout`. After subscribe wins, every heartbeat goes `sent` → `ok` instantly. This holds across multiple sessions.

### Hypothesis (not verified beyond the bisect)

PR #855's stated goal includes "Lost phx_join reply after connect: WebSocket.events lazily installed onEvent only when the stream was first iterated. Reading `conn.events` synchronously before spawning the message task ensures replies aren't dropped."

Empirically, the change appears to drop early server-to-client frames (both heartbeat `phx_reply`s and the `phx_join` reply) on the first socket. As a result, the channel stays in `.subscribing` and `pendingHeartbeatRef` is never cleared, so the heartbeat-timeout → `connectionManager.handleError` path recycles the socket every ~50 s and kills the in-flight JOIN on each cycle. Eventually one cycle's JOIN slips through before its 50 s window expires.

This is a hypothesis from reading the v2.44.0 → v2.44.1 diff — the bisect is the load-bearing evidence.
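
To make the suspected timing concrete, here is a toy model of the heartbeat loop as I understand it from the trace. It is not SDK code: `simulateHeartbeats`, `HeartbeatEvent`, and the `repliesDelivered` flag are all made up for illustration, and the only behaviour assumed is "a heartbeat tick that finds the previous ref still pending signals a timeout".

```swift
/// Illustrative event type; the cases mirror the `sent` / `ok` / `timeout`
/// statuses seen in the trace, but this is not the SDK's `HeartbeatStatus`.
enum HeartbeatEvent: Equatable {
    case sent(at: Double)
    case ok(at: Double)
    case timeout(at: Double)
}

/// Toy model in virtual time (seconds). Assumption: one tick fires every
/// `heartbeatInterval` after connect; a tick that finds the previous
/// heartbeat still pending reports a timeout and the socket is recycled.
func simulateHeartbeats(
    connectedAt: Double,
    heartbeatInterval: Double,
    repliesDelivered: Bool,
    ticks: Int
) -> [HeartbeatEvent] {
    guard ticks >= 1 else { return [] }
    var events: [HeartbeatEvent] = []
    var pendingRef: Int? = nil
    var nextRef = 0

    for tick in 1...ticks {
        let now = connectedAt + Double(tick) * heartbeatInterval
        if pendingRef != nil {
            // Previous heartbeat was never acked: timeout, socket is recycled.
            events.append(.timeout(at: now))
            return events
        }
        nextRef += 1
        pendingRef = nextRef
        events.append(.sent(at: now))
        if repliesDelivered {
            // The reply arrives and clears the pending ref right away.
            pendingRef = nil
            events.append(.ok(at: now))
        }
    }
    return events
}
```

With `connectedAt: 2.1`, `heartbeatInterval: 25`, and `repliesDelivered: false`, the model produces a `sent` at 27.1 s and a `timeout` at 52.1 s. That is within about a second of the trace below (26.6 s and 51.1 s), and it matches the ~50 s recycle period if replies are being dropped.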

### Server-side context

Project's `realtime` logs show no tenant init/terminate events overlapping the failing subscribe windows; tenant is healthy and serves heartbeat acks instantly once subscribe wins.

REST queries on the same network are fast and reliable throughout.

### Workaround

Pin to `2.44.0`. Subscribes in ~3 s; no further changes needed.
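
For Swift Package Manager users, the pin might look like the manifest below. This is a sketch: the package and target name `LibraryApp` and the platform floor are placeholders, and projects that manage dependencies through Xcode would set the "Exact Version" rule in the package settings instead.

```swift
// swift-tools-version:5.9
import PackageDescription

let package = Package(
    name: "LibraryApp", // placeholder name
    platforms: [.iOS(.v16)], // placeholder deployment target
    dependencies: [
        // Pin to the last version that subscribes in ~3 s for me.
        .package(url: "https://github.com/supabase/supabase-swift.git", exact: "2.44.0"),
    ],
    targets: [
        .target(
            name: "LibraryApp", // placeholder name
            dependencies: [
                .product(name: "Supabase", package: "supabase-swift"),
            ]
        ),
    ]
)
```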

### Possibly using the SDK wrong?

I might be missing a usage-pattern change that landed alongside PR #855 or in subsequent versions. Specifically:

- I'm calling `subscribeWithError()` directly (in a retry-forever wrapper). Is the recommendation now to use `subscribe()` (no `WithError`) and react to `channel.statusChange` instead, given the lifecycle is now managed by `ConnectionManager` / `ChannelStateManager`?
- I start the `postgresChange` AsyncStream consumers *before* calling `subscribeWithError()` (so they're ready when the channel attaches). Is the recommendation now to do this in a different order?
- I use SDK defaults for `handleAppLifecycle` (true on iOS), `connectOnSubscribe` (true), `disconnectOnEmptyChannelsAfter` (default), `timeoutInterval` (10 s), `heartbeatInterval` (25 s), `maxRetryAttempts` (5). Is there a combination that's expected to be configured explicitly for cold-start reliability now?
- Are `postgresChange` bindings on a non-`public` schema (`library` in our case) treated differently in the new state machine?

Happy to be told this is a usage problem rather than a regression — but the bisect is the part I can't explain away on the usage hypothesis (v2.44.0 with the same call pattern works in ~3 s).

### Steps to Reproduce

1. Create a `SupabaseClient` with default `RealtimeClientOptions`.
2. Open one `RealtimeChannelV2`: `let channel = client.channel("library-realtime")`.
3. Bind two `postgresChange` streams on a custom (non-`public`) schema, e.g. `schema: "library", table: "libros"` and `table: "libro_files"`.
4. Start two `for await` consumers on those streams, each in its own unstructured `Task` (plain `Task { ... }`, as in the code sample below).
5. Call `try await channel.subscribeWithError()` (optionally wrapped in a retry-forever loop to avoid the SDK's "Maximum retry attempts reached" giving up after ~85 s).
6. Observe `client.realtimeV2.heartbeat` (AsyncStream<HeartbeatStatus>) and `channel.statusChange` from app code.

Expected: channel reaches `.subscribed` within seconds (as it does in v2.44.0).

Actual on v2.44.1 → v2.46.0: socket reaches `.connected` in ~2 s, but heartbeats time out every ~50 s (`sent` → `timeout`), triggering reconnects that kill in-flight JOINs. The channel stays in `.subscribing` until, by luck after multiple cycles, a JOIN slips through before its next heartbeat-timeout reconnect. Observed subscribe time ranges from ~60 s to ~7 minutes depending on version.
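
For step 6, a generic timestamping logger along these lines is enough to produce a trace like the one under "Relevant log output". This is a sketch rather than my exact instrumentation: `logEvents` is a hypothetical helper, it works on any `AsyncSequence`, and it assumes iOS 16 / macOS 13 or later for `ContinuousClock`.

```swift
import Foundation

/// Hypothetical helper (not an SDK API): consumes an async sequence and
/// emits one line per element, prefixed with seconds elapsed since `start`.
func logEvents<S: AsyncSequence>(
    _ label: String,
    from stream: S,
    since start: ContinuousClock.Instant,
    sink: (String) -> Void = { print($0) }
) async {
    do {
        for try await event in stream {
            // Convert the elapsed Duration to fractional seconds.
            let parts = start.duration(to: ContinuousClock.now).components
            let seconds = Double(parts.seconds) + Double(parts.attoseconds) / 1e18
            let stamp = String(format: "%.1f", seconds)
            sink("T=\(stamp)s  \(label) = \(event)")
        }
    } catch {
        sink("\(label) stream failed: \(error)")
    }
}
```

In the app this would be called once per stream, each in its own `Task`, passing `client.realtimeV2.heartbeat` and `channel.statusChange` with a shared `start` captured just before `subscribeWithError()`.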

### Code Sample

```swift
let channel = client.channel("library-realtime")
let libroStream = channel.postgresChange(AnyAction.self, schema: "library", table: "libros")
let fileStream  = channel.postgresChange(AnyAction.self, schema: "library", table: "libro_files")

// Consumers started before subscribe so they're ready when the channel attaches.
Task { for await c in libroStream { handle(c) } }
Task { for await c in fileStream  { handle(c) } }

// Retry-forever wrapper because subscribeWithError's internal max-retries
// (5 × ~10 s timeoutInterval) gives up before the first JOIN can land
// on broken versions.
Task {
    while !Task.isCancelled {
        do {
            try await channel.subscribeWithError()
            return
        } catch {
            try? await Task.sleep(for: .seconds(5))
        }
    }
}
```

### Relevant log output

```shell
T=0.0s   subscribe begin; channel = .subscribing; socket = .disconnected
T=2.1s   socket = .connected
T=26.6s  heartbeat sent          ← first heartbeat
T=51.1s  heartbeat timeout       ← second heartbeat fires, finds previous
                                   pendingHeartbeatRef set → timeout signalled
T=51.9s  socket = .connecting    ← SDK reconnects in response to heartbeat timeout
T=60.2s  socket = .connected
T=70.1s  subscribeWithError throws "Maximum retry attempts reached"
         (5 internal retries × ~10 s timeoutInterval exhausted inside the
         50 s window between socket-connect and heartbeat-timeout-reconnect)
... outer retry loop fires; pattern repeats ...
T=410.3s channel = .subscribed   ← eventually succeeds
T=410.3s+ heartbeats now ack instantly (`sent` → `ok`, same millisecond)
```
