[BUG] [iOS] CallAgent created after sign-out/sign-in (different user) hangs in Connecting and ends with callEndReason 408/0

**Describe the bug**
After signing out the current user, fully disposing `CallAgent` + `CallClient`, and signing in a different user with a fresh ACS identity in the same iOS process, the second user's `callAgent.join(with: RoomCallLocator)` reaches `CallState.connecting` and stays there for ~90 seconds. The server then ends the call with `callEndReason.code = 408, subcode = 0`.

Per the [official troubleshooting-codes documentation](https://learn.microsoft.com/en-us/azure/communication-services/resources/troubleshooting/voice-video-calling/troubleshooting-codes), 408 with no subcode is: *"Call controller timed out. Call Controller timed out waiting for protocol messages from user endpoints. Ensure clients are connected and available."* — i.e. the ACS Call Controller is waiting for signaling messages from the second `CallAgent` that never arrive.

This strongly suggests SDK-internal state from the first session is wedging the second `CallAgent`'s signaling channel, even though both the Swift `CallAgent` and `CallClient` instances are freshly constructed for User B.

***Exception or Stack Trace***
There is no thrown exception. `callAgent.join` returns no error; `Call.callEndReason` after the 90 s timeout is:

```
call.callEndReason.code    = 408
call.callEndReason.subcode = 0
```

**To Reproduce**
Steps to reproduce the behavior, all in a single iOS process:

1. Launch the app. Sign in as **User A** (ACS identity `A`, ACS token with `voip,chat` scope, valid invitee in some Room `R1`).
2. Construct `CallClient` and `CallAgent` for User A. Call `callAgent.join(with: RoomCallLocator(roomId: R1.id))`.
3. Confirm the call reaches `CallState.connected`. Leave the call (`call.hangUp(...)`).
4. Sign User A out and dispose the ACS stack:
   ```swift
   // Dispose the agent before the client that created it, then drop both references.
   callAgent?.dispose()
   callAgent = nil
   callClient?.dispose()
   callClient = nil
   ```
5. Sign in as **User B** (different ACS identity `B`, fresh `voip,chat` token, valid invitee in some Room `R2` — same or different room as `R1`; outcome is identical).
6. Construct a fresh `CallClient` and `CallAgent` for User B (`callAgent` is `nil` at this point). Call `callAgent.join(with: RoomCallLocator(roomId: R2.id))`.

Observed: `join` completion fires with no error in ~1 ms, call reaches `Connecting` in ~100 ms, stays there for 90 s, then `Disconnected` with `callEndReason.code = 408, subcode = 0`.

***Code Snippet***
Minimal shape of the second-user sign-in path:

```swift
import AzureCommunicationCalling
import AzureCommunicationCommon  // CommunicationTokenCredential

// User B path (after User A was fully disposed).
// `tokenForUserB` and `roomForUserB` are placeholders for the app's own values.
let credential = try CommunicationTokenCredential(token: tokenForUserB)
let client = CallClient()
client.createCallAgent(userCredential: credential) { agent, error in
    guard let agent = agent, error == nil else { return }
    let locator = RoomCallLocator(roomId: roomForUserB)
    let opts = JoinCallOptions()
    agent.join(with: locator, joinCallOptions: opts) { call, error in
        // Completion fires with no error in ~1 ms; `call.state` then goes to .connecting
        // and stays there ~90 s before reaching .disconnected with code=408 subcode=0.
    }
}
```

**Expected behavior**
User B's `join` reaches `CallState.connected`, the same way User A's did earlier in the same process.

**Screenshots**
N/A — failure is on the signaling layer, no UI artifact.

**Setup (please complete the following information):**
- OS: iOS 18.x (physical device — does not reproduce on Simulator)
- IDE: Xcode 26.3
- Version of the Library used: `AzureCommunicationCalling` **2.18.2** (SwiftPM, latest as of 2026-03-10)

**Additional context**

**Tokens are correct.** We decode the JWT payload on receipt. Tokens for User A and User B have *different* ACS `skypeid` values (as expected), *same* `resourceId`, *same* `voip,chat` scope, are fresh (`issuedAgo` ≤ 1 s) and valid (~24 h). The 408 is not a token issue.
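
For anyone who wants to repeat the token check, here is a minimal sketch of the kind of payload inspection described above. It is not our production code. `TokenClaims` and `decodeClaims(fromJWT:)` are hypothetical names, and the sketch assumes the token carries `skypeid` plus numeric `iat`/`exp` claims. It only decodes the payload for diagnostics and does not verify the signature.

```swift
import Foundation

/// Subset of claims inspected here. `skypeid` is the claim named in this report;
/// `iat`/`exp` are assumed to be the standard JWT issued-at / expiry claims
/// (seconds since the Unix epoch).
struct TokenClaims: Decodable {
    let skypeid: String
    let iat: TimeInterval
    let exp: TimeInterval
}

/// Decodes the payload segment of a JWT without verifying its signature.
/// Returns nil if the token is not three dot-separated segments or the
/// payload is not valid base64url-encoded JSON with the expected claims.
func decodeClaims(fromJWT token: String) -> TokenClaims? {
    let segments = token.split(separator: ".", omittingEmptySubsequences: false)
    guard segments.count == 3 else { return nil }

    // JWT segments are base64url without padding; convert to standard base64.
    var base64 = segments[1]
        .replacingOccurrences(of: "-", with: "+")
        .replacingOccurrences(of: "_", with: "/")
    let remainder = base64.count % 4
    if remainder != 0 {
        base64 += String(repeating: "=", count: 4 - remainder)
    }

    guard let data = Data(base64Encoded: base64) else { return nil }
    return try? JSONDecoder().decode(TokenClaims.self, from: data)
}
```

Comparing `skypeid` between the two sessions and checking `exp - iat` against the expected lifetime is enough to reproduce the checks listed above.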

**Selected log excerpts** (single process, two consecutive sessions):

*User A — succeeds:*
```
[ACS-perf] +0ms    connect() called
[ACS-perf] +3ms    CallClient() initialized
[ACS-perf] +454ms  createCallAgent returned (ok)
[ACS-perf] +460ms  callAgent.join returned (ok)
[ACS-perf] +823ms  call.state -> Connecting
[ACS-perf] +4555ms call.state -> Connected
```

*Logout (after User A leaves the call):*
```
[ACS-tearDown] disposing CallAgent/CallClient for session end
[ACS-tearDown] CallAgent/CallClient disposed
```

*User B — fails:*
```
[ACS-perf] +0ms    connect() called
[ACS-perf] +0ms    CallClient() initialized
[ACS-perf] +86ms   createCallAgent returned (ok)
[ACS-perf] +87ms   callAgent.join returned (ok)
[ACS-perf] +107ms  call.state -> Connecting
... 90 seconds of silence ...
[ACS-perf] +91206ms call.state -> Disconnected
call.callEndReason.code = 408, subcode = 0
```

Note: `createCallAgent` returns in 86 ms for User B versus 454 ms for User A. A ~5× speedup with an otherwise identical setup strongly suggests that some SDK-internal infrastructure survives `dispose()` and is reused for the second agent.

**What we ruled out using the published [code/subcode catalog](https://learn.microsoft.com/en-us/azure/communication-services/resources/troubleshooting/voice-video-calling/troubleshooting-codes):**

| Code/Subcode | Meaning | Observed? |
|---|---|---|
| 403 / 5828 | "Join isn't authorized — user isn't part of invitee list" | **No** — both users are valid invitees |
| 403 / 5829 | "Beyond end time or before start time" | No |
| 403 / 5830 | "Only ACS user can join the Rooms meeting" | No |
| 495 / 4507 | "Invalid ACS token" | No — token decoded and verified valid |
| 410 / 3112 | "Local media stack or ICE checks failed" | No — not a media/firewall issue |
| 408 / 10057 | Rooms-specific "callee failed to finalize call setup" | No |
| **408 / 0** | **Generic "Call Controller timed out waiting for protocol messages"** | **Yes** ← this is us |

The fact that we get 408/0 rather than the Rooms-specific 408/10057 is significant: ACS isn't classifying us as "the participant disappeared mid-join" — it's saying the second `CallAgent` isn't driving the signaling protocol on its side.

**What we tried in app code:**

1. Calling `agent.dispose()` then `client.dispose()` on a background queue at logout. This is required, but on its own it is not enough: the second user's `callAgent.join` completion *never fires*, so the issue surfaces as a silent hang.
2. Setting `callAgent = nil` and `callClient = nil`, and clearing `agent.delegate` and any singleton `CallAgentDelegate` owner before dispose. Also required, but it produces no behavior change beyond #1.
3. Replacing our app-side singleton holding the `CallClient`/`CallAgent` with a brand-new instance after dispose. **This changed the symptom** from "join callback never returns" to "join callback returns ok in 1 ms, call reaches Connecting, server times out with 408/0 at 90 s" — i.e. it moved the failure from somewhere fully inside the SDK to a now-observable, server-acknowledged signaling stall.
4. Waiting 30+ s of wall-clock between dispose and the next `connect()` (MSAL interactive sign-in time, in practice) — does not help. The wedge survives wall-clock time.
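
For reference, the teardown shape from items 1 to 3 can be sketched as below. This is an illustration of the ordering only, not a fix: this is the sequence that still ends in 408/0. `Disposable` and `CallSession` are hypothetical names. In the app the two resources are the SDK's `CallAgent` and `CallClient`, and `agent.delegate` is cleared before `tearDown()` runs.

```swift
import Foundation

/// Hypothetical stand-in for the SDK objects. In the app these are
/// `CallAgent` and `CallClient`, which both expose `dispose()`.
protocol Disposable: AnyObject {
    func dispose()
}

/// Hypothetical app-side holder. One instance per signed-in user; a new
/// instance is created for the next user instead of reusing this one (item 3).
final class CallSession {
    private(set) var agent: Disposable?
    private(set) var client: Disposable?

    init(agent: Disposable, client: Disposable) {
        self.agent = agent
        self.client = client
    }

    /// Disposes the agent first, then the client, and drops both references.
    /// Safe to call more than once; later calls do nothing.
    func tearDown() {
        agent?.dispose()
        agent = nil
        client?.dispose()
        client = nil
    }
}
```

At logout we run the equivalent of `DispatchQueue.global().async { session.tearDown() }` and only then build a new holder for User B.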

**What we couldn't do, but would help:**

- Inspect `CallAgent.connectionStatus`. This property exists on the Android and JavaScript SDKs and is referenced in the [Manage calls](https://learn.microsoft.com/en-us/azure/communication-services/how-tos/calling-sdk/manage-calls) documentation as the way to detect a `Disconnected` agent that should be re-created. It is **not exposed on the iOS SDK** (verified against the public Swift interface and the framework's Obj-C symbol table for 2.18.2). On iOS we have no API to ask "is this CallAgent healthy" before calling `join`.
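
Without `connectionStatus`, the only app-side signal available is how long a call sits in `Connecting`. A rough stopgap, sketched under assumptions, is to track that duration and give up well before the 90 s server timeout. `ObservedCallState` and `ConnectingStallDetector` are hypothetical names, and the threshold is arbitrary (User A's successful join took about 3.7 s from `Connecting` to `Connected` in the logs above). This detects the stall sooner; it does not prevent it.

```swift
import Foundation

/// Hypothetical mirror of the call states mentioned in this report.
enum ObservedCallState {
    case connecting, connected, disconnected
}

/// Tracks how long a call has been in `.connecting`.
struct ConnectingStallDetector {
    let threshold: TimeInterval
    private var connectingSince: Date?

    init(threshold: TimeInterval) {
        self.threshold = threshold
    }

    /// Feed every observed state change here, e.g. from the call's
    /// state-change delegate callback.
    mutating func record(_ state: ObservedCallState, at time: Date = Date()) {
        switch state {
        case .connecting:
            // Keep the first timestamp if Connecting is reported more than once.
            if connectingSince == nil { connectingSince = time }
        case .connected, .disconnected:
            connectingSince = nil
        }
    }

    /// True once the call has been in Connecting for at least `threshold` seconds.
    func isStalled(at time: Date = Date()) -> Bool {
        guard let since = connectingSince else { return false }
        return time.timeIntervalSince(since) >= threshold
    }
}
```

An app could poll `isStalled()` from a timer and hang up and surface an error when it turns true, instead of showing a spinner for the full 90 s.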

**What we did not try:**
- `agent.unregisterPushNotification` — we don't use VoIP push at all, no `PKPushRegistry`, no CallKit. Including this for completeness.
- Restarting the app process. We know this would work as a workaround, but it's not acceptable mid-incident for our use case.

**Related existing issues:**
- [#2114 — VoIP Push Still Arrives After Unregistering Push Notifications](https://github.com/Azure/azure-sdk-for-ios/issues/2114) — different surface (push), same theme: per-user state survives `dispose`.
- [Azure/Communication#339 — `dispose` takes too long to execute](https://github.com/Azure/Communication/issues/339) — touches on `dispose` cleanup behavior.
- [StackOverflow 79626300](https://stackoverflow.com/questions/79626300) — accepted answer explicitly notes that *"the CallAgent is not fully released after signing out"*.

**Asks:**
1. Confirmation of whether this is a known wedge in 2.18.2.
2. A documented procedure to fully reset the ACS Calling stack inside a single iOS process so that the second `createCallAgent` produces a fully functional agent.
3. `CallAgent.connectionStatus` (or equivalent) exposed on iOS so apps can detect a wedged agent before calling `join` and avoid showing the user a 90 s spinner that resolves to a 408.

Happy to share full client logs and a `.blog` capture privately if useful.

**Information Checklist**
- [x] Bug Description Added
- [x] Repro Steps Added
- [x] Setup information Added


[BUG] [iOS] CallAgent created after sign-out/sign-in (different user) hangs in Connecting and ends with callEndReason 408/0 #2490
