Skip to content

[ECO-5397, WIP] Investigate hangs in integration tests#456

Closed
lawrence-forooghian wants to merge 28 commits intomainfrom
295-2025-10-23-further-investigate-hanging-integration-tests
Closed

[ECO-5397, WIP] Investigate hangs in integration tests#456
lawrence-forooghian wants to merge 28 commits intomainfrom
295-2025-10-23-further-investigate-hanging-integration-tests

Conversation

@lawrence-forooghian
Copy link
Copy Markdown
Collaborator

@lawrence-forooghian lawrence-forooghian commented Oct 23, 2025

This branch runs the iOS tests in a loop to try and get a hang.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Oct 23, 2025

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

✨ Finishing touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch 295-2025-10-23-further-investigate-hanging-integration-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 23, 2025 13:32 Inactive
Base automatically changed from tidy-PaginatedResult-internals to main October 24, 2025 13:15
@lawrence-forooghian
Copy link
Copy Markdown
Collaborator Author

Looking at https://github.com/ably/ably-chat-swift/actions/runs/18750083215/job/53487146062?pr=456 it seems that a room.attach() call is hanging.

  1. look in the logs at 2025-10-23T14:42:26.7078310Z for the room.attach() call
  2. filter subsequent logs by slash-rx:
2025-10-23T14:42:26.7079090Z 2025-10-23 14:42:26.643475+0000 xctest[73535:331898] slash-rx: (AblyChat/RoomLifecycleManager.swift:182) Transitioning from initialized to attaching
2025-10-23T14:42:26.7080150Z 2025-10-23 14:42:26.644688+0000 xctest[73535:331898] DEBUG: slash-rx: (ARTRealtimeChannel.m:582) RT:0x1053529a0 C:0x600003d16b20 (room/with/slash::$chat) channel state transitions from 0 - Initialized to 1 - Attaching
2025-10-23T14:42:26.7081180Z 2025-10-23 14:42:26.644752+0000 xctest[73535:331898] VERBOSE: slash-rx: (ARTEventEmitter.m:330) PublicEventEmitter Notification emitted 0x600002106f30-ARTChannelEventAttaching
2025-10-23T14:42:26.7082240Z 2025-10-23 14:42:26.644790+0000 xctest[73535:331898] VERBOSE: slash-rx: (ARTEventEmitter.m:330) PublicEventEmitter Notification emitted 0x600002106f30
2025-10-23T14:42:26.7083310Z 2025-10-23 14:42:26.644906+0000 xctest[73535:331898] DEBUG: slash-rx: (ARTRealtime.m:1354) RT:0x1053529a0 (channel: room/with/slash::$chat) sending protocol message with action '10 - Attach' was ignored: Error 80010 - Invalid operation (connection state is 1 - Connecting)
2025-10-23T14:42:26.7085860Z 2025-10-23 14:42:26.645456+0000 xctest[73535:331898] slash-rx: (AblyChat/RoomLifecycleManager.swift:195) Got channel state event ChannelStateChange(current: __C.ARTRealtimeChannelState, previous: __C.ARTRealtimeChannelState, event: __C.ARTChannelEvent, reason: nil, resumed: false)
2025-10-23T14:42:26.7087080Z 2025-10-23 14:42:26.645600+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T14:42:26.8841630Z 2025-10-23 14:42:26.835202+0000 xctest[73535:332406] DEBUG: slash-rx: (ARTWebSocketTransport.m:255) R:0x1053529a0 WS:0x6000026059e0 websocket did open
2025-10-23T14:42:27.1629220Z 2025-10-23 14:42:26.849364+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTWebSocketTransport.m:338) R:0x1053529a0 WS:0x6000026059e0 websocket did receive message
2025-10-23T14:42:27.2972440Z 2025-10-23 14:42:26.853501+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTWebSocketTransport.m:364) R:0x1053529a0 WS:0x6000026059e0 websocket in Open state did receive data {length = 377, bytes = 0x83a66163 74696f6e d2000000 04ac636f ... 00000000 05265c00 }
2025-10-23T14:42:27.3028360Z 2025-10-23 14:42:26.857816+0000 xctest[73535:332326] DEBUG: slash-rx: (ARTJsonLikeEncoder.m:1135) RS:0x60000330a620 ARTJsonLikeEncoder<msgpack> decoding '{length = 377, bytes = 0x83a66163 74696f6e d2000000 04ac636f ... 00000000 05265c00 }'; got: {
2025-10-23T14:42:27.3043900Z 2025-10-23 14:42:26.859789+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTJsonLikeEncoder.m:842) RS:0x60000330a620 ARTJsonLikeEncoder<msgpack>: protocolMessageFromDictionary {
2025-10-23T14:42:27.3054700Z 2025-10-23 14:42:26.860705+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTRealtime.m:1551) R:0x1053529a0 activity
2025-10-23T14:42:27.3057250Z 2025-10-23 14:42:26.862097+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTRealtime.m:1558) R:0x1053529a0 set idle timer had been ignored
2025-10-23T14:42:27.3058990Z 2025-10-23 14:42:26.866028+0000 xctest[73535:332326] VERBOSE: slash-rx: (ARTRealtime.m:1601) R:0x1053529a0 did receive Protocol Message Connected (connection state is Connecting)
2025-10-23T15:29:10.2412920Z 2025-10-23 15:29:10.189019+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T15:29:10.2861290Z 2025-10-23 15:29:10.202245+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T15:29:12.9026930Z 2025-10-23 15:29:12.865234+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T15:29:12.9422140Z 2025-10-23 15:29:12.869308+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T16:29:27.0276180Z 2025-10-23 16:29:26.974747+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T16:29:29.8616520Z 2025-10-23 16:29:29.811253+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T16:29:29.8644580Z 2025-10-23 16:29:29.814595+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T17:29:44.0282560Z 2025-10-23 17:29:43.945240+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T17:29:47.0058500Z 2025-10-23 17:29:46.934748+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T17:29:47.0516330Z 2025-10-23 17:29:46.942737+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T18:30:01.0556770Z 2025-10-23 18:30:01.024580+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T18:30:03.7820740Z 2025-10-23 18:30:03.746941+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T18:30:03.8074010Z 2025-10-23 18:30:03.749640+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T19:30:17.9039640Z 2025-10-23 19:30:17.853913+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false
2025-10-23T19:30:20.5583880Z 2025-10-23 19:30:20.518192+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: true
2025-10-23T19:30:20.5607670Z 2025-10-23 19:30:20.526222+0000 xctest[73535:331898] INFO: slash-rx: (ARTOSReachability.m:47) Reachability: host sandbox-realtime.ably.io is reachable: false

What's happening is:

  1. we call attach() on the channel whilst connection is still CONNECTING
  2. we receive the CONNECTED ProtocolMessage
  3. ?? does the connection become CONNECTED? logs don't show that it does
  4. but if it does (which, again, not clear if it does) the channel should at that point send the ATTACH and eventually become attached (see RTL4i), but it seems it doesn't even send the ATTACH.

This is presumably a bug in ably-cocoa but I haven't looked into it yet. Worth noting that we do have a test for RTL4i's ATTACHING case.

@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 24, 2025 22:48 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 00:33 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 01:15 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 09:01 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 09:36 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 11:39 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 14:15 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 14:34 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 20:11 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 21:56 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 25, 2025 23:32 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 26, 2025 00:17 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 26, 2025 12:50 Inactive
@maratal
Copy link
Copy Markdown
Collaborator

maratal commented Oct 27, 2025

It's a deadlock in device_nosync when calling to dispatch_sync. In my understanding both instances of ARTRealtime (in two test rooms) trying to call device_nosync and when in their rest queues threads are opposite, this ends up with threadA waiting threadB and vice versa for another instance. Quote from dispatch_sync docstring:

 * Calls to dispatch_sync() targeting the current queue will result
 * in dead-lock. Use of dispatch_sync() is also subject to the same
 * multi-party dead-lock problems that may result from the use of a mutex.
 * Use of dispatch_async() is preferred.

This was excluded from the documentation on the site for some reason.

I asked Claude what other synchronization methods can be used here, and it proposed to use atomic_store or just use cached version of device instance to avoid synchronization at all (which I would prefer doing at this point). wdyt @lawrence-forooghian

UPD: seems like correct description of what's happening - ably/ably-cocoa#2131 (comment)

@lawrence-forooghian
Copy link
Copy Markdown
Collaborator Author

I'm not sure what you intended to link to @maratal; the link you've posted is to a CI log. Please could you share a link to the ably-cocoa code that you believe is responsible for the deadlock?

@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 13:28 Inactive
@lawrence-forooghian lawrence-forooghian force-pushed the 295-2025-10-23-further-investigate-hanging-integration-tests branch from 358fb24 to 70a852a Compare October 27, 2025 17:09
@lawrence-forooghian lawrence-forooghian force-pushed the 295-2025-10-23-further-investigate-hanging-integration-tests branch from 70a852a to d78a11c Compare October 27, 2025 17:17
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 17:18 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 18:12 Inactive
This reverts commit 0ffb874.

(still hanging, will apply that fix plus some logging)
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 18:31 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 18:39 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 19:11 Inactive
prompt: "Update the iOS job in @.github/workflows/check.yaml so that
after 15 minutes it pauses the test process, attaches to it with a
debugger, and prints out a backtrace of all of the threads"
@github-actions github-actions bot temporarily deployed to staging/pull/456/AblyChat October 27, 2025 19:38 Inactive
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants