Resolve typed-RPC promises with server error envelopes (and clean up collatedPromises)#55

Open
jack-bot-slims wants to merge 1 commit into heroiclabs:master from jack-bot-slims:fix/typed-rpc-error-promises

jack-bot-slims commented Apr 27, 2026

Bug

When a server-side RPC throws and Nakama emits an Envelope.error carrying the request cid, the receive loop in Sources/Nakama/Socket.swift only attempts two casts on the matching collatedPromises[cid]:

if let promise = collatedPromise as? EventLoopPromise<Any> { ... }
else if let promise = collatedPromise as? EventLoopPromise<Google_Protobuf_Empty> { ... }

Typed-RPC promises are registered by the generic send<T: Message>(env:) as EventLoopPromise<T> for the concrete T (e.g. Nakama_Api_Rpc, Nakama_Realtime_Channel, …). Neither cast matches, so the error is silently dropped and try await promise.futureResult.get() blocks indefinitely.
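
For illustration, the failed casts can be reproduced without NIO. This is a minimal sketch, not the SDK code: `StubPromise`, `StubRpc` and `StubEmpty` are hypothetical stand-ins for `EventLoopPromise`, `Nakama_Api_Rpc` and `Google_Protobuf_Empty`, and it assumes the promise is stored type-erased as `Any`.

```swift
// Hypothetical stand-in for NIO's EventLoopPromise<Value>; NOT the real type.
final class StubPromise<Value> {
    private(set) var failure: Error?
    func fail(_ error: Error) { failure = error }
}

struct StubRpc {}    // hypothetical stand-in for Nakama_Api_Rpc
struct StubEmpty {}  // hypothetical stand-in for Google_Protobuf_Empty

// Assumption: the promise is stored type-erased, keyed by cid.
let stored: Any = StubPromise<StubRpc>()

// Swift generics are invariant: a StubPromise<StubRpc> is neither a
// StubPromise<Any> nor a StubPromise<StubEmpty>, so both casts yield nil.
let asAny = stored as? StubPromise<Any>
let asEmpty = stored as? StubPromise<StubEmpty>

// Only a cast to the exact concrete type succeeds.
let asRpc = stored as? StubPromise<StubRpc>

print(asAny == nil, asEmpty == nil, asRpc != nil)
```

With only the first two casts in place, nothing ever calls `fail` on the typed promise, which is why the awaiting caller never resumes.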

This affects every typed RPC path — rpc, joinChat, createMatch, addMatchmaker, followUsers, createParty, addMatchmakerParty, listPartyJoinRequests, etc. Void-returning RPCs accidentally work via the Google_Protobuf_Empty cast.

Real-world reproduction

A server-side SQL bug made one of our RPCs return an error envelope in ~2 ms. The Swift client never observed any response, so the awaiting task hung. Downstream we held a request-serialising mutex, which stayed locked until a heartbeat-driven force-release fired ~26 s later. After patching this branch locally, the same server error surfaces immediately as the expected NakamaRealtimeError.

Wire-protocol confirmation

On the server, pipeline_rpc.go sets Cid: envelope.Cid on error envelopes (in the proto, Envelope.cid is string cid = 1 and is populated on all responses). There is no protocol ambiguity; this is purely an SDK-side gap.

Fix

Extend the .error branch with explicit casts for every concrete promise type registered by send<T>(env:):

  • Nakama_Api_Rpc
  • Nakama_Realtime_Channel
  • Nakama_Realtime_ChannelMessageAck
  • Nakama_Realtime_Match
  • Nakama_Realtime_MatchmakerTicket
  • Nakama_Realtime_Status
  • Nakama_Realtime_Party
  • Nakama_Realtime_PartyMatchmakerTicket

…in addition to the existing Any and Google_Protobuf_Empty casts. If a future return type is added to Socket.swift without updating this branch, an error is now logged instead of the failure being silently swallowed.
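
The shape of the extended branch can be sketched as follows. This is not the actual diff: all types are hypothetical stand-ins (`StubPromise` for `EventLoopPromise`, `StubRpc` and `StubChannel` for two of the listed protobuf types, `StubFutureType` for a return type that was never added to the chain), and `failErasedPromise` is an illustrative helper name.

```swift
// Hypothetical stand-ins; the real code uses NIO promises and generated protobuf types.
final class StubPromise<Value> {
    private(set) var failure: Error?
    func fail(_ error: Error) { failure = error }
}
struct StubEmpty {}
struct StubRpc {}
struct StubChannel {}
struct StubFutureType {}  // a return type missing from the cast chain
struct StubServerError: Error { let message: String }

/// Fails the erased promise if its concrete type is in the chain.
/// Logs and returns false when no cast matches.
func failErasedPromise(_ erased: Any, with error: Error) -> Bool {
    if let promise = erased as? StubPromise<Any> {
        promise.fail(error)
    } else if let promise = erased as? StubPromise<StubEmpty> {
        promise.fail(error)
    } else if let promise = erased as? StubPromise<StubRpc> {
        promise.fail(error)
    } else if let promise = erased as? StubPromise<StubChannel> {
        promise.fail(error)
    } else {
        // Fallback: the error is still not delivered, but it is no longer silent.
        print("unhandled promise type \(type(of: erased)); error not delivered: \(error)")
        return false
    }
    return true
}

let rpcPromise = StubPromise<StubRpc>()
let unknownPromise = StubPromise<StubFutureType>()
let serverError = StubServerError(message: "rpc failed")

let handledRpc = failErasedPromise(rpcPromise, with: serverError)
let handledUnknown = failErasedPromise(unknownPromise, with: serverError)
```

Note that in this sketch the fallback only logs: a promise of an unlisted type would still be left pending, so the cast list has to be kept in sync with the return types used by `send<T>(env:)`.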

Bonus: clean up collatedPromises

The same code path previously never removed entries from collatedPromises, on either success or error. In long-running socket sessions this slowly leaks memory. The lookup now uses removeValue(forKey:), so the entry is dropped before dispatching to the appropriate promise. This mirrors delete this.cIds[message.cid] in nakama-js.
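
As a small illustration of the difference between the two lookups, using a plain dictionary as a hypothetical stand-in for the socket's pending-request table (the keys and string values are placeholders, not real promises):

```swift
// Hypothetical stand-in for the pending-request table, keyed by cid.
var pending: [String: Any] = ["1": "promise-a", "2": "promise-b"]

// A subscript lookup returns the value but leaves the entry in place.
let peeked = pending["1"]
let countAfterPeek = pending.count

// removeValue(forKey:) returns the stored value and drops the entry in one step.
let taken = pending.removeValue(forKey: "1")
let countAfterTake = pending.count

// A cid that is unknown or already handled yields nil.
let missing = pending.removeValue(forKey: "1")

print(countAfterPeek, countAfterTake, missing == nil)
```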

Notes for reviewers

  • The existing success switch uses as! force-casts and a default branch that falls back to Empty.succeed. Left both untouched to keep the diff minimal and focused on the error path. Happy to align them in a follow-up if preferred.
  • Considered an alternative refactor that changes collatedPromises to store an erased (rawPromise, failer: (Error) -> Void) tuple to avoid the cast list entirely. It is structurally cleaner but a bigger change; went with the additive cast list here for review safety. Glad to redo as the tuple approach if maintainers prefer.
  • Diff verified locally; please run the project's normal build/test suite before merging — I do not have a Swift toolchain set up to validate it from this branch.
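
For reference, the erased-tuple alternative mentioned in the notes above could look roughly like this. It is a sketch under assumptions, not code from this PR: `StubPromise` stands in for `EventLoopPromise`, and `PendingRequest`, `PendingTable`, `register` and `fail(cid:with:)` are hypothetical names.

```swift
// Hypothetical stand-in for NIO's EventLoopPromise<Value>.
final class StubPromise<Value> {
    private(set) var failure: Error?
    func fail(_ error: Error) { failure = error }
}
struct StubRpc {}  // hypothetical stand-in for Nakama_Api_Rpc
struct StubServerError: Error {}

/// One pending request: the erased promise plus a closure that can fail it.
struct PendingRequest {
    let rawPromise: Any
    let failer: (Error) -> Void
}

final class PendingTable {
    private var entries: [String: PendingRequest] = [:]

    var count: Int { entries.count }

    func register<T>(_ promise: StubPromise<T>, cid: String) {
        // The closure captures the typed promise, so the error path needs no cast.
        entries[cid] = PendingRequest(rawPromise: promise, failer: { promise.fail($0) })
    }

    /// Removes the entry for `cid` and fails it; returns false if `cid` is unknown.
    func fail(cid: String, with error: Error) -> Bool {
        guard let entry = entries.removeValue(forKey: cid) else { return false }
        entry.failer(error)
        return true
    }
}

let table = PendingTable()
let promise = StubPromise<StubRpc>()
table.register(promise, cid: "42")
let delivered = table.fail(cid: "42", with: StubServerError())
```

Because the failer is created where `T` is still known, a new return type added later would be covered on the error path without touching a cast list; the success path would still need the typed promise.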

Test plan

  • Maintainer integration test: trigger a server-side RPC error and confirm the awaiting Swift caller observes a NakamaRealtimeError instead of hanging.
  • Confirm collatedPromises is empty after a sequence of successful + failing RPCs on a long-lived socket.

…collatedPromises)

When a server emits an `Envelope.error` with a request `cid`, the receive loop
only attempted to cast the matching promise to `EventLoopPromise<Any>` or
`EventLoopPromise<Google_Protobuf_Empty>`. Typed RPC promises (registered as
`EventLoopPromise<Nakama_Api_Rpc>`, `Nakama_Realtime_Channel`, etc.) matched
neither, so the error was silently dropped and `await promise.futureResult.get()`
hung forever. Affected callers include `rpc`, `joinChat`, `createMatch`,
`addMatchmaker`, `followUsers`, `createParty`, `addMatchmakerParty`, and every
other typed RPC. Void-returning RPCs accidentally worked via the `Empty` cast.

Fix: extend the `.error` branch with explicit casts for every concrete promise
type registered by `send<T>(env:)`, falling back to a logged error if a future
type is added without updating this branch.

Bonus: replace the `collatedPromises[response.cid]` lookup with
`removeValue(forKey:)` so the promise is dropped from the dictionary in both
success and error paths. Previously entries were never cleaned up — a slow
leak in long-running sessions. Mirrors `delete this.cIds[message.cid]` in
nakama-js.

Co-Authored-By: Hendrik Schäfer <hendrik@musa.guide>
CLAassistant commented Apr 27, 2026

CLA assistant check
All committers have signed the CLA.
