feat(room-service,room-worker): durable federation relay for cross-site events
Six room-service request/reply handlers (role_updated, mute/favorite
toggled, subscription_read, thread_read, room_restricted) federated
cross-site events by publishing an InboxEvent inline, straight to a remote
site's INBOX across a supercluster gateway. On failure, the error was
returned to the client *after* the local Mongo write had committed, so
local and remote state diverged with no durable retry.
Replace this with a durable "federation relay": each handler keeps its
synchronous Mongo write and reply but publishes one RoomFederationEvent to
the local ROOMS stream; room-worker forwards each wrapped InboxEvent to the
destination INBOX with at-least-once retry — the source stream is the
outbox. The producer publish is local-cluster only, so a remote outage can
never block the user's RPC, and a destination-site outage delays the event
(retry-forever with escalating backoff) rather than dropping it.
- pkg/model: RoomFederationEvent + FederationTarget envelope types.
- pkg/subject: RoomCanonicalFederation builder
(chat.room.canonical.{siteID}.federation).
- room-service: federate + buildFederationTarget helpers; six handlers
converted. Wire format is byte-identical to the prior direct publishes,
so inbox-worker is unchanged.
- room-worker: processFederation forwards each target and validates
destSiteID/eventType/envelope/dedupId at the boundary. A transient error
Naks the message for redelivery; a malformed target is Acked as poison.
Each attempt is bounded by a 3s fail-fast timeout. Forwarding runs on a
dedicated durable consumer and worker pool, isolated from the membership
consumer (filtered to create/member.add/member.remove/room.rename), so an
unreachable destination backs up only the federation lane, never local
membership processing. The federation lane retries a failed forward
forever with escalating backoff (5s -> 5m, MaxDeliver=-1), so a long
destination outage delays the event but never drops it. The worker fails
fast on a non-positive MAX_WORKERS.
- docs/client-api.md: cross-site federation note for all six RPCs.
- Tests: forwarder, the two consumer configs, all six handlers (relay
envelope + byte-identical wrapped InboxEvent), a model round-trip, and an
end-to-end JetStream integration round-trip.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Claude-Session: https://claude.ai/code/session_01WcNmcyHTmyokFh9vYm3brj
docs/client-api.md: 9 additions, 4 deletions
@@ -1107,6 +1107,8 @@ See [Error envelope](#6-error-envelope-reference). Returned synchronously when v
 
 **`chat.user.{targetAccount}.event.subscription.update`** — emitted once for the user whose role changed, `action: "role_updated"`. Delivered to the target user only (not the requester, not other members). See the [subscription.update schema](#subscriptionupdate-event); the embedded `Subscription` reflects the updated `roles`. No `AsyncJobResult` and no room-key event fire for role updates.
 
+**Cross-site federation:** when the target user's home site differs from the room's site, `room-service` emits a `RoomFederationEvent` on the ROOMS stream and `room-worker` forwards the cross-site `role_updated` event (at-least-once) to `chat.inbox.{userSite}.external.role_updated`, where `inbox-worker` applies the updated `roles` to the local `Subscription` (guarded by `rolesUpdatedAt`).
+
 ```json
 {
 "userId": "01970a4f8c2d7c9a01970a4f8c2d7c9a",
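At-least-once forwarding means `inbox-worker` can see the same `role_updated` event twice, or an older one after a newer one; the `rolesUpdatedAt` guard is what keeps that safe. A minimal in-memory sketch of such a guard, assuming a UnixMilli timestamp and hypothetical `Subscription`/`RoleUpdated` types. The real `inbox-worker` presumably expresses this as a Mongo update filter, similar to the `$lt` guard documented for `thread_read`.

```go
package main

// Subscription is a trimmed, hypothetical view of the home-site document.
type Subscription struct {
	Roles          []string
	RolesUpdatedAt int64 // UnixMilli
}

// RoleUpdated is a hypothetical decoded role_updated payload.
type RoleUpdated struct {
	Roles          []string
	RolesUpdatedAt int64 // UnixMilli
}

// applyRoleUpdated applies the event only if it is strictly newer than what
// the subscription already holds. Duplicates and late arrivals are ignored,
// which makes at-least-once redelivery safe. Returns true if state changed.
func applyRoleUpdated(sub *Subscription, evt RoleUpdated) bool {
	if evt.RolesUpdatedAt <= sub.RolesUpdatedAt {
		return false
	}
	sub.Roles = append([]string(nil), evt.Roles...) // copy: don't alias the event slice
	sub.RolesUpdatedAt = evt.RolesUpdatedAt
	return true
}
```

The same shape applies to the other guarded fields in this doc (`lastSeenAt`, `muteUpdatedAt`): compare, then write only when the incoming timestamp is newer.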
@@ -1234,7 +1236,7 @@ When the synchronous reply is an error envelope, the request was rejected before
 > - `externalAccess` — whether the room is reachable from outside the company network (e.g. internet-side / off-VPN clients). This is a network-access gate, NOT a cross-site federation flag
 > - `ownerAccount` — required on the unrestricted-to-restricted transition
 >
-> room-service does the Mongo writes, fans out an `InboxRoomRestricted` event per remote federated site (published to `chat.inbox.{remoteSiteID}.external.room_restricted`), and replies `{"status":"ok","requestId":"…"}` once the work is committed. No `AsyncJobResult` is emitted — the reply *is* the result.
+> room-service does the Mongo writes, emits a single `RoomFederationEvent` on the ROOMS stream (one target per remote federated site), and replies `{"status":"ok","requestId":"…"}` once the work is committed. `room-worker` forwards the cross-site `room_restricted` event (at-least-once) to each remote site's `chat.inbox.{remoteSiteID}.external.room_restricted`. No `AsyncJobResult` is emitted — the reply *is* the result.
 >
 > Clients learn about the change via a **`RoomRestrictedRoomEvent`** (`type: "room_restricted"`) on the same `chat.room.{roomID}.event` stream they already subscribe to for chat messages. Like `RoomRenamedRoomEvent`, it's a flat struct with no zero-valued envelope fields:
 >
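For `room_restricted` the handler builds one target per remote federated site. A small sketch of that fan-out, with hypothetical names throughout: `memberSites` stands for however the handler learns the room's federated sites, and the `requestID:site` dedupId scheme is an assumption chosen so that rebuilding the targets for the same request yields the same IDs.

```go
package main

import "sort"

// FederationTarget is a hypothetical, trimmed target shape.
type FederationTarget struct {
	DestSiteID string
	EventType  string
	DedupID    string
	Payload    []byte // marshaled InboxRoomRestricted, shared read-only across targets
}

// roomRestrictedTargets builds one target per distinct remote site. The
// local site and empty site IDs are skipped; sorting keeps output stable.
func roomRestrictedTargets(requestID, localSite string, memberSites []string, payload []byte) []FederationTarget {
	seen := map[string]bool{localSite: true}
	var remote []string
	for _, s := range memberSites {
		if s == "" || seen[s] {
			continue
		}
		seen[s] = true
		remote = append(remote, s)
	}
	sort.Strings(remote)

	targets := make([]FederationTarget, 0, len(remote))
	for _, s := range remote {
		targets = append(targets, FederationTarget{
			DestSiteID: s,
			EventType:  "room_restricted",
			DedupID:    requestID + ":" + s, // assumed scheme: stable per request and site
			Payload:    payload,
		})
	}
	return targets
}
```

All targets travel in the one `RoomFederationEvent`, so a room with members on three remote sites still costs the handler a single local publish.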
@@ -1547,6 +1549,8 @@ See [Error envelope](#6-error-envelope-reference). Common errors:
 }
 ```
 
+**3. Cross-site federation** — when the reader's home site differs from the room's site, `room-service` emits a `RoomFederationEvent` on the ROOMS stream and `room-worker` forwards the cross-site `subscription_read` event (at-least-once) to `chat.inbox.{userSite}.external.subscription_read`, where `inbox-worker` applies `lastSeenAt`/`alert` to the local `Subscription` (guarded by `lastSeenAt`).
+
 ##### Triggered events — error path
 
 `None — error returned only via the reply subject.`
@@ -1558,7 +1562,7 @@ See [Error envelope](#6-error-envelope-reference). Common errors:
-A **synchronous RPC** that clears a single thread's unread state for the caller. `room-service` validates room membership and thread-subscription existence, removes the threadId from the user's `Subscription.ThreadUnread`, recomputes the per-subscription `alert` flag, refreshes the `ThreadSubscription` (`lastSeenAt`, `updatedAt`, `hasMention=false`), and — for cross-site users — publishes a `thread_read` event directly to the user's home-site INBOX so the destination `inbox-worker` can mirror both updates.
+A **synchronous RPC** that clears a single thread's unread state for the caller. `room-service` validates room membership and thread-subscription existence, removes the threadId from the user's `Subscription.ThreadUnread`, recomputes the per-subscription `alert` flag, refreshes the `ThreadSubscription` (`lastSeenAt`, `updatedAt`, `hasMention=false`), and — for cross-site users — emits a `RoomFederationEvent` on the ROOMS stream; `room-worker` forwards the cross-site `thread_read` event to the user's home site (at-least-once) so the destination `inbox-worker` can mirror both updates.
 
 ##### Request body
 
@@ -1593,7 +1597,7 @@ See [Error envelope](#6-error-envelope-reference). Common errors:
 
 - **Alert recomputation:** `alert = oldSub.alert && len(newThreadUnread) > 0`. A thread-read can only clear an alert, never set one. When the post-removal `threadUnread` is empty, `alert` becomes false. This computation runs atomically inside the MongoDB aggregation pipeline on the handler's site — not derived client-side.
 - **Concurrent local writes:** the room-`Subscription` update and the `ThreadSubscription` update run in parallel inside an `errgroup`. Both must succeed before the handler proceeds.
-- **Cross-site federation:** if the user's home site differs from the handler's site, a `thread_read` event is published directly to `chat.inbox.{userSite}.external.thread_read` with payload `{account, roomId, threadRoomId, parentMessageId, newThreadUnread, alert, lastSeenAt, timestamp}` (timestamps as `int64` UnixMilli). The destination `inbox-worker` applies the supplied `newThreadUnread`+`alert` to the local Subscription cache and applies `lastSeenAt`+`updatedAt`+`hasMention=false` to the local ThreadSubscription with an `$lt` order-safety guard so out-of-order delivery cannot regress the thread's read position.
+- **Cross-site federation:** if the user's home site differs from the handler's site, the handler emits a `RoomFederationEvent` on the ROOMS stream and `room-worker` forwards the cross-site `thread_read` event (at-least-once) to `chat.inbox.{userSite}.external.thread_read` with payload `{account, roomId, threadRoomId, parentMessageId, newThreadUnread, alert, lastSeenAt, timestamp}` (timestamps as `int64` UnixMilli). The destination `inbox-worker` applies the supplied `newThreadUnread`+`alert` to the local Subscription cache and applies `lastSeenAt`+`updatedAt`+`hasMention=false` to the local ThreadSubscription with an `$lt` order-safety guard so out-of-order delivery cannot regress the thread's read position.
 - **Defensive `roomId` filter:** the thread-subscription lookup additionally enforces that the supplied `threadId` belongs to the room named in the subject. Mismatches return `thread subscription not found` (rather than silently clearing an unrelated thread).
 - **Thread-room read-floor recompute:** after both writes succeed, `room-service` recomputes `thread_rooms.minUserLastSeenAt` = `MIN(lastSeenAt)` across all `thread_subscriptions` for the thread room. The floor is set only when every subscriber has a usable `lastSeenAt`; otherwise it is cleared. The recompute is best-effort — a failure is logged but does not fail the RPC. The stored value is also available via [Get Thread Messages](#get-thread-messages).
 - **Read-floor fan-out:** when (and only when) the recompute above changes `thread_rooms.minUserLastSeenAt`, the server publishes a `thread_message_read` event (routed by the **parent** room's type) carrying the new floor, so peers can advance thread read-receipt UI live. Best-effort (a publish failure does not fail the RPC); never fires when the floor is unchanged or the thread room is missing.
@@ -1680,6 +1684,7 @@ See [Error envelope](#6-error-envelope-reference). Common errors:
 ##### Behaviour notes
 
 - **Notification delivery:** `notification-worker` respects `muted` flags when deciding whether to send mobile push notifications (see [Notification fan-out](#notification-fan-out-mobile-push-only) below).
+- **Cross-site federation:** when the requester's home site differs from the room's site, `room-service` emits a `RoomFederationEvent` on the ROOMS stream and `room-worker` forwards the cross-site `subscription_mute_toggled` event (at-least-once) to `chat.inbox.{userSite}.external.subscription_mute_toggled`, where `inbox-worker` applies `muted` to the local `Subscription` (guarded by `muteUpdatedAt`).
 
 ---
 
@@ -1729,7 +1734,7 @@ See [Error envelope](#6-error-envelope-reference). Common errors:
 
 ##### Cross-site behaviour
 
-When the requester's home site differs from the room's site, `room-service` additionally publishes a `subscription_favorite_toggled` InboxEvent directly to `chat.inbox.{userSite}.external.subscription_favorite_toggled`. `inbox-worker` on the user's home site mirrors the flip onto the local `Subscription` document. Missing-subscription on the home site (e.g., a federation race) is a silent no-op — no NACK, no redelivery loop.
+When the requester's home site differs from the room's site, `room-service` emits a `RoomFederationEvent` on the ROOMS stream and `room-worker` forwards the cross-site `subscription_favorite_toggled` event (at-least-once) to `chat.inbox.{userSite}.external.subscription_favorite_toggled`. `inbox-worker` on the user's home site mirrors the flip onto the local `Subscription` document. Missing-subscription on the home site (e.g., a federation race) is a silent no-op — no NACK, no redelivery loop.