[AIT-280] Apply operations on ACK #2155

Merged
lawrence-forooghian merged 3 commits into main from AIT-280-apply-on-ACK
Feb 20, 2026

Conversation

Collaborator

@lawrence-forooghian lawrence-forooghian commented Jan 26, 2026

Note: This PR is based on top of #2167; please review that one first.

Based on ably/specification#419 at d809334. Implementation and tests are Claude-generated from the spec; I've reviewed them and given plenty of feedback, but largely resisted the temptation to tweak things that aren't quite how I'd write them but which are still correct.

The only behaviour here that's not in the spec is to also apply-on-ACK for batch operations (the batch API isn't in the spec yet).

Summary of decisions re modifications to existing tests (written by Claude):

  • Removed redundant waitFor* calls after SDK operations (map.set(), counter.increment(), etc.) - with apply-on-ACK, values are available immediately after the operation promise resolves
  • Kept waitFor* calls after REST operations (objectsHelper.operationRequest(), objectsHelper.createAndSetOnMap()) - these still require waiting for the echo to arrive over Realtime
  • Added explanatory comment to applyOperationsScenarios noting that those tests cover operations received over Realtime (via REST), and pointing to the new "Apply on ACK" section for tests of locally-applied operations

Docs PR: ably/docs#3161

Summary by CodeRabbit

  • New Features

    • Operations now apply locally once the server ACKs them, improving local consistency
  • Improvements

    • Publish, state and presence calls now surface explicit ACK results
    • More deterministic error handling for message delivery and attach/detach flows
    • Internal operation sources tracked to avoid double-apply and better serial handling
  • Documentation

    • Clarified batching semantics and ACK/apply timing
  • Tests

    • Added interceptors and extensive tests for apply-on-ACK, echo/ACK sequencing, and sync scenarios


coderabbitai bot commented Jan 26, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Walkthrough

Implements apply-on-ACK for LiveObjects: adds ObjectsOperationSource, publishAndApply, ACK-tracking and buffering during OBJECT_SYNC, changes RealtimeChannel to await publish ACKs, updates LiveMap/LiveCounter applyOperation signatures, and updates batch flush to use publishAndApply. Tests and test helpers added for ACK/echo sequencing.
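The buffering during OBJECT_SYNC mentioned in the walkthrough can be pictured with a minimal sketch. It is not the ably-js implementation: the class `SyncBufferSketch`, its fields, and the string serials are hypothetical names chosen for illustration, and only the idea (park ACKed operations while syncing, apply them at `endSync`) comes from the summary above.

```typescript
// Hypothetical, simplified model of "buffer ACK-applies until sync completes".
type SyncState = "syncing" | "synced";

interface BufferedAck {
  serials: string[];
  signal: () => void; // resolves the promise of the waiting caller
}

class SyncBufferSketch {
  state: SyncState = "syncing";
  applied: string[] = [];
  private bufferedAcks: BufferedAck[] = [];

  // Called once the server has ACKed the operations identified by `serials`.
  async applyOnAck(serials: string[]): Promise<void> {
    if (this.state === "syncing") {
      // Park the ACK; endSync() applies it and then signals completion.
      await new Promise<void>((resolve) => {
        this.bufferedAcks.push({ serials, signal: resolve });
      });
      return;
    }
    this.applied.push(...serials);
  }

  // Called when the sync sequence has finished.
  endSync(): void {
    this.state = "synced";
    for (const ack of this.bufferedAcks.splice(0)) {
      this.applied.push(...ack.serials);
      ack.signal();
    }
  }
}
```

In this sketch a caller awaiting `applyOnAck` during a sync only continues after `endSync` has run, which is the property the review threads below examine.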

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Realtime channel & publish ACKs (src/common/lib/client/realtimechannel.ts) | Adds sendAndAwaitAck(msg): Promise<API.PublishResult>; sendMessage now returns void. Message-sending flows (publish, sendState, sendPresence, attach/detach paths) updated to propagate publish ACK results instead of swallowing them. |
| LiveObjects core & operation source (src/plugins/liveobjects/realtimeobject.ts) | Adds ObjectsOperationSource enum, _appliedOnAckSerials, _pendingOperations; publish now returns API.PublishResult; new publishAndApply(objectMessages) which awaits ACK, synthesizes serial/siteCode messages, waits for sync if needed, and applies locally. Object message application now takes a source and skips already-applied serials; buffered ACK-applies are applied at endSync. |
| LiveObject / LiveMap / LiveCounter (src/plugins/liveobjects/liveobject.ts, src/plugins/liveobjects/livemap.ts, src/plugins/liveobjects/livecounter.ts) | Abstract and concrete applyOperation signatures now accept source: ObjectsOperationSource and return boolean (true if applied, false if skipped/tombstoned); _applyOperation/_throwNoPayloadError updated to return/throw never where appropriate. set/remove/increment now use publishAndApply. |
| Batching path & typings docs (src/plugins/liveobjects/rootbatchcontext.ts, liveobjects.d.ts) | Root batch flush now uses publishAndApply; JSDoc for batch interfaces updated to state batched operations are sent to Ably on batch completion and applied locally on ACK. No type signature changes in public typings aside from docs. |
| Tests & test helpers (test/realtime/liveobjects.test.js, test/common/modules/private_api_recorder.js) | Adds createEchoInterceptor and createAckInterceptor helpers; new "Apply on ACK" test suite covering ACK/echo/sync interactions and subscription events; expands private API allowlist with read.ProtocolMessage.state. |

Sequence Diagram(s)

sequenceDiagram
    participant Client
    participant RTC as RealtimeChannel
    participant LO as LiveObjects
    participant Server

    Client->>RTC: publish(objectMessages)
    RTC->>RTC: sendAndAwaitAck(msg)
    RTC->>Server: transport send

    alt Server ACKs
        Server->>RTC: ACK
        RTC->>LO: publishAndApply(syntheticMsg, source=local)
        LO->>LO: applyOperation(source=local)
        LO-->>Client: emit local state/events
    end

    alt Server Echo arrives
        Server->>RTC: echo OBJECT messages
        RTC->>LO: _applyObjectMessages(source=channel)
        LO->>LO: skip if serial already applied
    end

    note over LO: If OBJECT_SYNC in progress, ACK-applies are buffered until endSync
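
The ACK and echo branches of the diagram can be sketched as follows. This is an illustrative model only: `ObjectStoreSketch`, `ObjectOp`, and the choice to forget a serial once its echo has been seen are assumptions, not details confirmed by the PR. The PR itself only states that application takes a source, returns a boolean, and skips serials that were already applied on ACK.

```typescript
// Hypothetical model of skipping an echo whose serial was already applied on ACK.
type OperationSource = "local" | "channel";

interface ObjectOp {
  serial: string;
  counterId: string;
  amount: number;
}

class ObjectStoreSketch {
  private appliedOnAckSerials = new Set<string>();
  private counters = new Map<string, number>();

  // Returns true if the operation changed local state, false if it was skipped.
  apply(op: ObjectOp, source: OperationSource): boolean {
    if (source === "channel" && this.appliedOnAckSerials.delete(op.serial)) {
      return false; // echo of an operation that was already applied on ACK
    }
    if (source === "local") {
      this.appliedOnAckSerials.add(op.serial);
    }
    this.counters.set(op.counterId, this.value(op.counterId) + op.amount);
    return true;
  }

  value(counterId: string): number {
    return this.counters.get(counterId) ?? 0;
  }
}
```

Operations that originate from other clients arrive only with source "channel" and an unseen serial, so they are applied as before.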

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~70 minutes

Possibly related PRs

Suggested reviewers

  • owenpearson
  • mschristensen

Poem

🐰 I hop on ACKs with joyful cheer,

I wait for confirmation, then apply what's clear.
Echoes arrive but I won't repeat,
Buffered through sync to keep the beat.
LiveObjects hum — a tidy, synced feat.

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 50.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'AIT-280 Apply operations on ACK' directly and clearly describes the main change: implementing apply-on-ACK behavior for LiveObjects operations. |
| Linked Issues check | ✅ Passed | The PR implements apply-on-ACK for LiveObjects operations as required by AIT-280, including changes to RealtimeObject, LiveCounter, LiveMap, and batch context classes. |
| Out of Scope Changes check | ✅ Passed | All changes are directly related to implementing apply-on-ACK: new ObjectsOperationSource enum, publishAndApply method, applyOperation signature updates, and test coverage for the new feature. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |

@lawrence-forooghian lawrence-forooghian changed the base branch from main to AIT-318-remove-createOperationIsMerged January 26, 2026 18:07
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 26, 2026 18:08 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 26, 2026 18:08 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 26, 2026 18:08 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 26, 2026 18:46 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 26, 2026 18:46 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 26, 2026 18:46 Inactive
@lawrence-forooghian lawrence-forooghian force-pushed the AIT-280-apply-on-ACK branch 2 times, most recently from a4bed5f to 37bd08d Compare January 26, 2026 18:55
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 26, 2026 18:56 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 26, 2026 18:56 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 26, 2026 18:56 Inactive
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts
Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread src/common/lib/client/realtimechannel.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/livecounter.ts
Comment thread src/plugins/liveobjects/livecounter.ts Outdated
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 27, 2026 13:53 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 27, 2026 13:53 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 27, 2026 13:53 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 30, 2026 17:05 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 30, 2026 17:05 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 30, 2026 17:05 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/features January 30, 2026 17:06 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/bundle-report January 30, 2026 17:07 Inactive
@github-actions github-actions bot temporarily deployed to staging/pull/2155/typedoc January 30, 2026 17:07 Inactive
@lawrence-forooghian lawrence-forooghian changed the base branch from AIT-318-remove-createOperationIsMerged to main January 30, 2026 17:09
Contributor

@VeskeR VeskeR left a comment

Some initial comments based on our call yesterday and the spec review. Still need to review generic code changes but I wouldn't expect any drastic changes there as conceptually the spec looks good and the implementation seems to be following the spec.

Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
Comment thread src/plugins/liveobjects/realtimeobject.ts Outdated
`buffering ${syntheticMessages.length} message(s) until sync completes; channel=${this._channel.name}`,
);
await new Promise<void>((resolve) => {
this._bufferedAcks.push({ objectMessages: syntheticMessages, signal: resolve }); // RTO20e1
Contributor

@VeskeR VeskeR Feb 12, 2026

Discussed this on a call, posting here a summary and required actions:

This call essentially waits for the channel's LiveObjects state to become SYNCED. (Note: as discussed in ably/specification#419 (comment), this will be rewritten to simply await a SYNCED event instead of dealing with _bufferedAcks directly, but the underlying issue remains the same.)

The problem: if during publishAndApply we experience a connection loss in such a way that publish still succeeds (it was able to send or retry the operation and receive an ACK), but as a result we enter the SYNCING state, we will wait indefinitely for the LiveObjects state to become SYNCED again. This may never happen if we are stuck in a disconnection loop or experiencing network disturbances and are never able to receive the full sync sequence. As a result, a mutation operation (map.set, map.remove, counter.increment) will hang indefinitely until the connection fully restores and we are able to complete the sync.

This was not the case prior to this PR. RealtimeObject.publish relied only on ConnectionManager.send, which handled message queuing and had clear cut-off points for invalid states (see

send(msg: ProtocolMessage, queueEvent?: boolean, callback?: PublishCallback): void {
  callback = callback || noop;
  const state = this.state;
  if (state.sendEvents) {
    Logger.logAction(this.logger, Logger.LOG_MICRO, 'ConnectionManager.send()', 'sending event');
    this.sendImpl(new PendingMessage(msg, callback));
    return;
  }
  const shouldQueue = queueEvent && state.queueEvents;
  if (!shouldQueue) {
    const err = 'rejecting event, queueEvent was ' + queueEvent + ', state was ' + state.state;
    Logger.logAction(this.logger, Logger.LOG_MICRO, 'ConnectionManager.send()', err);
    callback(this.errorReason || new ErrorInfo(err, 90000, 400));
    return;
  }
  if (this.logger.shouldLog(Logger.LOG_MICRO)) {
    Logger.logAction(
      this.logger,
      Logger.LOG_MICRO,
      'ConnectionManager.send()',
      'queueing msg; ' +
        stringifyProtocolMessage(
          msg,
          this.realtime._RealtimePresence,
          this.realtime._Annotations,
          this.realtime._liveObjectsPlugin,
        ),
    );
  }
  this.queue(msg, callback);
}
and
/* implement the change and notify */
this.enactStateChange(change);
if (this.state.sendEvents) {
  this.sendQueuedMessages();
} else if (!this.state.queueEvents) {
  this.realtime.channels.propogateConnectionInterruption(state, change.reason);
  this.failQueuedMessages(change.reason as ErrorInfo); // RTN7c
}
). The newly introduced await for the SYNCED event no longer provides the same guarantees.

Context: conceptually, this is not an entirely new pattern. The same behavior exists for presence.get(), which waits for the presence sync event (

async waitSync(): Promise<void> {
  const syncInProgress = this.syncInProgress;
  Logger.logAction(
    this.logger,
    Logger.LOG_MINOR,
    'PresenceMap.waitSync()',
    'channel = ' + this.presence.channel.name + '; syncInProgress = ' + syncInProgress,
  );
  if (!syncInProgress) {
    return;
  }
  await this.once('sync');
}
), and for object.get(), which waits for the SYNCED state (
await this._eventEmitterInternal.once(ObjectsEvent.synced); // RTO1c
).

However, this problem is exacerbated by the nature of mutation operations compared to presence.get() / object.get(). You would expect .get() calls to happen once at the beginning of the app lifecycle to retrieve the initial state, after which it is automatically synced over time. If there are connection issues at startup, it is much more intuitive for the end user to refresh the page and try again when something doesn't load.

Mutation operations, on the other hand, are smaller, more intentional, and usually user-initiated. Imagine a button that increments a counter and displays a loader while waiting for publishAndApply to complete. If it hangs due to the issue described above, the user will see an indefinite loader on a button, which degrades the user experience drastically - it would force the user to refresh the entire page.

As such, the infinite-await problem is much more pronounced for mutation operations, and we should think about the general approach we want to take in these cases.

We may propose a general rule: a publicly available async API should never hang indefinitely and must have a clear cut-off point. For presence.get(), object.get(), and publishAndApply, this could be achieved by using Promise.race with a terminal connection or channel state that we consider "bad enough" to justify throwing an error - for example, the same states that cause a queued message to fail in ConnectionManager.
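
A minimal sketch of that Promise.race idea, under stated assumptions: `OnceEmitter` is a hypothetical stand-in for the SDK's internal event emitters, `waitForSyncedOrFail` is an illustrative name, and the set of cut-off states (detached, suspended, failed) is one possible choice rather than a decided rule.

```typescript
type ChannelState = "attached" | "detached" | "suspended" | "failed";

// Hypothetical minimal emitter; once(event) resolves on the next emit(event).
class OnceEmitter<E extends string> {
  private listeners = new Map<E, Array<() => void>>();

  once(event: E): Promise<void> {
    return new Promise<void>((resolve) => {
      const list = this.listeners.get(event) ?? [];
      list.push(() => resolve());
      this.listeners.set(event, list);
    });
  }

  emit(event: E): void {
    const list = this.listeners.get(event) ?? [];
    this.listeners.delete(event);
    list.forEach((listener) => listener());
  }
}

// Assumed cut-off states; which states qualify is a design decision.
const CUT_OFF_STATES: ChannelState[] = ["detached", "suspended", "failed"];

// Resolves on "synced", rejects as soon as the channel enters a cut-off state.
// Note: listeners that lose the race are not removed in this sketch.
async function waitForSyncedOrFail(
  objects: OnceEmitter<"synced">,
  channel: OnceEmitter<ChannelState>,
): Promise<void> {
  const synced = objects.once("synced");
  const cutOff = Promise.race(
    CUT_OFF_STATES.map((state) =>
      channel.once(state).then(() => {
        throw new Error(`channel became ${state} while waiting for sync`);
      }),
    ),
  );
  await Promise.race([synced, cutOff]);
}
```

With a helper like this, a mutation awaiting sync either completes or rejects with an error the caller can surface, instead of leaving a loader spinning indefinitely.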


Actions:

Decide whether this issue is critical enough to fix for mutation operations in this PR, or open a separate issue to address it for mutation operations and presence.get() / object.get() together. It might be worth opening a DR for this.

Collaborator Author

I've added new spec point RTO20e1 (ably/specification@57f4449) to handle this specifically for apply-on-ACK (implementation to come shortly). I'll make a note for us to think about what we should do for presence.get() and object.get().

Collaborator Author

implementation in 7de8748 — will squash pre-merge

Contributor

👍 will keep this conversation unresolved for visibility.

// RTO20c
const siteCode = this._channel.connectionManager.connectionDetails?.siteCode;
if (!siteCode) {
throw new this._client.ErrorInfo(
Contributor

@VeskeR VeskeR Feb 13, 2026

Discussed this on a call, posting here a summary and required actions:

publishAndApply may throw after a successful publish - that is, after the server has accepted the operation - but before applying the change locally. In this case, the caller receives an error suggesting the operation failed. However, since the server already accepted the operation, the echoed message will eventually arrive over the channel and the operation will be applied locally through the normal flow (the same behavior we had before apply-on-ACK, where all changes were applied upon receiving the echo). This creates an ambiguous situation: we signal failure, but the operation did succeed server-side, and the local state will eventually reflect it.

There are two places where this can happen:

  1. siteCode is not present in connectionDetails - this would be a server error; the server is expected to always provide siteCode.
  2. The PublishResult contains a null serial for one of the published operations - most likely a server error as well (we should confirm with the Realtime team that there are no legitimate situations where a serial is not returned).

In practice, both cases are very unlikely since they require the server to behave incorrectly by omitting required fields.

The question is how we expect developers to handle these errors. There are two scenarios:

  • End-user-initiated action (e.g. updating a counter by clicking a button): the developer catches the error, surfaces it to the user, and the user decides whether to retry. They may also observe the echoed operation arriving and see that the action actually succeeded, choosing not to retry.
  • Automated action (e.g. incrementing a counter on each page visit): if the developer retries on any error, it could result in a double count when the error was thrown after a successful publish. The correct solution here is to support message metadata for LiveObjects operations, enabling the caller to pass a message id for idempotent publishing (see https://ably.com/docs/pub-sub/advanced#idempotency). This was outlined in LODR-042 (https://ably.atlassian.net/wiki/spaces/LOB/pages/4235722804/LODR-042+LiveObjects+Realtime+Client+API+Improvements#Message-metadata) but has not been implemented yet. The current publishAndApply implementation escalates the need for message ID support in LiveObjects operations.
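
To illustrate why a caller-supplied message id makes the automated retry safe, here is a sketch built entirely on hypothetical names. `DedupingCounterServer` is an in-memory stand-in for a server that ignores repeated ids, and `publishIncrement` simulates an error thrown after the server accepted the operation. LiveObjects does not currently expose such an id, as noted above.

```typescript
// Hypothetical in-memory stand-in for a server that deduplicates by message id.
class DedupingCounterServer {
  value = 0;
  private seenIds = new Set<string>();

  increment(messageId: string, amount: number): void {
    if (this.seenIds.has(messageId)) {
      return; // retry of an operation that was already accepted
    }
    this.seenIds.add(messageId);
    this.value += amount;
  }
}

// Simulates a mutation that can fail locally after the server accepted it.
function publishIncrement(
  server: DedupingCounterServer,
  messageId: string,
  amount: number,
  failAfterAccept: boolean,
): void {
  server.increment(messageId, amount);
  if (failAfterAccept) {
    throw new Error("local failure after the server accepted the operation");
  }
}

// Retries once with the same id, so the repeated send cannot double count.
function incrementWithRetry(server: DedupingCounterServer, messageId: string, amount: number): void {
  try {
    publishIncrement(server, messageId, amount, true);
  } catch {
    publishIncrement(server, messageId, amount, false);
  }
}
```

Without the id check in `increment`, the same retry would count the page visit twice.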

Summary:

No changes required in this PR. The scenarios where we throw after a successful publish correspond to genuine server-side errors where something has gone seriously wrong. If customers raise the need for reliable retries of LiveObjects operations in the future, we can address it by implementing message IDs for idempotent publishing.

Actions:

We should confirm with the Realtime team that PublishResult always includes serials for published operations.

Collaborator Author

I think the gist here is the following:

  • it has always been possible for the LiveObjects mutation methods to fail even though the operation has been accepted by Realtime (e.g. by becoming SUSPENDED when we haven't yet received the ACK); this does not change that
  • in such a situation, it would be good to have an idempotency mechanism that allows the user to retry without accidental repeated operations; this work is already being considered and is unrelated to this PR
  • if the server does not send the data that the client needs in order to apply-on-ACK then the client (obviously) cannot apply-on-ACK; we need to convince ourselves this can not happen, and we can continue discussing this in [AIT-280] Apply LiveObjects operations on ACK specification#419 (comment)

Member

  • The current publishAndApply implementation escalates the need for message ID support in LiveObjects operations.

I don't really agree with this. For operations that are non-idempotent, there are other much more likely scenarios that will result in the client re-attempting an operation that had in fact succeeded (eg the client fails to see the ack). We already need to support ids.

Member

On the point being made: I think the publish promise should resolve if the publish succeeded, and not otherwise. There shouldn't be any valid circumstances that, following receipt of the ack, the apply fails. If something happens, out of spec, that causes the apply to fail, then we should be treating this as a protocol error that invalidates the transport, leading to reconnection and resync.

Collaborator Author

There shouldn't be any valid circumstances that, following receipt of the ack, the apply fails.

That's interesting — what do you think about this discussion in that case?

Member

I think the conclusion in that discussion is correct, that all operations will eventually complete, even if that's ultimately triggered by a channel or connection state change. The fact that presence.get() doesn't do that right now is a spec bug I think.

Collaborator Author

I think the conclusion in that discussion is correct, that all operations will eventually complete

What do you mean by "eventually"? The point that Andrii was making was that, if the LiveObjects mutation methods don't apply the operation — and thus don't complete — until the objects sync state becomes SYNCED (and if there are no other circumstances in which they complete) then they may never complete.

Thus, in response, I today specified a new behaviour in which if, whilst waiting for the state to become SYNCED, the channel becomes DETACHED, SUSPENDED, or FAILED, then the mutation methods' promises will reject, and the operation will not be applied locally. But this new behaviour contradicts your principle of "There shouldn't be any valid circumstances that, following receipt of the ack, the apply fails."

Member

Sorry, I meant that we should ensure that all operations eventually complete, following the same approach that you proposed.

But this new behaviour contradicts your principle of "There shouldn't be any valid circumstances that, following receipt of the ack, the apply fails."

Yes, sorry I should have been clearer. I'm saying that if you get an ack, and attempt to apply the operation locally as a result, then that should always succeed. In the sync case you're not attempting to apply it. There's also the conflation case (see ably/specification#419 (comment)) where you might not attempt to apply the operation on ack.

Collaborator Author

I've implemented the changes from ably/specification@24baaa2, which handle:

  • server misbehaviour (no siteCode, malformed ACK)
  • serial === null

Contributor

We updated the implementation to log instead of throw according to the spec.
Will keep this conversation unresolved for visibility.


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (3)
src/plugins/liveobjects/realtimeobject.ts (1)

268-337: Consider settling buffered ACK waiters on terminal channel states.
If the channel transitions to failed/detached before sync completes, buffered publishAndApply promises can remain pending indefinitely. It may be worth clearing/rejecting _bufferedAcks on terminal states.

test/realtime/liveobjects.test.js (2)

237-297: Return/await ACK release processing for deterministic tests.
Right now release() is fire‑and‑forget; returning the underlying call (and making releaseAll async) enables callers to await ACK processing when needed.

♻️ Suggested change
       heldAcks.push({
         message,
         release: () => {
           helper.recordPrivateApi('call.transport.onProtocolMessage');
-          originalOnProtocolMessage.call(transport, message);
+          return originalOnProtocolMessage.call(transport, message);
         },
       });
@@
-      releaseAll: () => {
+      releaseAll: async () => {
         while (heldAcks.length > 0) {
-          heldAcks.shift().release();
+          await heldAcks.shift().release();
         }
       },

8429-8433: Ensure interceptor cleanup in apply-on-ACK scenarios.
If a scenario assertion throws, the transport hook stays overridden. Restoring in a finally avoids cross-test leakage.

♻️ Suggested change
-              // hold echoes so we can verify value comes from ACK, not echo
-              createEchoInterceptor(helper, client);
-
-              await scenario.action(root);
+              const interceptor = createEchoInterceptor(helper, client);
+              try {
+                await scenario.action(root);
+              } finally {
+                interceptor.restore();
+              }
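
The hold, release, and restore pattern discussed in these two nitpicks can be sketched in isolation. The names below (`Transport`, `createHoldingInterceptor`, `shouldHold`) are hypothetical and only loosely modelled on the test helpers in this PR; real protocol messages are objects, whereas the sketch uses plain strings.

```typescript
// Hypothetical transport with a single overridable hook.
interface Transport {
  onProtocolMessage(message: string): void;
}

// Overrides the hook, holds matching messages, and can undo the override.
function createHoldingInterceptor(transport: Transport, shouldHold: (message: string) => boolean) {
  const original = transport.onProtocolMessage;
  const held: string[] = [];

  transport.onProtocolMessage = (message: string): void => {
    if (shouldHold(message)) {
      held.push(message);
    } else {
      original.call(transport, message);
    }
  };

  return {
    heldCount: (): number => held.length,
    // Delivers held messages in arrival order through the original hook.
    releaseAll(): void {
      while (held.length > 0) {
        original.call(transport, held.shift() as string);
      }
    },
    // Puts the original hook back so later tests see an untouched transport.
    restore(): void {
      transport.onProtocolMessage = original;
    },
  };
}
```

A test would create the interceptor, run its assertions inside `try`, and call `releaseAll()` followed by `restore()` in `finally`, so a failing assertion cannot leave the override in place.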


@coderabbitai coderabbitai bot left a comment

🧹 Nitpick comments (1)
src/plugins/liveobjects/realtimeobject.ts (1)

285-293: siteCode validation happens after publish — consider checking earlier.

If siteCode is unavailable, publish has already succeeded (the server received the operation), but the local apply fails with an exception. The caller sees an error despite the operation being applied server-side. Consider validating siteCode before calling publish to fail fast and avoid this inconsistency.

♻️ Suggested reorder
  async publishAndApply(objectMessages: ObjectMessage[]): Promise<void> {
+   // RTO20c - validate siteCode before publishing
+   const siteCode = this._channel.connectionManager.connectionDetails?.siteCode;
+   if (!siteCode) {
+     throw new this._client.ErrorInfo(
+       'cannot apply operation locally: siteCode not available from connectionDetails',
+       40000,
+       400,
+     );
+   }
+
    // RTO20b
    const publishResult = await this.publish(objectMessages);

    // ... logging ...

-   // RTO20c
-   const siteCode = this._channel.connectionManager.connectionDetails?.siteCode;
-   if (!siteCode) {
-     throw new this._client.ErrorInfo(
-       'cannot apply operation locally: siteCode not available from connectionDetails',
-       40000,
-       400,
-     );
-   }
-
    // RTO20d
    const syntheticMessages = objectMessages.map((originalMsg, i) => {

@lawrence-forooghian
Collaborator Author

this is now up to date with spec commit 57f4449; the only outstanding conversation there is ably/specification#419 (comment) which I'll discuss with Paddy shortly


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

🧹 Nitpick comments (1)
test/realtime/liveobjects.test.js (1)

8429-8433: Restore interceptors and drain held messages to prevent test bleed-through.

createEchoInterceptor (and the ACK variant elsewhere in this suite) overrides transport.onProtocolMessage. If a scenario throws or late messages arrive, the override and queued messages can leak into cleanup or later assertions. Consider always restoring and releasing in a finally, and optionally asserting an echo was actually intercepted so the test can’t pass if the interceptor stops working.

♻️ Suggested cleanup pattern
-              // hold echoes so we can verify value comes from ACK, not echo
-              createEchoInterceptor(helper, client);
-
-              await scenario.action(root);
+              // hold echoes so we can verify value comes from ACK, not echo
+              const interceptor = createEchoInterceptor(helper, client);
+              try {
+                await scenario.action(root);
+                await interceptor.waitForEcho();
+              } finally {
+                await interceptor.releaseAll();
+                interceptor.restore();
+              }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@test/realtime/liveobjects.test.js` around lines 8429 - 8433, The test uses
createEchoInterceptor (and similar ACK interceptor) which overrides
transport.onProtocolMessage and holds messages; wrap the interceptor usage in a
try/finally so you always restore transport.onProtocolMessage and drain/release
any queued messages in the finally block to avoid leaking into cleanup or other
tests; update the test around scenario.action(root) to create the interceptor,
run the action in try, then in finally call the interceptor's restore/release
methods (or explicitly reset transport.onProtocolMessage and flush the held
queue) and add an assertion that an echo/ACK was actually intercepted so the
test fails if the interceptor never captured a message.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugins/liveobjects/realtimeobject.ts`:
- Around line 293-301: The siteCode check occurs after calling publish, causing
irreversible server-side effects; move the validation of
this._channel.connectionManager.connectionDetails?.siteCode (the siteCode
existence check that currently throws new this._client.ErrorInfo) to run before
calling publish (the publish invocation in this class) so the method rejects
early without sending the operation to the server, or alternatively explicitly
document that publish may succeed server-side while the local call rejects;
update the logic around publish and the siteCode check to ensure publish is only
invoked when siteCode is present.
- Around line 304-310: The guard in the syntheticMessages mapping currently only
checks for serial === null which misses undefined when publishResult.serials is
shorter than objectMessages; update the check in the syntheticMessages callback
(where serial is read from publishResult.serials[i]) to use a loose null check
(serial == null) so it catches both null and undefined before throwing the
this._client.ErrorInfo('cannot apply operation locally: serial is null in
PublishResult', 40000, 400).
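
To illustrate the second point, here is a small sketch of why a loose null check covers the short-array case. `SketchPublishResult` and `pairWithSerials` are hypothetical names, and a plain `Error` stands in for the SDK's `ErrorInfo`. It only matters if `serials` can actually be shorter than the message list, which is an assumption of this sketch.

```typescript
// Hypothetical shape of a publish result (assumption: serials may contain null).
interface SketchPublishResult {
  serials: (string | null)[];
}

// Pairs each message with its serial, rejecting both null and missing serials.
function pairWithSerials(
  objectMessages: string[],
  publishResult: SketchPublishResult,
): { message: string; serial: string }[] {
  return objectMessages.map((message, i) => {
    // Indexing past the end of the array yields undefined at runtime.
    const serial = publishResult.serials[i];
    // `== null` is true for both null and undefined; `=== null` would miss undefined.
    if (serial == null) {
      throw new Error(`cannot apply operation locally: no serial for message ${i}`);
    }
    return { message, serial };
  });
}
```

With `serials` of `['s1']` and two messages, the second lookup is `undefined`, so the loose check throws where a strict `=== null` comparison would let `undefined` through.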



@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/plugins/liveobjects/realtimeobject.ts`:
- Around line 346-373: publishAndApply() has a time-of-check/time-of-use (TOCTOU)
gap: it checks this._state before registering the pending reject handle, so if
the channel enters suspended/failed/detached between the ACK and the push into
this._pendingOperations, the promise can hang. To fix this, after adding the
rejectHandle and registering the ObjectsEvent.synced listener (the block that
uses this._eventEmitterInternal.once and pushes to this._pendingOperations),
immediately re-check this._state (or call the same validation logic used by
_failPendingOperations). If the channel is already suspended, failed, or
detached, invoke rejectHandle.reject(...); if it is already synced, resolve.
Either way the promise cannot hang. Update publishAndApply() and the onSynced
handler registration, and ensure cleanup still runs through cleanup().
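
The re-check-after-registering idea can be sketched with plain callbacks. Everything here is hypothetical (`SketchPendingOperations`, `PendingEntry`, the state names as a union type) and uses a generic `Error` in place of the SDK's error type; it only shows the shape of the fix, not the actual implementation.

```typescript
type SketchObjectsState = 'syncing' | 'synced' | 'suspended' | 'failed' | 'detached';

// Hypothetical pending-operation handle: exactly one of these is called, once.
interface PendingEntry {
  resolve: () => void;
  reject: (err: Error) => void;
}

class SketchPendingOperations {
  state: SketchObjectsState = 'syncing';
  private pending: PendingEntry[] = [];

  pendingCount(): number {
    return this.pending.length;
  }

  // Normal path: a state change settles everything registered so far.
  setState(next: SketchObjectsState): void {
    this.state = next;
    this.settleIfDecided();
  }

  // Register first, then re-check: if the state was already decided before
  // registration, no further event will arrive, so settle immediately.
  register(entry: PendingEntry): void {
    this.pending.push(entry);
    this.settleIfDecided();
  }

  private settleIfDecided(): void {
    if (this.state === 'syncing') return; // still undecided, keep waiting
    const entries = this.pending;
    this.pending = []; // cleanup happens on every settle path
    for (const entry of entries) {
      if (this.state === 'synced') {
        entry.resolve();
      } else {
        entry.reject(new Error(`channel entered ${this.state} state`));
      }
    }
  }
}
```

Without the `settleIfDecided()` call in `register`, an entry added after the state had already moved to `failed` would sit in the list indefinitely, which is the hang described above.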

---

Duplicate comments:
In `@src/plugins/liveobjects/realtimeobject.ts`:
- Around line 318-330: This comment is a duplicate and confirms the strict check
`serial === null` in RealtimeObject.publishAndApply() is correct because
publishResult.serials length matches objectMessages; no code change
required—leave the null check as-is and resolve/close the duplicate review
comment for the publishAndApply() serial handling.
- Around line 293-314: The checks in RealtimeObject.publishAndApply (the
siteCode validation and publishResult.serials length check) are intentionally
changed to log+return rather than throw; add a concise inline code comment
immediately above these checks (referencing siteCode, publishResult.serials,
this._channel, and publishAndApply) stating this is deliberate because the
publish has succeeded server-side and missing local application will be handled
by the server echo path, so we must not throw; also include a short note
pointing to the relevant commit or decision for future reviewers.
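
A rough sketch of the log-and-return shape, with the kind of inline comment being asked for. The function name, parameters, and log wording are all hypothetical; the real checks live inside `publishAndApply()` and use the SDK's own logger.

```typescript
// Hypothetical shape of a successful publish result.
interface SketchAckResult {
  serials: (string | null)[];
}

// Returns true if local application can proceed, false if it was skipped.
function canApplyLocally(
  siteCode: string | undefined,
  ack: SketchAckResult,
  messageCount: number,
  log: (line: string) => void,
): boolean {
  // Deliberately log and return instead of throwing: by this point the publish
  // has already succeeded server-side, and the operation is expected to be
  // applied later when the server echo arrives.
  if (!siteCode) {
    log('skipping local apply: siteCode missing from connectionDetails');
    return false;
  }
  if (ack.serials.length !== messageCount) {
    log(`skipping local apply: got ${ack.serials.length} serials for ${messageCount} messages`);
    return false;
  }
  return true;
}
```

The caller sees a successful publish in every case; only the local, early application is skipped.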

`buffering ${syntheticMessages.length} message(s) until sync completes; channel=${this._channel.name}`,
);
await new Promise<void>((resolve) => {
this._bufferedAcks.push({ objectMessages: syntheticMessages, signal: resolve }); // RTO20e1

👍 will keep this conversation unresolved for visibility.

// RTO20c
const siteCode = this._channel.connectionManager.connectionDetails?.siteCode;
if (!siteCode) {
throw new this._client.ErrorInfo(

We updated the implementation to log instead of throw according to the spec.
Will keep this conversation unresolved for visibility.

lawrence-forooghian and others added 2 commits February 20, 2026 10:59
This informs the compiler that the function throws.
Extract this helper from inside the 'Sync events' describe block so
it can be reused by other test sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@VeskeR
Contributor

VeskeR commented Feb 20, 2026

Weirdly enough, the tests for rejects with error 92008 when channel enters ${state} state all fail in browser tests. I suspect this is something to do with how we fake entering a channel state, and that fails in the playwright?

@lawrence-forooghian
Collaborator Author

> Weirdly enough, the tests for rejects with error 92008 when channel enters ${state} state all fail in browser tests. I suspect this is something to do with how we fake entering a channel state, and that fails in the playwright?

How weird, will take a look

@lawrence-forooghian
Collaborator Author

Claude reckons this was broken by the setTimeout() -> nextTick() change, looking into it

Based on [1] at 56a0bba. Implementation and tests are Claude-generated
from the spec; I've reviewed them and given plenty of feedback, but
largely resisted the temptation to tweak things that aren't quite how
I'd write them but which are still correct.

The only behaviour here that's not in the spec is to also apply-on-ACK
for batch operations (the batch API isn't in the spec yet).

Summary of decisions re modifications to existing tests (written by
Claude):

- Removed redundant `waitFor*` calls after SDK operations (`map.set()`,
  `counter.increment()`, etc.) - with apply-on-ACK, values are available
  immediately after the operation promise resolves

- Kept `waitFor*` calls after REST operations
  (`objectsHelper.operationRequest()`,
  `objectsHelper.createAndSetOnMap()`) - these still require waiting for
  the echo to arrive over Realtime

- Added explanatory comment to `applyOperationsScenarios` noting that
  those tests cover operations received over Realtime (via REST), and
  pointing to the new "Apply on ACK" section for tests of
  locally-applied operations

[1] ably/specification#419
@lawrence-forooghian
Collaborator Author

lawrence-forooghian commented Feb 20, 2026

I've reverted that change, with an explanation from Claude — as mentioned there, I don't fully understand the fix (nor do I hugely wish to spend much time trying to do so) but I think it's in "works so good enough" category

@VeskeR
Contributor

VeskeR commented Feb 20, 2026

> I've reverted that change, with an explanation from Claude — as mentioned there, I don't fully understand the fix (nor do I hugely wish to spend much time trying to do so) but I think it's in "works so good enough" category

Yeah, that makes sense if the underlying "await-for-ack" procedure in publishAndApply requires a couple of microtask ticks to complete. Let's leave it as a setTimeout.
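
The timing argument can be seen in a generic sketch, independent of the SDK. Here `multiStepAck` is a hypothetical stand-in for any async procedure that needs several microtask ticks, and `queueMicrotask` stands in for a microtask-based scheduler; whether the SDK's `nextTick()` behaves this way in the browser build is an assumption, not something this thread confirms.

```typescript
// Records the order in which three differently scheduled callbacks run.
function observeOrdering(): Promise<string[]> {
  const order: string[] = [];

  // Hypothetical stand-in for ACK handling that spans several microtask ticks.
  async function multiStepAck(): Promise<void> {
    await null;
    await null;
    await null;
    order.push('ack handling finished');
  }

  void multiStepAck();

  // A single microtask runs before the multi-step async function completes.
  queueMicrotask(() => order.push('microtask callback'));

  // A timer callback is a separate task, so it runs after the microtask queue drains.
  return new Promise<string[]>((resolve) => {
    setTimeout(() => {
      order.push('timeout callback');
      resolve(order);
    }, 0);
  });
}
```

In this sketch the resolved order is `microtask callback`, then `ack handling finished`, then `timeout callback`: a fake state change scheduled as a microtask can fire before the ACK handling has finished, while one scheduled with `setTimeout` fires after it.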
