@fryorcraken fryorcraken commented Nov 9, 2025

Problem / Description

Various improvements for reliable channels based on developer feedback as well as better code organisation by extracting some classes.

Solution

  1. New ReliableChannel.syncStatus that emits events and information on missing messages.
  2. Add a routine to ReliableChannel.stop to ensure there are no hanging timeouts, intervals, retries or subscriptions.
  3. When a message is marked irretrievably lost, it is delivered so the client can resume participation in the SDS protocols.
  4. Fix some tests that modified global state (setInterval, clearInterval) without restoring it.
  5. Extract several pieces of logic from ReliableChannel to make it leaner and easier to test thoroughly.
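
As an illustration of item 2, here is a minimal sketch of how a component might track its timers so that a single `stop()` call leaves nothing hanging. `TimerRegistry` and its method names are hypothetical and are not taken from this PR; the actual cleanup in ReliableChannel.stop may be organised differently.

```typescript
// Hypothetical helper for illustration only; not the PR's actual code.
class TimerRegistry {
  private timeouts = new Set<ReturnType<typeof setTimeout>>();
  private intervals = new Set<ReturnType<typeof setInterval>>();
  private stopped = false;

  // Schedule a one-shot callback and forget its handle once it has fired.
  public addTimeout(fn: () => void, ms: number): void {
    if (this.stopped) return;
    const handle = setTimeout(() => {
      this.timeouts.delete(handle);
      fn();
    }, ms);
    this.timeouts.add(handle);
  }

  // Schedule a repeating callback; it runs until stop() is called.
  public addInterval(fn: () => void, ms: number): void {
    if (this.stopped) return;
    this.intervals.add(setInterval(fn, ms));
  }

  // Number of timers that are still scheduled.
  public pendingCount(): number {
    return this.timeouts.size + this.intervals.size;
  }

  // Clear every tracked timer and refuse new ones afterwards.
  public stop(): void {
    this.stopped = true;
    this.timeouts.forEach((handle) => clearTimeout(handle));
    this.intervals.forEach((handle) => clearInterval(handle));
    this.timeouts.clear();
    this.intervals.clear();
  }
}
```

Routing every timer through one registry means a test can check that the pending count is zero after `stop()` instead of relying on the process exiting cleanly.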

Notes

For (3) @jm-clius @shash256 the spec is open to interpretation, should we update it?
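
To make the question about (3) concrete, here is one possible reading, sketched under assumptions: a dependency that is declared irretrievably lost is treated as resolved, so a buffered message that was waiting on it can be delivered and join the causal history. `CausalBufferSketch`, `receive` and `markLost` are hypothetical names, and this is not the SDS implementation or the spec's wording.

```typescript
// Hypothetical model of a causal delivery buffer; names are illustrative only.
interface BufferedMessage {
  id: string;
  dependsOn: string[];
}

class CausalBufferSketch {
  private delivered = new Set<string>();
  private lost = new Set<string>();
  private buffer: BufferedMessage[] = [];
  public readonly deliveryLog: string[] = [];

  // Buffer an incoming message, then deliver whatever has become deliverable.
  public receive(message: BufferedMessage): void {
    this.buffer.push(message);
    this.flush();
  }

  // Declare a dependency irretrievably lost, which may unblock buffered messages.
  public markLost(id: string): void {
    this.lost.add(id);
    this.flush();
  }

  // A dependency is resolved if it was delivered or given up on.
  private isResolved(id: string): boolean {
    return this.delivered.has(id) || this.lost.has(id);
  }

  // Deliver buffered messages until none has all of its dependencies resolved.
  private flush(): void {
    for (;;) {
      const ready = this.buffer.find((m) =>
        m.dependsOn.every((dep) => this.isResolved(dep))
      );
      if (ready === undefined) return;
      this.buffer = this.buffer.filter((m) => m !== ready);
      this.delivered.add(ready.id);
      this.deliveryLog.push(ready.id);
    }
  }
}
```

In this sketch, a message `m2` that depends on a missing `m1` stays buffered until `markLost("m1")` is called, after which `m2` and anything waiting on `m2` are delivered in order.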

TODOs:


Checklist

  • Code changes are covered by unit tests.
  • Code changes are covered by e2e tests, if applicable.
  • Dogfooding has been performed, if feasible.
  • A test version has been published, if required.
  • All CI checks pass successfully.

github-actions bot commented Nov 9, 2025

size-limit report 📦

| Path | Size | Loading time (3g) | Running time (snapdragon) | Total time |
|---|---|---|---|---|
| Waku node | 96.31 KB (-0.08% 🔽) | 2 s (-0.08% 🔽) | 225 ms (-34.21% 🔽) | 2.2 s |
| Waku Simple Light Node | 147.49 KB (-0.14% 🔽) | 3 s (-0.14% 🔽) | 493 ms (+149.15% 🔺) | 3.5 s |
| ECIES encryption | 22.62 KB (0%) | 453 ms (0%) | 100 ms (+58.77% 🔺) | 553 ms |
| Symmetric encryption | 22 KB (0%) | 440 ms (0%) | 186 ms (+102.12% 🔺) | 626 ms |
| DNS discovery | 52.17 KB (0%) | 1.1 s (0%) | 333 ms (+81.25% 🔺) | 1.4 s |
| Peer Exchange discovery | 52.91 KB (0%) | 1.1 s (0%) | 159 ms (-4.28% 🔽) | 1.3 s |
| Peer Cache Discovery | 46.64 KB (0%) | 933 ms (0%) | 264 ms (+49.55% 🔺) | 1.2 s |
| Privacy preserving protocols | 77.28 KB (+0.03% 🔺) | 1.6 s (+0.03% 🔺) | 146 ms (-36.64% 🔽) | 1.7 s |
| Waku Filter | 79.72 KB (-0.08% 🔽) | 1.6 s (-0.08% 🔽) | 474 ms (+106.15% 🔺) | 2.1 s |
| Waku LightPush | 77.99 KB (-0.01% 🔽) | 1.6 s (-0.01% 🔽) | 217 ms (-24.5% 🔽) | 1.8 s |
| History retrieval protocols | 83.65 KB (-0.02% 🔽) | 1.7 s (-0.02% 🔽) | 482 ms (+118.11% 🔺) | 2.2 s |
| Deterministic Message Hashing | 28.98 KB (0%) | 580 ms (0%) | 119 ms (+4.96% 🔺) | 698 ms |

@fryorcraken fryorcraken marked this pull request as ready for review November 11, 2025 04:33
@fryorcraken fryorcraken requested a review from a team as a code owner November 11, 2025 04:33
@fryorcraken fryorcraken force-pushed the reliable-channels/sync-state branch from 9cc27b1 to 3f40996 Compare November 11, 2025 04:35

});

it("should start and stop interval correctly", () => {
// TODO: Skipped because the global state is not being restored and it breaks

why so? test is passing on master

@fryorcraken fryorcraken Nov 13, 2025

My new tests aren't on master yet, and they don't pass with this test enabled because setInterval no longer works once the code in this test has run.

Tests on master aren't passing either: #2648

I have spent half a day on this and I can guarantee that the code in this test pollutes the global state. It is not being restored (at least with node 22.17.0, I can try with other node versions in a couple of weeks).
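
The failure mode described here (a test replaces a global timer function and never puts it back) can be guarded against with a snapshot-and-restore helper. The following is a minimal sketch, not the code from this PR: `snapshotTimers` is a hypothetical name, and the spec files in this PR rely on sinon.restore() in afterEach hooks rather than on a helper like this.

```typescript
// Hypothetical helper: capture the real timer globals before a test touches them.
function snapshotTimers(): () => void {
  const saved = {
    setInterval: globalThis.setInterval,
    clearInterval: globalThis.clearInterval
  };
  // The returned function puts the captured originals back on globalThis.
  return function restore(): void {
    globalThis.setInterval = saved.setInterval;
    globalThis.clearInterval = saved.clearInterval;
  };
}
```

In a suite that stubs timers, `snapshotTimers()` would be called in a beforeEach hook and the returned `restore` in afterEach, so a stub installed by one test cannot leak into the next.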

Copilot AI left a comment

Pull Request Overview

This PR introduces several improvements to reliable channels including status synchronization, proper cleanup routines, and overflow protection for message history.

  • Adds SyncStatus class to track and emit events for message synchronization state
  • Implements proper cleanup in ReliableChannel.stop() to prevent resource leaks
  • Delivers irretrievably lost messages to allow clients to resume protocol participation
  • Extracts logic into reusable pieces (a RandomTimeout class and a RetryManager.stop() method)
  • Adds overflow protection to MemLocalHistory with configurable max length
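
To illustrate the last bullet, here is a minimal sketch of a history buffer with a configurable maximum length. `BoundedHistory` is a hypothetical class, and dropping the oldest entries first is an assumption; the eviction policy of MemLocalHistory is not described in this conversation.

```typescript
// Hypothetical bounded buffer; the eviction policy (oldest first) is an assumption.
class BoundedHistory<T> {
  private items: T[] = [];
  private readonly maxLength: number;

  public constructor(maxLength: number) {
    if (!Number.isInteger(maxLength) || maxLength <= 0) {
      throw new RangeError("maxLength must be a positive integer");
    }
    this.maxLength = maxLength;
  }

  // Append entries, then trim from the front if the buffer overflowed.
  public push(...entries: T[]): number {
    this.items.push(...entries);
    const overflow = this.items.length - this.maxLength;
    if (overflow > 0) {
      this.items.splice(0, overflow);
    }
    return this.items.length;
  }

  // Current number of stored entries.
  public size(): number {
    return this.items.length;
  }

  // Copy of the stored entries, oldest first.
  public toArray(): T[] {
    return [...this.items];
  }
}
```

With a maximum length of 3, pushing five entries keeps only the three most recent ones, which caps memory use for long-lived channels.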

Reviewed Changes

Copilot reviewed 18 out of 18 changed files in this pull request and generated 10 comments.

| File | Description |
|---|---|
| packages/sdk/src/reliable_channel/sync_status.ts | New class to track and emit sync status events for missing/received/lost messages |
| packages/sdk/src/reliable_channel/sync_status.spec.ts | Test coverage for SyncStatus class functionality |
| packages/sdk/src/reliable_channel/random_timeout.ts | Extracted utility class for managing randomized timeouts with multipliers |
| packages/sdk/src/reliable_channel/reliable_channel.ts | Integrated SyncStatus, refactored event listeners into methods, improved stop() cleanup |
| packages/sdk/src/reliable_channel/reliable_channel.spec.ts | Added cleanup in afterEach hooks, moved test suite to avoid skipping, new test for lost message acknowledgment |
| packages/sdk/src/reliable_channel/reliable_channel_sync_status.spec.ts | Integration tests for sync status functionality |
| packages/sdk/src/reliable_channel/retry_manager.ts | Added stop() method to clear pending timeouts |
| packages/sdk/src/reliable_channel/index.ts | Exports new SyncStatus types and classes |
| packages/sds/src/message_channel/message_channel.ts | Delivers messages marked as irretrievably lost to resume log participation |
| packages/sds/src/message_channel/mem_local_history.ts | Added overflow protection with configurable max length |
| packages/sds/src/message_channel/mem_local_history.spec.ts | Test coverage for overflow protection behavior |
| packages/utils/src/common/mock_node.ts | Improved mock implementations for unsubscribe and stop methods |
| packages/sdk/src/waku/wait_for_remote_peer.spec.ts | Added sinon.restore() in afterEach for proper test cleanup |
| packages/sdk/src/query_on_connect/query_on_connect.spec.ts | Added sinon.restore() in afterEach for proper test cleanup |
| packages/sdk/src/light_push/retry_manager.spec.ts | Skipped test that modifies global state |
| packages/sdk/src/light_push/light_push.spec.ts | Added sinon.restore() in afterEach for proper test cleanup |
| packages/core/src/lib/stream_manager/stream_manager.spec.ts | Added sinon.restore() in afterEach for proper test cleanup |
| packages/sdk/src/reliable_channel/reliable_channel_sync.spec.ts | Reduced delay time in test |

@weboko weboko left a comment

lgtm, but looks quite complex

@fryorcraken fryorcraken mentioned this pull request Nov 13, 2025
5 tasks
@fryorcraken fryorcraken force-pushed the reliable-channels/sync-state branch from ec6f3d5 to a6323d8 Compare November 13, 2025 05:26
This re-enables participation in the SDS protocol: the received message
with missing dependencies becomes part of the causal history, which
re-enables acknowledgements.
Return a "synced" or "syncing" status on `ReliableChannel.status` that
lets the developer know whether messages are missing, and if so, how many.
# Conflicts:
#	packages/sdk/src/reliable_channel/reliable_channel.ts
@fryorcraken fryorcraken force-pushed the reliable-channels/sync-state branch from a6323d8 to 4dde565 Compare November 13, 2025 05:39
@danisharora099

wow, the AI reviews are good..

@fryorcraken

> lgtm, but looks quite complex

I tried my best to isolate the logic, will see if I can simplify further.

@fryorcraken fryorcraken force-pushed the reliable-channels/sync-state branch from c6add1a to 025dfa2 Compare November 14, 2025 03:55
@fryorcraken

@danisharora099 would you prefer that I remove the concept of "synced" vs "syncing" and just have a "sync-status-update"?

Claude thinks it's too complicated and that the consumer should just look at the missing field.

I think it's better this way: the consumer knows what the status means and can, as a second step, decide to also check missing and other fields if they want.
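
The trade-off can be sketched as follows: named "synced" and "syncing" events carry the meaning, while the detail payload still exposes the counts for consumers who want them. `SyncStatusSketch`, its methods and the shape of `SyncDetail` are hypothetical and do not reproduce the SyncStatus class added in this PR.

```typescript
// Hypothetical sketch; event names come from this thread, everything else is assumed.
type SyncEvent = "synced" | "syncing";

interface SyncDetail {
  received: number;
  missing: number;
  lost: number;
}

type SyncListener = (detail: SyncDetail) => void;

class SyncStatusSketch {
  private missing = new Set<string>();
  private received = 0;
  private lost = 0;
  private listeners: Record<SyncEvent, SyncListener[]> = { synced: [], syncing: [] };

  public on(event: SyncEvent, listener: SyncListener): void {
    this.listeners[event].push(listener);
  }

  public markMissing(id: string): void {
    this.missing.add(id);
    this.emit();
  }

  public markReceived(id: string): void {
    this.missing.delete(id);
    this.received += 1;
    this.emit();
  }

  public markLost(id: string): void {
    if (this.missing.delete(id)) this.lost += 1;
    this.emit();
  }

  // "synced" when nothing is missing, "syncing" otherwise; counts ride along.
  private emit(): void {
    const event: SyncEvent = this.missing.size === 0 ? "synced" : "syncing";
    const detail: SyncDetail = {
      received: this.received,
      missing: this.missing.size,
      lost: this.lost
    };
    this.listeners[event].forEach((listener) => listener(detail));
  }
}
```

A consumer that only cares about the headline state subscribes to the two event names; one that wants progress reporting reads `missing` and `lost` from the same payload.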

@fryorcraken

Trying to understand why allure fails (no individual test fails).

@danisharora099 danisharora099 left a comment

works great in https://opchan.app, LGTM!

@danisharora099

> @danisharora099 would you prefer that I remove the concept of "synced" vs "syncing" and just have a "sync-status-update"?
>
> Claude thinks it's too complicated and that the consumer should just look at the missing field.
>
> I think it's better this way: the consumer knows what the status means and can, as a second step, decide to also check missing and other fields if they want.

I like the "syncing" and "synced"!

@fryorcraken fryorcraken merged commit e5f51d7 into master Nov 15, 2025
11 of 12 checks passed
@fryorcraken fryorcraken deleted the reliable-channels/sync-state branch November 15, 2025 21:57
@weboko weboko mentioned this pull request Nov 15, 2025