-
-
Notifications
You must be signed in to change notification settings - Fork 12
fix(matrix): Only advance sync token when allowed rooms present #133
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
Merged
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,67 @@ | ||
| # Matrix Sync Token Bug Fix Plan | ||
|
|
||
| ## Problem | ||
| The sync token advances even when no messages are extracted from allowed rooms. This causes messages to be missed when: | ||
| 1. Sync returns events only for non-allowed rooms | ||
| 2. Sync returns non-message events (typing, read receipts) for allowed rooms | ||
| 3. Messages arrive during concurrent processing | ||
|
|
||
| ## Root Cause | ||
| ```rust | ||
| // Currently: Always save token after sync, even if no messages extracted | ||
| self.save_sync_token(&next_batch); | ||
| ``` | ||
|
|
||
| ## Solution Options | ||
|
|
||
| ### Option 1: Only advance token when messages extracted (WRONG) | ||
| This would cause infinite re-processing of non-message events. | ||
|
|
||
| ### Option 2: Track "seen" event IDs (Complex) | ||
| Store event IDs of processed messages, skip duplicates. Memory grows over time. | ||
|
|
||
| ### Option 3: Per-room sync tokens (Best) | ||
| Matrix supports per-room `prev_batch` tokens. Track last seen event per room. | ||
|
|
||
| ### Option 4: Check if allowed rooms had ANY events (Simple fix) | ||
| Only advance token if allowed rooms were present in sync response, regardless of whether they had messages. | ||
|
|
||
| ## Recommended Fix: Option 4 | ||
|
|
||
| ```rust | ||
| // Track whether any allowed rooms appeared in this sync | ||
| let mut allowed_rooms_in_sync = false; | ||
|
|
||
| if let Some(rooms) = sync_response.rooms { | ||
| if let Some(joined_rooms) = rooms.join { | ||
| for (room_id, room_data) in joined_rooms { | ||
| let in_allowed = self.allowed_chats.contains(&room_id) || dm_rooms.contains(&room_id); | ||
| if in_allowed { | ||
| allowed_rooms_in_sync = true; | ||
| // ... process messages as before | ||
| } | ||
| } | ||
| } | ||
| } | ||
|
|
||
| // Only advance token if: | ||
| // 1. We extracted messages, OR | ||
| // 2. Allowed rooms were in sync (even with no messages = they're caught up) | ||
| // 3. No allowed rooms configured (process everything) | ||
| if !messages.is_empty() || allowed_rooms_in_sync || (self.allowed_chats.is_empty() && dm_rooms.is_empty()) { | ||
| self.save_sync_token(&next_batch); | ||
| } | ||
| ``` | ||
|
|
||
| ## Why This Works | ||
| - If allowed rooms appear in sync with no messages → token advances (caught up) | ||
| - If sync only has events for non-allowed rooms → token does NOT advance | ||
| - Next sync will include the same batch + any new events | ||
| - Eventually the allowed room will appear and token advances | ||
|
|
||
| ## Edge Case: Stuck Token | ||
| If we never get events for allowed rooms, token never advances. This is fine — we'll get them eventually when someone sends a message. | ||
|
|
||
| ## Implementation | ||
| File: `crates/rustyclaw-core/src/messengers/matrix_cli.rs` | ||
| Lines: ~455-460 (save_sync_token section) |
Oops, something went wrong.
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🔴 Frozen sync token causes repeated re-fetching of non-allowed room events indefinitely
When
has_room_filtersistrueand only non-allowed rooms have activity,should_advance_tokenevaluates tofalse(messagesis empty,allowed_rooms_in_syncis false,has_room_filtersis true). The sync token never advances, so every subsequent call tosync()re-requests the same events from the Matrix server using the stalesinceparameter. Because events already exist since the frozen token, the Matrix server returns immediately instead of long-polling for the configuredtimeout(1000ms atmatrix_cli.rs:651), defeating the long-poll mechanism. Each poll cycle (every ~2 seconds permessenger_handler.rs:320) re-fetches and discards the same non-allowed room data. This continues indefinitely until an allowed room has activity. In deployments where the bot is in many active non-allowed rooms and the allowed rooms are quiet, this causes persistent wasted network bandwidth and server CPU. There is no staleness limit or fallback to force token advancement.Scenario illustrating the issue
Bot configured with allowed_chats = ["!roomA"] and joined to rooms A (quiet) and B (active).
allowed_rooms_in_sync= false,messages= empty → token frozenEach cycle wastefully fetches B's events and discards them.
Prompt for agents
Was this helpful? React with 👍 or 👎 to provide feedback.