crates/rustyclaw-core/src/messengers/matrix_cli.rs: 28 changes (23 additions, 5 deletions)
@@ -392,9 +392,15 @@ impl MatrixCliMessenger {
         }

         let mut messages = Vec::new();

+        // Track whether any allowed rooms appeared in this sync.
+        // We only advance the sync token if allowed rooms were present,
+        // to avoid skipping messages when sync returns only non-allowed room events.
+        let mut allowed_rooms_in_sync = false;
+
         // Get current DM rooms for filtering
         let dm_rooms = self.dm_rooms.lock().await.clone();
+        let has_room_filters = !self.allowed_chats.is_empty() || !dm_rooms.is_empty();
         eprintln!("DEBUG: sync - allowed_chats: {:?}, dm_rooms: {:?}", self.allowed_chats, dm_rooms);

         if let Some(rooms) = sync_response.rooms {
@@ -404,13 +410,16 @@ impl MatrixCliMessenger {
                     // Check if this room is allowed
                     let in_allowed_chats = self.allowed_chats.contains(&room_id);
                     let in_dm_rooms = dm_rooms.contains(&room_id);
+                    let is_allowed_room = in_allowed_chats || in_dm_rooms;

                     // If we have an allowlist OR dm_rooms, only process rooms in one of them
-                    if !self.allowed_chats.is_empty() || !dm_rooms.is_empty() {
-                        if !in_allowed_chats && !in_dm_rooms {
+                    if has_room_filters {
+                        if !is_allowed_room {
                             eprintln!("DEBUG: skipping room {} (not in allowed lists)", room_id);
                             continue;
                         }
+                        // An allowed room appeared in this sync
+                        allowed_rooms_in_sync = true;
                     }
                     eprintln!("DEBUG: processing room {}", room_id);

@@ -450,12 +459,21 @@ impl MatrixCliMessenger {
             }
         }

-        // Only update sync token AFTER successful message extraction.
-        // This prevents losing messages if extraction fails partway through.
-        {
+        // Only advance sync token if:
+        // 1. We extracted messages, OR
+        // 2. Allowed rooms appeared in sync (even with no messages = caught up), OR
+        // 3. No room filters configured (process everything)
+        //
+        // This prevents the token from advancing when sync only contains
+        // events for non-allowed rooms, which would cause us to miss messages.
+        let should_advance_token = !messages.is_empty() || allowed_rooms_in_sync || !has_room_filters;
+
+        if should_advance_token {
             let mut token = self.sync_token.lock().await;
             *token = Some(next_batch.clone());
             self.save_sync_token(&next_batch);
+        } else {
+            eprintln!("DEBUG: sync - NOT advancing token (no allowed rooms in response)");
         }
Comment on lines +469 to 477

🔴 Frozen sync token causes repeated re-fetching of non-allowed room events indefinitely

When `has_room_filters` is true and only non-allowed rooms have activity, `should_advance_token` evaluates to false (`messages` is empty, `allowed_rooms_in_sync` is false, and `has_room_filters` is true). The sync token never advances, so every subsequent `sync()` call re-requests the same events from the Matrix server with the stale `since` parameter. Because events already exist after the frozen token, the server responds immediately instead of long-polling for the configured timeout (1000 ms at matrix_cli.rs:651), which defeats the long-poll mechanism.

Each poll cycle (roughly every 2 seconds, per messenger_handler.rs:320) re-fetches and discards the same non-allowed room data, and this continues until an allowed room has activity. In deployments where the bot sits in many active non-allowed rooms while the allowed rooms are quiet, this wastes network bandwidth and server CPU indefinitely. There is no staleness limit or fallback that forces the token to advance.

Scenario illustrating the issue

Bot configured with allowed_chats = ["!roomA"] and joined to rooms A (quiet) and B (active).

  1. Sync returns events for room B only
  2. allowed_rooms_in_sync = false, messages = empty → token frozen
  3. Next sync: same token → server returns B's events immediately (no long-poll)
  4. Repeat forever until room A has activity

Each cycle wastefully fetches B's events and discards them.
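The loop above can be modelled in a few lines. This is a hedged sketch, not rustyclaw code: only the `should_advance_token` condition is taken from the PR, while `simulate`, the integer positions standing in for `next_batch` tokens, and the single busy room are illustrative assumptions. It counts how many polls re-request events that an earlier poll already received.

```rust
use std::collections::HashSet;

/// The PR's advancement rule, as written in `sync()`.
fn should_advance_token(messages_extracted: usize, allowed_rooms_in_sync: bool, has_room_filters: bool) -> bool {
    messages_extracted > 0 || allowed_rooms_in_sync || !has_room_filters
}

/// Hypothetical toy poll loop. Each cycle, only `busy_room` receives one new
/// event. Integer positions stand in for Matrix sync tokens, and the toy has
/// no DM rooms. Returns the final `since` value and the number of polls that
/// re-requested events an earlier poll had already fetched.
fn simulate(allowed: &HashSet<&str>, busy_room: &str, cycles: u64) -> (u64, u64) {
    let has_room_filters = !allowed.is_empty();
    let mut since: u64 = 0; // what the client sends as `since`
    let mut head: u64 = 0; // newest event position on the toy server
    let mut refetches = 0;

    for _ in 0..cycles {
        head += 1; // busy_room gets one new event

        // The server returns every event after `since`. Events up to the
        // previous head were already delivered by an earlier poll.
        if since < head - 1 {
            refetches += 1;
        }

        let in_allowed = allowed.contains(busy_room);
        // Mirrors the PR: the flag is only set when room filters exist.
        let allowed_rooms_in_sync = has_room_filters && in_allowed;
        let processed = !has_room_filters || in_allowed;
        let messages_extracted = if processed { (head - since) as usize } else { 0 };

        if should_advance_token(messages_extracted, allowed_rooms_in_sync, has_room_filters) {
            since = head; // plays the role of `next_batch`
        }
    }
    (since, refetches)
}
```

With `allowed = {"!roomA"}` and all traffic in `"!roomB"`, `simulate(&allowed, "!roomB", 5)` returns `(0, 4)`: the token never moves and every poll after the first re-fetches old events. With no filters, or with the busy room allowed, it returns `(5, 0)`.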

Prompt for agents
In crates/rustyclaw-core/src/messengers/matrix_cli.rs, the should_advance_token logic at line 469 should be modified to always advance the sync token, while still ensuring messages from allowed rooms are not lost. Two possible approaches:

1. Add a staleness counter or timer: track how many consecutive syncs have had no allowed room events. After a threshold (e.g., 10 syncs or 30 seconds), advance the token anyway to avoid indefinite re-fetching. Reset the counter when allowed rooms appear.

2. Always advance the token but track processed event IDs: advance the token on every sync (as the old code did), but maintain a bounded set of recent event IDs to detect and skip duplicates if needed. This avoids the frozen token problem entirely.

The fix should be applied in the sync() method around lines 462-477. The goal is to prevent the scenario where the bot never advances the sync token because non-allowed rooms have constant activity while allowed rooms are quiet.
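Approach 1 could look roughly like the following. It is a sketch under stated assumptions: `TokenAdvanceGuard`, `max_skips`, and `should_advance` are hypothetical names, and the threshold counts consecutive syncs (the prompt suggests about 10) rather than elapsed time. The caller would store `next_batch` whenever `should_advance` returns true.

```rust
/// Hypothetical guard for approach 1: forces the sync token forward after
/// `max_skips` consecutive syncs in which nothing justified advancing it.
struct TokenAdvanceGuard {
    max_skips: u32,
    consecutive_skips: u32,
}

impl TokenAdvanceGuard {
    fn new(max_skips: u32) -> Self {
        Self { max_skips, consecutive_skips: 0 }
    }

    /// Returns true when the caller should store `next_batch`.
    fn should_advance(
        &mut self,
        messages_extracted: bool,
        allowed_rooms_in_sync: bool,
        has_room_filters: bool,
    ) -> bool {
        // Same condition as the PR: advance normally and reset the counter.
        if messages_extracted || allowed_rooms_in_sync || !has_room_filters {
            self.consecutive_skips = 0;
            return true;
        }
        // Otherwise count the skip and force an advance at the threshold.
        self.consecutive_skips += 1;
        if self.consecutive_skips >= self.max_skips {
            self.consecutive_skips = 0;
            true
        } else {
            false
        }
    }
}
```

A time-based threshold (the prompt's 30-second example) would follow the same shape, storing a `std::time::Instant` of the last advance instead of a counter.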


Ok(messages)
docs/matrix-sync-fix.md: 67 changes (67 additions, 0 deletions)
@@ -0,0 +1,67 @@
# Matrix Sync Token Bug Fix Plan

## Problem
The sync token advances even when no messages are extracted from allowed rooms. This causes messages to be missed when:
1. Sync returns events only for non-allowed rooms
2. Sync returns non-message events (typing, read receipts) for allowed rooms
3. Messages arrive during concurrent processing

## Root Cause
```rust
// Currently: Always save token after sync, even if no messages extracted
self.save_sync_token(&next_batch);
```

## Solution Options

### Option 1: Only advance token when messages extracted (WRONG)
This would cause infinite re-processing of non-message events.

### Option 2: Track "seen" event IDs (Complex)
Store event IDs of processed messages, skip duplicates. Memory grows over time.
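The memory concern can be addressed by capping the set. The sketch below is an assumption, not code from this PR: `RecentEvents` is a hypothetical type that keeps at most `capacity` event IDs and evicts the oldest first (FIFO), so memory stays bounded.

```rust
use std::collections::{HashSet, VecDeque};

/// Hypothetical bounded record of recently processed event IDs.
/// Oldest IDs are evicted first once `capacity` is reached.
struct RecentEvents {
    capacity: usize,
    order: VecDeque<String>, // insertion order, oldest at the front
    seen: HashSet<String>,   // fast membership checks
}

impl RecentEvents {
    fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self {
            capacity,
            order: VecDeque::with_capacity(capacity),
            seen: HashSet::with_capacity(capacity),
        }
    }

    /// Records `event_id` and returns true if it is new; returns false for a duplicate.
    fn insert_if_new(&mut self, event_id: &str) -> bool {
        if self.seen.contains(event_id) {
            return false;
        }
        if self.order.len() == self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.seen.remove(&oldest);
            }
        }
        self.order.push_back(event_id.to_owned());
        self.seen.insert(event_id.to_owned());
        true
    }
}
```

The trade-off is that a duplicate older than the last `capacity` events is no longer recognized, so the capacity should comfortably exceed the number of events one re-delivered batch can contain.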

### Option 3: Per-room sync tokens (Best)
Matrix supports per-room `prev_batch` tokens. Track last seen event per room.
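In the Matrix Client-Server API, a room's `prev_batch` token is used to paginate backwards through its history, so this sketch illustrates only a simpler client-side variant of "track last seen event per room": remember the newest processed event ID in each room and, when a timeline is delivered again, return only the events after it. `RoomCursors` and `new_events` are hypothetical names, and the timeline is assumed to be ordered oldest first.

```rust
use std::collections::HashMap;

/// Hypothetical per-room cursor: the last processed event ID in each room.
#[derive(Default)]
struct RoomCursors {
    last_seen: HashMap<String, String>,
}

impl RoomCursors {
    /// Returns the events in `timeline` that come after the remembered cursor
    /// for `room_id`, then moves the cursor to the newest event.
    fn new_events<'a>(&mut self, room_id: &str, timeline: &'a [String]) -> &'a [String] {
        let start = match self.last_seen.get(room_id) {
            // Skip everything up to and including the last processed event.
            // If it is not in this timeline (e.g. after a gap), process all.
            Some(last) => timeline.iter().position(|id| id == last).map_or(0, |i| i + 1),
            None => 0,
        };
        if let Some(newest) = timeline.last() {
            self.last_seen.insert(room_id.to_owned(), newest.clone());
        }
        &timeline[start..]
    }
}
```

Falling back to the whole timeline when the cursor is missing errs toward duplicates rather than lost messages.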

### Option 4: Check if allowed rooms had ANY events (Simple fix)
Only advance token if allowed rooms were present in sync response, regardless of whether they had messages.

## Recommended Fix: Option 4

```rust
// Track whether any allowed rooms appeared in this sync
let mut allowed_rooms_in_sync = false;
let has_room_filters = !self.allowed_chats.is_empty() || !dm_rooms.is_empty();

if let Some(rooms) = sync_response.rooms {
    if let Some(joined_rooms) = rooms.join {
        for (room_id, room_data) in joined_rooms {
            if has_room_filters {
                let in_allowed = self.allowed_chats.contains(&room_id) || dm_rooms.contains(&room_id);
                if !in_allowed {
                    continue; // skip rooms outside the allowlist and DM set
                }
                allowed_rooms_in_sync = true;
            }
            // ... process messages as before (every room when no filters are configured)
        }
    }
}

// Only advance token if:
// 1. We extracted messages, OR
// 2. Allowed rooms were in sync (even with no messages = they're caught up), OR
// 3. No room filters configured (process everything)
if !messages.is_empty() || allowed_rooms_in_sync || !has_room_filters {
    let mut token = self.sync_token.lock().await;
    *token = Some(next_batch.clone());
    self.save_sync_token(&next_batch);
}
```

## Why This Works
- If allowed rooms appear in sync with no messages → token advances (caught up)
- If sync only has events for non-allowed rooms → token does NOT advance
- Next sync will include the same batch + any new events
- Eventually the allowed room will appear and token advances

## Edge Case: Stuck Token
If we never get events for allowed rooms, the token never advances. This is fine: we'll get them eventually when someone sends a message.

## Implementation
File: `crates/rustyclaw-core/src/messengers/matrix_cli.rs`
Lines: ~455-460 (save_sync_token section)